Sabnzbd pauses/errors nzbs with strange characters in name

safihre
Administrator
Posts: 2815
Joined: April 30th, 2015, 7:35 am
Location: Switzerland

Post by safihre » March 11th, 2018, 10:49 am

No, in this case it's a special character that indicates there is Unicode: the character \x06

KuroNeko
Newbie
Posts: 12
Joined: November 21st, 2012, 2:25 pm

Post by KuroNeko » March 11th, 2018, 12:01 pm

Well, I'm no programmer, so I don't know how that would work. What I did was copy that line and try to create a file with that name. All that happened was a warning that the \ wasn't allowed, and Windows Explorer suppressed it, resulting in the name "u'[SNSbu] Fate kaleid liner Prismax06Illya 2wei! Specials - 1 (BD 1920x1080 h264 FLAC).mkv'"

So Windows does not seem to recognize this as a special code. But, like I said, I don't know how this would work from your view as a programmer. Sorry for the confusion.

sander
Release Testers
Posts: 6460
Joined: January 22nd, 2008, 2:22 pm

Post by sander » March 11th, 2018, 2:12 pm

safihre wrote:
March 11th, 2018, 10:29 am
Sander, do you maybe have some idea?
My idea: a strange post, caused by the poster. I would not spend much time on it, especially because the "subject=" part in the NZB seems to be syntactically incorrect (EDIT: this can also be caused by the indexer, not the poster): unescaping gives garbage:

In the .NZB:

Code:

"[SNSbu] Fate kaleid liner Prisma☆Illya 2wei (BD 1920x1080 h264 FLAC) [1/4] - "[SNSbu] Fate kaleid liner Prisma☆Illya 2wei! Specials - 1 (BD 1920x1080 h264 FLAC).mkv" yEnc 83003729 (1/116)"
which after unescape() gives:

Code:

[SNSbu] Fate kaleid liner Prisma☆Illya 2wei (BD 1920x1080 h264 FLAC) [1/4] - "[SNSbu] Fate kaleid liner Prisma☆Illya 2wei! Specials - 1 (BD 1920x1080 h264 FLAC).mkv" yEnc 83003729 (1/116)
That looks like garbage to me.
Googling links to https://en.wikipedia.org/wiki/Fate/kale ... isma_Illya ... with texts like "Fate/kaleid liner Prisma Illya (Fate/kaleid liner プリズマ☆イリヤ Fate/kaleid liner purizuma iriya)" ... so maybe the poster accidentally put some Japanese letters into the filename/post with incorrect encoding?

The problem is not the NZB, but that we detect the filename using Par2.
If so, on Windows, in SABnzbd you could deselect/delete the par2 files, and try again. I doubt that works.

sander
Release Testers
Posts: 6460
Joined: January 22nd, 2008, 2:22 pm

Post by sander » March 11th, 2018, 4:20 pm

OK, a post with very technical notes as background.

Summary: SAB is OK with Japanese characters in the NZB, both as raw Unicode and as HTML-style escape codes. The indexers are not perfect with such posts, but their NZBs are acceptable for SABnzbd.
In other words ... this seems to confirm that the original poster's NZB really is the faulty one.
Disclaimer: tested on Linux.

Long:

I created a 100MB file "HelloWorld-こんにちは世界.bin", added par2 files (no rar files), and posted it with the tool 'nyuu', and let nyuu create the .NZB. That NZB contains:

Code:

<?xml version="1.0" encoding="utf-8"?>
...
	<file poster="blablamannetje &lt;[email protected]&gt;" date="1520798401" subject="[1/9] - &quot;HelloWorld-こんにちは世界.bin&quot; yEnc (1/147) 104857600">
Because of the "utf-8", the Japanese characters are allowed.
That NZB works OK with SAB. Good.

I changed that .NZB to contain HTML escape codes:

Code:

<?xml version="1.0" encoding="iso-8859-1" ?>
...
	<file poster="blablamannetje &lt;[email protected]&gt;" date="1520798401" subject="[1/9] - &quot;HelloWorld-&#12371;&#12435;&#12395;&#12385;&#12399;&#19990;&#30028;.bin&quot; yEnc (1/147) 104857600">
and that worked OK for SAB too.
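
The two NZB variants above can also be compared outside SAB. This is only a minimal sketch, assuming Python 3 and its standard xml.etree.ElementTree parser. It is not SABnzbd's own NZB code, and the stripped-down <nzb> wrapper and the subject_of helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

NAME = "HelloWorld-こんにちは世界.bin"

# Variant 1: UTF-8 declaration, raw Japanese characters in the attribute.
utf8_nzb = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<nzb><file subject="[1/9] - &quot;' + NAME +
    '&quot; yEnc (1/147) 104857600"/></nzb>'
).encode("utf-8")

# Variant 2: ISO-8859-1 declaration, every non-ASCII character written
# as a numeric character reference such as &#12371;
escaped = "".join("&#%d;" % ord(c) if ord(c) > 127 else c for c in NAME)
latin1_nzb = (
    '<?xml version="1.0" encoding="iso-8859-1" ?>'
    '<nzb><file subject="[1/9] - &quot;' + escaped +
    '&quot; yEnc (1/147) 104857600"/></nzb>'
).encode("iso-8859-1")


def subject_of(nzb_bytes):
    """Hypothetical helper: return the subject of the first <file> element."""
    return ET.fromstring(nzb_bytes).find("file").get("subject")


# Both variants decode to the same subject string.
assert subject_of(utf8_nzb) == subject_of(latin1_nzb)
print(subject_of(latin1_nzb))
```

An XML parser resolves the character references itself, so both files yield the same filename as long as the declared encoding matches the bytes actually in the file.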

Conclusion: SAB is fine.

The NZB via binsearch https://www.binsearch.info/?q=HelloWorl ... 00&server=
Binsearch's NZB (choose the ones with poster 'blablamannetje') says "iso-8859-1" but contains raw Japanese Unicode characters. Not correct, but it works for SABnzbd.

The NZB from nzbindex https://nzbindex.com/search/?q=HelloWor ... m=1&more=1 ... works.
NZBindex's NZB says "iso-8859-1" and uses HTML escape codes, which is correct, and it works for SABnzbd.

safihre
Administrator
Posts: 2815
Joined: April 30th, 2015, 7:35 am
Location: Switzerland

Post by safihre » March 12th, 2018, 2:05 am

Yes indeed, SAB is fine when the filename comes from the NZB file.
The problem is when the filename is deduced from the par2. By that I mean that we analyze the par2-header files and extract the filename from there.
So we have no clue about the encoding etc. Kind of tricky!
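
To illustrate why the missing encoding information is tricky, here is a rough sketch of one possible guessing strategy. It is not what SABnzbd does: the function name guess_par2_filename is hypothetical, and the trailing NUL padding and the "UTF-8 first, then Latin-1" order are assumptions made for this example.

```python
def guess_par2_filename(raw: bytes) -> str:
    """Hypothetical helper: decode filename bytes taken from a par2 packet."""
    # Assumption: the name field may be padded with trailing NUL bytes.
    raw = raw.rstrip(b"\x00")
    try:
        # Try strict UTF-8 first; invalid sequences raise an error.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback guess: Latin-1 maps every byte to a character,
        # so this step can never fail (but it can guess wrong).
        return raw.decode("latin-1")


print(guess_par2_filename("Prisma☆Illya.mkv".encode("utf-8") + b"\x00\x00"))
print(guess_par2_filename(b"caf\xe9.bin"))
```

A name that already contains a plain \x06 byte decodes without error under either guess, so this kind of fallback cannot bring back a character that was lost when the par2 was created.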

@KuroNeko yes, this is special Python syntax with the \, not related to or workable in Windows Explorer :)
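
A small sketch of what that Python syntax means, assuming Python 3 (the u'...' prefix in the log comes from Python 2, but the escape works the same way). The shortened name is just an example:

```python
import unicodedata

# \x06 inside a normal string literal is an escape for ONE character:
# the control character with code point 6.
name = "Prisma\x06Illya"

# In a raw string the same text is four separate characters: \ x 0 6.
# That is roughly what you get when pasting the log line into Explorer.
literal = r"Prisma\x06Illya"

print(repr(name))                       # the log shows this escaped form
print(len(name), len(literal))
print(unicodedata.category(name[6]))    # Unicode category of the odd character
```

So the backslash in the log is only how Python displays the invisible control character; it is not part of the filename.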

Nyanderful
Newbie
Posts: 4
Joined: July 15th, 2015, 5:31 am

Post by Nyanderful » May 13th, 2018, 7:30 am

Sorry for bringing up an old topic, but I happened to come across this thread and thought I'd add to it.

For reference, the original NZB can be found here.

Unfortunately (like with a number of things around Usenet), the PAR2 specifications are a bit vague, and state that the FileDesc packet uses an "ASCII char array" for the filename. ParPar implemented this by writing out the filename as ASCII, which NodeJS seems to do by simply truncating characters to 1 byte, resulting in the star character (U+2606) becoming U+0006.
Thinking about this, this is likely undesirable, and "ASCII char array" actually probably refers to the system code page. Although this can vary from system to system, I assume most would be using UTF-8 or the client ignores the system code page and just assumes UTF-8, so I've changed ParPar to encode "ASCII filenames" as UTF-8. Even if the client isn't using UTF-8, this change should hopefully avoid these kinds of compatibility issues.
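
The truncation described above can be reproduced in a few lines. This is a Python imitation of the effect, not ParPar's actual Node.js code, and truncate_to_one_byte is a name invented for the sketch:

```python
def truncate_to_one_byte(name: str) -> bytes:
    """Keep only the low 8 bits of every code point (imitates the old behaviour)."""
    return bytes(ord(ch) & 0xFF for ch in name)


name = "Prisma☆Illya"

broken = truncate_to_one_byte(name)   # U+2606 loses its high byte and becomes 0x06
fixed = name.encode("utf-8")          # the star becomes a three-byte UTF-8 sequence

print(broken)
print(fixed)
```

With the UTF-8 variant a client that assumes UTF-8 gets the star back, while the truncated variant leaves only the control character \x06 that showed up in the SABnzbd log.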

Some suggestions for Sabnzbd though:
  • consider interpreting the UniFileN packet if available. This contains a "unicode" encoded (presumably UTF-16LE) filename, and hence avoids codepage variations across systems. (ParPar, by default, will generate one if there are characters above U+007F in the file name)
  • apply some sanitization to filenames like removing special characters. I see there's a platform_encode function which probably can be augmented for the purpose, where you could strip invalid characters for the platform (i.e. special/invalid characters in Windows filenames)
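
Both suggestions could look roughly like the sketch below. It makes several assumptions: the function names are hypothetical and are not SABnzbd's platform_encode, the UniFileN name is taken to be UTF-16LE with optional NUL padding (as presumed above), and the forbidden set is the usual Windows list of reserved characters plus control characters.

```python
import re

# Characters Windows does not allow in filenames, plus control
# characters 0x00-0x1F (which covers the \x06 from this thread).
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def decode_unicode_name(raw: bytes) -> str:
    """Hypothetical: decode a UniFileN name, assuming UTF-16LE and NUL padding."""
    return raw.decode("utf-16-le").rstrip("\x00")


def sanitize_filename(name: str) -> str:
    """Hypothetical: drop characters that are invalid in Windows filenames."""
    return _WINDOWS_FORBIDDEN.sub("", name)


print(sanitize_filename("Prisma\x06Illya 2wei!.mkv"))
print(decode_unicode_name("Prisma☆Illya.mkv".encode("utf-16-le") + b"\x00\x00"))
```

Stripping is the simplest choice here; replacing the character with an underscore would keep the name length stable, at the cost of leaving a visible placeholder.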
