Re: Sabnzbd pauses/errors nzbs with strange characters in name

Posted: March 11th, 2018, 10:49 am
by safihre
No, in this case it's a special character that indicates there is Unicode: the character \x06

Re: Sabnzbd pauses/errors nzbs with strange characters in name

Posted: March 11th, 2018, 12:01 pm
by KuroNeko
Well, I'm no programmer so I don't know how that would work. What I did was copy that line and try to create a file with that name. All that happened was a warning that the \ wasn't allowed, and it was suppressed by Windows Explorer, resulting in the name "u'[SNSbu] Fate kaleid liner Prismax06Illya 2wei! Specials - 1 (BD 1920x1080 h264 FLAC).mkv'"

So Windows does not seem to recognize this as a special code. But, like I said, I don't know how this would work from your view as a programmer. Sorry for the confusion.

Re: Sabnzbd pauses/errors nzbs with strange characters in name

Posted: March 11th, 2018, 2:12 pm
by sander
safihre wrote: March 11th, 2018, 10:29 am Sander, do you maybe have some idea?
My idea: a strange post, caused by the poster. I wouldn't spend much time on it, especially because the "subject=" part in the NZB seems to be syntactically incorrect (EDIT: this can also be caused by the indexer, not the poster). Unescaping gives garbage:

In the .NZB:

Code:

"[SNSbu] Fate kaleid liner Prisma☆Illya 2wei (BD 1920x1080 h264 FLAC) [1/4] - "[SNSbu] Fate kaleid liner Prisma☆Illya 2wei! Specials - 1 (BD 1920x1080 h264 FLAC).mkv" yEnc 83003729 (1/116)"
which after unescape() gives:

Code:

[SNSbu] Fate kaleid liner Prisma☆Illya 2wei (BD 1920x1080 h264 FLAC) [1/4] - "[SNSbu] Fate kaleid liner Prisma☆Illya 2wei! Specials - 1 (BD 1920x1080 h264 FLAC).mkv" yEnc 83003729 (1/116)
That looks like garbage to me.
Googling links to https://en.wikipedia.org/wiki/Fate/kale ... isma_Illya ... with texts like "Fate/kaleid liner Prisma Illya (Fate/kaleid liner プリズマ☆イリヤ Fate/kaleid liner purizuma iriya)" ... so maybe the poster accidentally put some Japanese letters into the filename/post with an incorrect encoding?
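
For anyone who wants to try the unescape step themselves, here is a minimal Python 3 sketch. It is not SABnzbd's code: it uses the standard library's html.unescape, and the escaped subject string is a made-up example, not the subject from the NZB above.

```python
import html

# Hypothetical escaped subject, as it might appear inside an NZB attribute.
# &#9734; is the numeric character reference for U+2606 (the white star).
escaped = "[1/4] - &quot;Prisma&#9734;Illya.mkv&quot; yEnc (1/116)"

# html.unescape turns named and numeric references back into characters.
unescaped = html.unescape(escaped)
print(unescaped)
```

If the references in the NZB are well-formed, the star should survive this step intact; garbage after unescaping points at a problem earlier in the chain.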

The problem is not the NZB, but that we detect the filename using Par2.
If so, on Windows, in SABnzbd you could deselect/delete the par2 files, and try again. I doubt that works.

Re: Sabnzbd pauses/errors nzbs with strange characters in name

Posted: March 11th, 2018, 4:20 pm
by sander
OK, a post with very technical notes as background.

Summary: SAB is OK with Japanese characters in the NZB, both as raw Unicode and as HTML-style escapes. The indexers are not perfect with such posts, but their NZBs are acceptable for SABnzbd.
In other words ... this seems to confirm that the original poster's NZB really is the one in the wrong.
Disclaimer: tested on Linux.

Long:

I created a 100MB file "HelloWorld-こんにちは世界.bin", added par2 files (no rar files), and posted it with the tool 'nyuu', and let nyuu create the .NZB. That NZB contains:

Code:

<?xml version="1.0" encoding="utf-8"?>
...
	<file poster="blablamannetje &lt;[email protected]&gt;" date="1520798401" subject="[1/9] - &quot;HelloWorld-こんにちは世界.bin&quot; yEnc (1/147) 104857600">
Because of the "utf-8", the Japanese characters are allowed.
That NZB works OK with SAB. Good.

I changed that .NZB to contain HTML escape codes:

Code:

<?xml version="1.0" encoding="iso-8859-1" ?>
...
	<file poster="blablamannetje &lt;[email protected]&gt;" date="1520798401" subject="[1/9] - &quot;HelloWorld-&#12371;&#12435;&#12395;&#12385;&#12399;&#19990;&#30028;.bin&quot; yEnc (1/147) 104857600">
and that worked OK for SAB too.

Conclusion: SAB is fine.
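
A rough way to see why both variants behave the same is to parse them with a generic XML parser. The sketch below uses Python's xml.etree.ElementTree on two stripped-down documents (no DOCTYPE, no namespace, a single file element), so it only illustrates the XML side and is not how SABnzbd itself reads NZBs.

```python
import xml.etree.ElementTree as ET

NAME = "HelloWorld-こんにちは世界.bin"

# Variant 1: declared as UTF-8, raw Japanese characters in the attribute.
nzb_utf8 = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<nzb><file subject="[1/9] - &quot;' + NAME + '&quot; yEnc (1/147) 104857600"/></nzb>'
).encode("utf-8")

# Variant 2: declared as ISO-8859-1, non-ASCII characters written as
# numeric character references such as &#12371;.
refs = "".join("&#%d;" % ord(ch) if ord(ch) > 127 else ch for ch in NAME)
nzb_latin1 = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<nzb><file subject="[1/9] - &quot;' + refs + '&quot; yEnc (1/147) 104857600"/></nzb>'
).encode("iso-8859-1")


def subject_of(nzb_bytes):
    """Return the subject attribute of the first <file> element."""
    root = ET.fromstring(nzb_bytes)
    return root.find("file").get("subject")


subject_utf8 = subject_of(nzb_utf8)
subject_latin1 = subject_of(nzb_latin1)
print(subject_utf8 == subject_latin1)
```

The parser resolves the numeric references regardless of the declared encoding, so both documents yield the same subject string.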

The NZB via binsearch https://www.binsearch.info/?q=HelloWorl ... 00&server=
Binsearch's NZB (choose the ones with poster 'blablamannetje') says "iso-8859-1", but contains raw Japanese Unicode characters. Not correct, but it works for SABnzbd.

The NZB from nzbindex https://nzbindex.com/search/?q=HelloWor ... m=1&more=1 ... works.
NZBindex' NZB says "iso-8859-1" and has HTML codes, which is correct, and works for SABnzbd.

Re: Sabnzbd pauses/errors nzbs with strange characters in name

Posted: March 12th, 2018, 2:05 am
by safihre
Yes indeed, SAB is fine when it comes from the file.
The problem is when the filename is deduced from the par2. By that I mean that we analyze the par2 header files and extract the filename from there.
So we have no clue about the encoding etc. Kind of tricky!
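
To illustrate why this is tricky, here is one possible best-effort approach, sketched in Python 3. The function name decode_par2_name is hypothetical and this is not what SABnzbd does; it just tries UTF-8 first and falls back to Latin-1. It also assumes the name field may carry trailing NUL padding.

```python
def decode_par2_name(raw: bytes) -> str:
    """Best-effort decoding of filename bytes taken from a par2 packet."""
    # Assumption: the field may be padded with NUL bytes; drop them.
    raw = raw.rstrip(b"\x00")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails,
        # but the result is wrong if the real encoding was something else.
        return raw.decode("latin-1")


print(repr(decode_par2_name("こんにちは.bin".encode("utf-8") + b"\x00\x00")))
print(repr(decode_par2_name(b"caf\xe9.bin")))
print(repr(decode_par2_name(b"Prisma\x06Illya")))
```

Note the last case: a name whose star was already reduced to byte 0x06 decodes without any error, so no amount of guessing at the encoding brings the original character back.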

@KuroNeko yes this is special python-syntax with the \, not related/workable on Windows Explorer :)

Re: Sabnzbd pauses/errors nzbs with strange characters in name

Posted: May 13th, 2018, 7:30 am
by Nyanderful
Sorry for bringing up an old topic, but I happened to come across this thread and thought I'd add to it.

For reference, the original NZB can be found here.

Unfortunately (like with a number of things around Usenet), the PAR2 specifications are a bit vague, and state that the FileDesc packet uses an "ASCII char array" for the filename. ParPar implemented this by writing out the filename as ASCII, which NodeJS seems to do by simply truncating characters to 1 byte, resulting in the star character (U+2606) becoming U+0006.
Thinking about this, this is likely undesirable, and "ASCII char array" actually probably refers to the system code page. Although this can vary from system to system, I assume most would be using UTF-8 or the client ignores the system code page and just assumes UTF-8, so I've changed ParPar to encode "ASCII filenames" as UTF-8. Even if the client isn't using UTF-8, this change should hopefully avoid these kinds of compatibility issues.
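
The truncation described above is easy to reproduce. The following Python 3 sketch mimics it by keeping only the low byte of each code point and compares that with a UTF-8 encoding of the same name; it is an illustration only, not ParPar's actual code, and the short name is a stand-in for the real filename.

```python
name = "Prisma\u2606Illya"  # U+2606 is the white star

# Keep only the low byte of each code point, mimicking the 1-byte truncation.
truncated = bytes(ord(ch) & 0xFF for ch in name)

# Encoding as UTF-8 instead keeps the star, as three bytes.
utf8 = name.encode("utf-8")

print(truncated)
print(utf8)
```

0x2606 masked with 0xFF leaves 0x06, which is where the \x06 control character in the filename comes from. The UTF-8 bytes decode back to the original name, while the truncated bytes cannot.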

Some suggestions for Sabnzbd though:
  • consider interpreting the UniFileN packet if available. This contains a "unicode" encoded (presumably UTF-16LE) filename, and hence avoids codepage variations across systems. (ParPar, by default, will generate one if there are characters above U+007F in the file name)
  • apply some sanitization to filenames like removing special characters. I see there's a platform_encode function which probably can be augmented for the purpose, where you could strip invalid characters for the platform (i.e. special/invalid characters in Windows filenames)
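
As a rough sketch of both suggestions in Python 3 (the function names are hypothetical, not existing SABnzbd functions): decode_unicode_name assumes the Unicode filename bytes are UTF-16LE as guessed above, and sanitize_filename replaces characters that Windows does not accept in filenames, including control characters such as \x06.

```python
import re

# Characters Windows does not allow in filenames, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace characters that are invalid in Windows filenames."""
    cleaned = _INVALID.sub(replacement, name)
    # Windows also rejects names ending in a space or a dot.
    return cleaned.rstrip(" .")


def decode_unicode_name(raw: bytes) -> str:
    """Decode Unicode filename bytes, assuming UTF-16LE with NUL padding."""
    return raw.decode("utf-16-le").rstrip("\x00")


print(sanitize_filename("Prisma\x06Illya 2wei! Specials.mkv"))
print(decode_unicode_name("Prisma\u2606Illya".encode("utf-16-le") + b"\x00\x00"))
```

Replacing rather than dropping invalid characters is a judgment call; it keeps the name length stable and makes it visible that something was changed.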