Special national characters in filenames are not handled

anton
Newbie
Posts: 13
Joined: November 5th, 2009, 7:19 am

Special national characters in filenames are not handled

Post by anton »

SABnzbd does not handle UTF-8 characters in filenames.
I live in Denmark, where we have letters like æ, ø and å. I just tried to download "Lærkevej", which causes problems:

1) The file name displayed in the browser (Plush) is L_rkevej.
2) The file name generated in the file system is Lrkevej.
3) The par2 file references a file called "Lærkevej" (using one of the UTF-8 encodings for æ), which causes par2 to fail.
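
A minimal Python sketch of how names like the ones in 1) and 2) can arise when non-ASCII characters are either replaced or dropped. The helper names are hypothetical and this is not taken from SABnzbd's code; it only illustrates the two kinds of damage described above.

```python
import re

# The reported name; \u00e6 is the letter "ae" ligature used in Danish.
name = "L\u00e6rkevej"

def underscore_non_ascii(text):
    """Hypothetical helper: replace every non-ASCII character with '_'."""
    return re.sub(r"[^\x00-\x7f]", "_", text)

def drop_non_ascii(text):
    """Hypothetical helper: silently drop every non-ASCII character."""
    return text.encode("ascii", errors="ignore").decode("ascii")

print(underscore_non_ascii(name))  # L_rkevej
print(drop_non_ascii(name))        # Lrkevej
print(name.encode("utf-8"))        # the two-byte UTF-8 form of the letter
```

Neither damaged name matches the name stored in the par2 file, which would be consistent with the failure in 3).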
switch
Moderator
Posts: 1380
Joined: January 17th, 2008, 3:55 pm
Location: UK

Re: Special national characters in filenames are not handled

Post by switch »

Are you adding something via newzbin? Their API converts characters not allowed in HTTP headers into underscores, so it is probably broken before it gets to SABnzbd.

As for the par2, that might be another issue, however I don't have an answer for that myself.
shypike
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: Special national characters in filenames are not handled

Post by shypike »

There are two par2 programs in the world.
One uses 8-bit ASCII and the other UTF-8, and they are mutually incompatible.
More so when the creating and receiving platforms differ.
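
To illustrate why the two encodings are incompatible, here is a small Python sketch. It assumes the 8-bit variant writes Latin-1, and `read_as` is a hypothetical helper, not part of either par2 program.

```python
# The same filename stored by the two par2 variants.
name = "L\u00e6rkevej"
latin1_bytes = name.encode("latin-1")  # one byte for the special letter
utf8_bytes = name.encode("utf-8")      # two bytes for the special letter

def read_as(raw, encoding):
    """Hypothetical helper: decode raw bytes, or return None on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# A Latin-1 reader accepts UTF-8 bytes but shows two wrong characters.
print(read_as(utf8_bytes, "latin-1"))
# A UTF-8 reader rejects the Latin-1 bytes outright.
print(read_as(latin1_bytes, "utf-8"))
```

In both directions the name the reader ends up with differs from the name on disk, so the file is not found.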

Please email the NZB file to bugs at sabnzbd.org
anton
Newbie
Posts: 13
Joined: November 5th, 2009, 7:19 am

Re: Special national characters in filenames are not handled

Post by anton »

Yes, the NZB was downloaded from newzbin, but the real issue is buried inside the base .par2 file.

I found this suggested workaround on the SourceForge page for par2 (link: http://sourceforge.net/projects/parchiv ... ic/1843329):

------ snip --------
"Unicode issue" as in: "par2 clients don't support the optional unicode parts of the par2 specs and stupidly write 8-bit characters to the ascii (7-bit) 'name of the file' field of the File Description packet"?

Fixing that file set is easy, for example:

    par2 r La\ Bête.par2 La*.rar

The resulting rar files might or might not have an "ê" in the filename, depending on whether you and the creator of the par2 set use the same character encoding...

If clients had adhered to the specs and hadn't stored non-ascii characters in what should have been an ascii-only field, the clients would probably have had no other choice but to implement the optional unicode stuff to store non-ascii filenames years ago... :(

----------- snip --------


I tried the workaround and it does indeed work. Unfortunately, it requires a change in the way SABnzbd invokes par2. Or could I do that with the optional par2 parameters?
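
For reference, a rough Python sketch of what such an invocation could look like: the data files are listed explicitly after the par2 file, as in the quoted command. The function names and the `par2_exe` parameter are hypothetical, and this is not how SABnzbd actually calls par2.

```python
import glob
import subprocess

def build_repair_command(par2_file, extra_pattern, par2_exe="par2"):
    """Build 'par2 r <par2 file> <extra files...>' as an argument list.

    The pattern is expanded here because no shell is involved.
    """
    extra_files = sorted(glob.glob(extra_pattern))
    return [par2_exe, "r", par2_file] + extra_files

def run_repair(par2_file, extra_pattern):
    """Run the repair and return par2's exit code (assumes par2 is on PATH)."""
    command = build_repair_command(par2_file, extra_pattern)
    return subprocess.run(command, check=False).returncode

# Example argument list for the set from the quoted post:
print(build_repair_command("La B\u00eate.par2", "La*.rar"))
```

Passing a list rather than a shell string avoids having to escape the space in the par2 file name.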
shypike
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: Special national characters in filenames are not handled

Post by shypike »

This will only be covered in the pending 0.5.0 release.
The binary distributions will be shipped with both variants of par2,
and the right par2 program will be picked based on the encoding of the par2 files.
It works OK in the latest code we have now (after I removed some remaining glitches).
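
One way such a choice could be made is sketched below in Python. It assumes the raw filename bytes have already been extracted from the par2 files, and the function names and the "classic"/"utf8" labels are hypothetical; the actual 0.5.0 code may work differently.

```python
def guess_name_encoding(raw_name):
    """Guess how a filename stored in a par2 file was encoded."""
    try:
        raw_name.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw_name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume an 8-bit encoding such as Latin-1.
        return "latin-1"

def pick_par2_variant(raw_names):
    """Return a hypothetical label for the par2 program to run."""
    encodings = {guess_name_encoding(raw) for raw in raw_names}
    if "latin-1" in encodings:
        return "classic"
    if "utf-8" in encodings:
        return "utf8"
    return "classic"  # pure ASCII names work with either variant

print(pick_par2_variant([b"L\xc3\xa6rkevej.rar"]))
print(pick_par2_variant([b"L\xe6rkevej.rar"]))
```

This remains a guess: a short Latin-1 name can happen to be valid UTF-8, in which case the wrong variant would be chosen.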

Unfortunately this feature will not work for Linux, because we
only distribute a source package and rely on the installed par2.
I have to look at the suggested work-around, but at first glance
I'm not convinced of its universal usability.

Thanks for reporting this.
You might be interested in signing up for the Release Test program.
anton
Newbie
Posts: 13
Joined: November 5th, 2009, 7:19 am

Re: Special national characters in filenames are not handled

Post by anton »

With regard to par2 on Linux, you may want to look at this site: http://www.chuchusoft.com/par2_tbb/index.html

I've had this running on my system for 1.5 days now, and it seems to work great.
shypike
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: Special national characters in filenames are not handled

Post by shypike »

We distribute chuchusoft's par2 with the Windows and OSX binaries.
Chuchusoft is partially responsible for the current mess.
The "classic" par2 uses 8bit ASCII with unspecified encoding (usually Latin-1).
Chuchusoft decided to use UTF-8 instead without any attempt to
identify the new format and without bothering with backward compatibility.
This is why we need to look inside par2 files to make an educated guess
about which par2 version to use. This also means we do not have a
proper solution for Linux, where we depend on the par2 that happens to be installed.

For OSX we have another problem. When an upload has been created with
classic par2 on Windows, there's no working solution on OSX.
Both par2 variants will ask OSX about files in Latin-1 format, which OSX doesn't know.
If only chuchusoft's par2 had the intelligence to translate Latin-1 to UTF-8,
the issue would be resolved.
We would gladly modify chuchusoft's code, if compiling it wasn't a form of black magic.
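
The translation itself is simple, as this Python sketch suggests. The function name and the `decompose` option are hypothetical; the option reflects the assumption (not stated in this thread) that OSX file systems have traditionally stored names in a decomposed Unicode form.

```python
import unicodedata

def latin1_name_to_utf8(raw_name, decompose=False):
    """Re-encode a Latin-1 filename as UTF-8.

    With decompose=True, accented letters are split into a base letter
    plus a combining mark (NFD) before encoding.
    """
    text = raw_name.decode("latin-1")  # never fails: every byte is valid
    if decompose:
        text = unicodedata.normalize("NFD", text)
    return text.encode("utf-8")

print(latin1_name_to_utf8(b"L\xe6rkevej"))
print(latin1_name_to_utf8(b"La B\xeate", decompose=True))
```

The hard part is not the conversion but knowing that the stored name really is Latin-1 in the first place.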

Having said that, I'll look into the possibilities of the "work-around".
rAf
Moderator
Posts: 546
Joined: April 28th, 2008, 2:35 pm
Location: France

Re: Special national characters in filenames are not handled

Post by rAf »

shypike wrote:
    We distribute chuchusoft's par2 with the Windows and OSX binaries.
    Chuchusoft is partially responsible for the current mess.

Chuchusoft par2 has been removed from OSX binaries since 0.4.8 (I think)...
shypike
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: Special national characters in filenames are not handled

Post by shypike »

I stand corrected.