Bugs: UnicodeDecodeError: invalid continuation byte

apollo13
Newbie
Posts: 8
Joined: April 10th, 2016, 10:20 am

Post by apollo13 »

I set the system locale to UTF-8, restarted the processes, and even rebooted the whole system, but still had no luck:

Code: Select all

# locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_ALL=
Postprocessing still gives the same error:

Code: Select all

2016-04-11 21:32:50,362::INFO::[postproc:520] Traceback: 
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/sabnzbd/postproc.py", line 354, in process_job
    unpack_error, newfiles = unpack_magic(nzo, short_path(workdir), short_complete, flag_delete, one_folder, (), (), (), (), ())
  File "/usr/local/lib/python2.7/site-packages/sabnzbd/newsunpack.py", line 270, in unpack_magic
    xjoinables, xzips, xrars, xsevens, xts, depth)
  File "/usr/local/lib/python2.7/site-packages/sabnzbd/newsunpack.py", line 208, in unpack_magic
    xjoinables, xzips, xrars, xsevens, xts = build_filelists(workdir, workdir_complete)
  File "/usr/local/lib/python2.7/site-packages/sabnzbd/newsunpack.py", line 1616, in build_filelists
    for root, dirs, files in os.walk(workdir_complete):
  File "/usr/local/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/usr/local/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 36: invalid continuation byte
touch is working as expected:

Code: Select all

# touch "⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ "
# ls
⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁
BTW, thanks for your help and quick replies!
sander
Release Testers
Posts: 8830
Joined: January 22nd, 2008, 2:22 pm

Post by sander »

@apollo13

Can you do this:

Code: Select all

locale -a
Find a language with utf8 in it. Let's assume you have "en_US.utf8" installed. Then stop SABnzbd, and run SABnzbd like this:

Code: Select all

LANG=en_US.utf8 && ./SABnzbd.py
(or the command you use to start SABnzbd from the command line)
Then re-download a problematic NZB (with umlauts in one of the files). Let us know the result.


On my Linux:

Code: Select all

$ locale -a
C
C.UTF-8
POSIX
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
nl_NL.utf8

Post by apollo13 »

@sander

I've done the following.
I'm using de_DE:

Code: Select all

locale -a
...
de_DE.UTF-8
...
my locale

Code: Select all

# locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_ALL=
starting sabnzbd

Code: Select all

LANG=de_DE.UTF-8 && /usr/local/bin/SABnzbd.py -f /usr/local/sabnzbd/sabnzbd.ini
with the following result (snipped), still no luck.

Code: Select all

2016-04-11 21:59:05,716::ERROR::[postproc:518] Post Processing Failed for MAG.Computerwelt.-.April.2016 ()
2016-04-11 21:59:05,717::INFO::[postproc:520] Traceback: 
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/sabnzbd/postproc.py", line 354, in process_job
    unpack_error, newfiles = unpack_magic(nzo, short_path(workdir), short_complete, flag_delete, one_folder, (), (), (), (), ())
  File "/usr/local/lib/python2.7/site-packages/sabnzbd/newsunpack.py", line 270, in unpack_magic
    xjoinables, xzips, xrars, xsevens, xts, depth)
  File "/usr/local/lib/python2.7/site-packages/sabnzbd/newsunpack.py", line 208, in unpack_magic
    xjoinables, xzips, xrars, xsevens, xts = build_filelists(workdir, workdir_complete)
  File "/usr/local/lib/python2.7/site-packages/sabnzbd/newsunpack.py", line 1616, in build_filelists
    for root, dirs, files in os.walk(workdir_complete):
  File "/usr/local/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/usr/local/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 36: invalid continuation byte

Post by sander »

The problem occurs in this piece of code in sabnzbd/newsunpack.py:

Code: Select all

    if workdir_complete:
        for root, dirs, files in os.walk(workdir_complete):

Code: Select all

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 36: invalid continuation byte
So I'm assuming workdir_complete is a mix of Unicode and non-Unicode strings. That could happen if SABnzbd builds workdir_complete by concatenating Unicode strings and byte strings that contain special characters (without special characters I would not expect the error to occur).

So: do you use special characters (such as umlauts or ß) in SABnzbd's workdir?
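
One possible reading of the error message, offered as an assumption rather than a diagnosis: byte 0xe4 is "ä" in Latin-1 (ISO 8859-1), while in UTF-8 the same byte announces a three-byte sequence. If a filename was stored as Latin-1 bytes, a UTF-8 decoder hits 0xe4, finds that the next byte is not a continuation byte, and fails with exactly this wording. The filename used below is hypothetical.

```python
# Hypothetical filename bytes, as if the name had been written in Latin-1.
raw_name = b"sch\xe4n.txt"

# Decoding as UTF-8 fails: 0xe4 starts a 3-byte sequence, but "n" follows.
try:
    raw_name.decode("utf-8")
    utf8_result = "decoded"
except UnicodeDecodeError as exc:
    utf8_result = exc.reason

# Decoding as Latin-1 works, and re-encoding gives the UTF-8 form of the name.
latin1_text = raw_name.decode("latin-1")
utf8_bytes = latin1_text.encode("utf-8")

print(utf8_result)
print(repr(utf8_bytes))
```

If this is what happens, the locale alone would not be the whole story: the name inside the download would also have to be converted or tolerated as raw bytes.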

EDIT:

Code: Select all

# -*- coding: utf-8 -*-

import sys
import os

workdir_complete = "/home/sander/kullie"
for root, dirs, files in os.walk(workdir_complete):
	print root, dirs, files

workdir_complete = "/home/sander/" + u'kullie'
for root, dirs, files in os.walk(workdir_complete):
	print root, dirs, files

Code: Select all

$ python test2.py 
/home/sander/kullie [] ['test2.py', 'test2.py~', 'test-touch.py', 'test-touch.py~']
/home/sander/kullie [] [u'test2.py', u'test2.py~', u'test-touch.py', u'test-touch.py~']

Code: Select all

$ touch "schön.txt"
$ python test2.py 
/home/sander/kullie [] ['test2.py', 'sch\xc2\xb4\xc2\xa8\xc3\xb6n.txt', 'test2.py~', 'test-touch.py', 'sch\xc3\xb6n.txt', 'test-touch.py~']
Traceback (most recent call last):
  File "test2.py", line 11, in <module>
    for root, dirs, files in os.walk(workdir_complete):
  File "/usr/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 4: ordinal not in range(128)
Interesting ... NO special character in the directory name, only in a filename ...
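
A likely mechanism, assuming the documented Python 2 behaviour applies here: when os.walk gets a unicode top directory, os.listdir returns unicode names where it can decode them and plain byte strings where it cannot. posixpath.join then adds a byte string to a unicode path, and Python 2 decodes the bytes implicitly with the default ASCII codec. The sketch below makes that implicit step explicit so it also runs on Python 3; py2_style_join is a hypothetical helper, not SABnzbd or standard-library code.

```python
def py2_style_join(top, name, default_encoding="ascii"):
    """Emulate `path += '/' + b` from Python 2's posixpath.join.

    `top` is text and `name` is a byte string. Python 2 would decode the
    byte string silently with the default encoding; here it is explicit.
    """
    suffix = b"/" + name
    return top + suffix.decode(default_encoding)


undecoded_name = b"sch\xc3\xb6n.txt"  # UTF-8 bytes of "schön.txt"

try:
    py2_style_join(u"/home/sander/kullie", undecoded_name)
    outcome = "joined without error"
except UnicodeDecodeError as exc:
    outcome = str(exc)

print(outcome)
```

The position reported is 4 rather than 3 because the "/" separator is prepended before the decode happens, which matches the tracebacks in this thread.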

With the umlaut-file still present:

Code: Select all

sander@streamer1504:~/kullie$ LANG=en_US.utf8 && python test2.py
/home/sander/kullie [] [u'test2.py', u'test2.py~', u'test-touch.py', u'sch\xf6n.txt', u'test-touch.py~']

Code: Select all

sander@streamer1504:~/kullie$ LANG= && python test2.py
Traceback (most recent call last):
  File "test2.py", line 13, in <module>
    for root, dirs, files in os.walk(workdir_complete):
  File "/usr/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
sander@streamer1504:~/kullie$ 

Post by apollo13 »

from sabnzbd.ini:

Code: Select all

download_dir = Downloads/incomplete
complete_dir = Downloads/complete
absolute path looks like this:

Code: Select all

# pwd
/usr/local/sabnzbd/Downloads/complete
No umlauts or sharp S (ß) are used in the directory names, only in the name of the downloaded file, which I can't avoid.

Let me know if you need more debugging output!

Post by sander »

Some hardcore stuff which you, as a FreeBSD user, should be able to handle ;-)

In sabnzbd/newsunpack.py from line 1615 I added three lines:

Code: Select all

    if workdir_complete:
        import locale
        logging.debug("SJ: locale.getdefaultlocale() is %s", locale.getdefaultlocale())
        logging.debug("SJ: workdir_complete is %s", workdir_complete)
        for root, dirs, files in os.walk(workdir_complete):
Can you do that too, set SAB's logging to +debug, and run SABnzbd again with the problematic NZB?

With

Code: Select all

$ LANG=en_US.utf8 && ./SABnzbd.py
I get

Code: Select all

$ cat ~/.sabnzbd/logs/sabnzbd.log | grep -A1 -B1 SJ

2016-04-11 22:53:46,585::INFO::[newsunpack:233] Unrar finished on /home/sander/Downloads/incomplete/MAG.Computerwelt.-.April.2016.1
2016-04-11 22:53:46,585::DEBUG::[newsunpack:1617] SJ: locale.getdefaultlocale() is ('en_US', 'UTF-8')
2016-04-11 22:53:46,586::DEBUG::[newsunpack:1618] SJ: workdir_complete is /home/sander/Downloads/complete/_UNPACK_MAG.Computerwelt.-.April.2016.4
2016-04-11 22:53:46,587::DEBUG::[newsunpack:1644] build_filelists(): joinables: []
I don't get an error, but I'm interested in your effective/real locale settings in SABnzbd.
Can you post that grep output?
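
For checking the effective settings without patching newsunpack.py, a small stand-alone script can print the same kind of information. This is only a sketch: encoding_report is a hypothetical helper, and it has to be run in the same environment as SABnzbd (same user, same start method) to say anything useful.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that influence how filenames get decoded."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


report = encoding_report()
for key in sorted(report):
    print("%s = %r" % (key, report[key]))
```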

Post by apollo13 »

found the newsunpack.py here

Code: Select all

/usr/local/lib/python2.7/site-packages/sabnzbd
changed the lines as you said, looked like this afterwards

Code: Select all

    if workdir_complete:
        import locale
        logging.debug("SJ: locale.getdefaultlocale() is %s", locale.getdefaultlocale())
        logging.debug("SJ: workdir_complete is %s", workdir_complete)
        for root, dirs, files in os.walk(workdir_complete):
            for _file in files:
For good measure I switched my shell from zsh to bash (I don't know why) and started SABnzbd again:

Code: Select all

# bash                                                                           
/usr/local/sabnzbd/logs]# LANG=de_DE.UTF-8 && /usr/local/bin/SABnzbd.py -f /usr/local/sabnzbd/sabnzbd.ini
changed logging to +debug and redownloaded the nzb and grepped:

Code: Select all

# grep SJ sabnzbd.log
2016-04-11 23:16:08,116::DEBUG::[newsunpack:1617] SJ: locale.getdefaultlocale() is ('de_DE', 'UTF-8')
2016-04-11 23:16:08,116::DEBUG::[newsunpack:1618] SJ: workdir_complete is /usr/local/sabnzbd/Downloads/complete/_UNPACK_MAG.Computerwelt.-.April.2016
No more errors, and everything worked as expected (the folder is named correctly, and so on) :D

but if I start it the normal FreeBSD way

Code: Select all

service sabnzbd start
the output looks like this

Code: Select all

2016-04-11 23:22:42,596::DEBUG::[newsunpack:1617] SJ: locale.getdefaultlocale() is (None, None)
2016-04-11 23:22:42,597::DEBUG::[newsunpack:1618] SJ: workdir_complete is /usr/local/sabnzbd/Downloads/complete/_UNPACK_MAG.Computerwelt.-.April.2016
The locale is not set, and the web frontend shows an error (the folder is still named _UNPACK_something).

Now I have a workaround, which is great, thanks a lot for that. But is there a way to do it with the RC script? Does anything need to be changed there? I think this is FreeBSD specific, but maybe someone knows ::)

EDIT 1:
If SABnzbd is started with the workaround mentioned above, the created files are owned by the user root and the group _sabnzbd, because that is the account I started it with. Maybe I'll change that later on...
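
The difference between the two start methods is consistent with how environments are inherited: a process only sees LANG if whoever launches it puts LANG into its environment, and a service started at boot does not go through the login shell that sets it. The sketch below illustrates the inheritance part only; child_lang is a hypothetical helper, and the locale value is simply the one used in this thread.

```python
import os
import subprocess
import sys

# The child just reports the LANG value it was started with.
CHILD = "import os; print(os.environ.get('LANG'))"


def child_lang(env):
    """Start a child Python process with `env` and return the LANG it sees."""
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return out.decode("ascii").strip()


# Same environment as this process, minus LANG (roughly what a service gets).
base = {k: v for k, v in os.environ.items() if k != "LANG"}

without_lang = child_lang(base)
with_lang = child_lang(dict(base, LANG="de_DE.UTF-8"))

print(without_lang)
print(with_lang)
```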

Post by sander »

AFAIK an rc script is a /bin/sh script (check the shebang on the first line), and in /bin/sh, and therefore in the rc script, you should use something like this:

LANG=de_DE.UTF-8; export LANG

Can you try if that works?

Having said that: I hope SABnzbd itself can solve this and set LANG in a safe way. So far I only know how to detect whether UTF-8 is in use:

UTF-8, thus good:

Code: Select all

$ LANG=en_US.utf8 && python -c "import locale; print locale.getpreferredencoding(); print locale.getdefaultlocale()" 
UTF-8
('en_US', 'UTF-8')
No UTF-8, so expect problems with special (non-US-ASCII) characters in file or directory names:

Code: Select all

$ LANG= && python -c "import locale; print locale.getpreferredencoding(); print locale.getdefaultlocale()"
ANSI_X3.4-1968
('en_US', 'ISO8859-1')
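
The detection step could be wrapped in a small check. Because the same encoding appears under different spellings ("UTF-8", "utf8"), this sketch normalises the name through the codec registry before comparing. is_utf8 is a hypothetical helper, not part of SABnzbd.

```python
import codecs


def is_utf8(encoding_name):
    """Return True if `encoding_name` is some spelling of UTF-8."""
    if not encoding_name:
        return False
    try:
        # codecs.lookup() resolves aliases to one canonical codec name.
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown encoding name: treat it as "not UTF-8".
        return False


for candidate in ("UTF-8", "utf8", "ISO8859-1", "ANSI_X3.4-1968", None):
    print("%r -> %s" % (candidate, is_utf8(candidate)))
```

A start-up check could pass locale.getpreferredencoding() to such a function and log a warning when it returns False, instead of failing later during post-processing.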
EDIT:

SABnzbd.py contains the code below, but I don't know if changing things here helps:

Code: Select all

import locale
import __builtin__
try:
    locale.setlocale(locale.LC_ALL, "")
    __builtin__.__dict__['codepage'] = locale.getlocale()[1] or 'cp1252'
except:
    # Work-around for Python-ports with bad "locale" support
    __builtin__.__dict__['codepage'] = 'cp1252'