Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 3rd, 2019, 8:34 pm
by Veteran68
Just tried your test NZB. It completed a my_1GB.bin file of 1,048,576,000 bytes containing all 0 (null) bytes. No reported errors, although it did hitch and redownload the last few percent before going into repair mode. But no hung unrar process left behind.

Will try the new directunpacker.py now with the "problem child" NZB from earlier.

EDIT:

Ok, tried the problem NZB again with the latest directunpacker.py. Similar result. I did notice that while it completed unpacking and repairing (like last time) this time the SAB process hung at "Moving:..." When I then attempted to restart SAB, it became a defunct zombie process itself with a state of Zl, as opposed to the D state of the unrar process. Now I'm assuming since we're direct unpacking to the completion folder, that the "move" should be no more than a folder rename operation, right?
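
For what it's worth, the "rename vs. real move" distinction can be sketched roughly like this. This is not SABnzbd's actual move code; `move_folder` is a hypothetical helper, and it only illustrates that a move within one filesystem is a single rename call, while a move across filesystems falls back to copy-and-delete.

```python
import errno
import os
import shutil


def move_folder(src, dst):
    """Move a folder, reporting whether it was a cheap rename or a full copy.

    Hypothetical helper for illustration only.
    """
    try:
        # On the same filesystem this is a single metadata operation.
        os.rename(src, dst)
        return "renamed"
    except OSError as err:
        # EXDEV means src and dst are on different filesystems.
        if err.errno != errno.EXDEV:
            raise
    # Cross-device: shutil.move copies the tree, then removes the source.
    shutil.move(src, dst)
    return "copied"
```

Even the rename path is still a filesystem call, so on a network mount that has stopped responding it could plausibly block just like any other I/O.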

Also tried to strace both the sabnzbdplus and the unrar process. The unrar process can't (easily) be straced because of the disk wait situation. strace can't be exited in that state, nor killed in the normal fashion. I had to Ctrl-Z and bg to send to background, then kill -9 the background pid. When this happens strace doesn't flush its output buffers to disk, so my output file was empty. :(
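
Since strace is no help against a process in disk wait, a lighter way to at least confirm the state is to read it from /proc. A minimal sketch, assuming Linux; the function names are made up for illustration and nothing here comes from SABnzbd itself:

```python
import os


def parse_proc_stat_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read the state of a live process on Linux (e.g. 'R', 'S', 'D', 'Z')."""
    with open("/proc/%d/stat" % pid) as handle:
        return parse_proc_stat_state(handle.read())


if __name__ == "__main__":
    # Demo: print the state of this very process.
    print(process_state(os.getpid()))
```

Calling `process_state()` with the pid of the stuck unrar should report `D` for the uninterruptible disk wait described above, and `Z` for a zombie.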

The sabnzbdplus Python process did produce a trace file, but I don't see anything useful in there. Just a bunch of select and futex calls, with some futex calls returning "EAGAIN (Resource Temporarily Unavailable)" from time to time, which may be expected given the nature of futexes/mutexes.

One difference from the above attempt: instead of hanging at "Moving," this time it aborted with a timeout exception, which was also written to sabnzbd.error.log. This is presumably from the new addition to directunpacker.py:

Exception in thread Thread-53:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/share/sabnzbdplus/sabnzbd/directunpacker.py", line 185, in run
    self.abort()
  File "/usr/share/sabnzbdplus/sabnzbd/decorators.py", line 36, in call_func
    return f(*args, **kw)
  File "/usr/share/sabnzbdplus/sabnzbd/directunpacker.py", line 403, in abort
    timed_communicate(self.active_instance)
  File "/usr/share/sabnzbdplus/sabnzbd/getipaddress.py", line 40, in func_wrapper
    return async_result.get(max_timeout)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 568, in get
    raise TimeoutError
TimeoutError

It still left the blocked unrar process though. I should also note that in every iteration of this test tonight with the new directunpacker.py, the SAB process would go zombie and refuse to be killed. The stop command on the service would take quite a while to come back, and didn't stop the process. I'm assuming it could be related to these latest changes where you're trying to implement a timeout and handle things cleanly, but maybe it's now insisting on waiting on the hung unrar child process, so it also hangs?
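
To illustrate the kind of bounded wait I mean, here is a rough sketch. It is not the code in directunpacker.py; it assumes Python 3 (where `Popen.wait()` accepts a `timeout`, which the Python 2.7 in the traceback does not), and `stop_with_deadline` is a hypothetical name.

```python
import subprocess
import sys


def stop_with_deadline(proc, grace=5.0):
    """Try to stop a child process without ever blocking indefinitely.

    Returns True if the child exited and was reaped, False if it is still
    there after SIGTERM and SIGKILL (for example, stuck in D state).
    """
    if proc.poll() is not None:
        return True  # already exited
    # Ask politely first (SIGTERM), then force (SIGKILL).
    for send_signal in (proc.terminate, proc.kill):
        send_signal()
        try:
            proc.wait(timeout=grace)
            return True
        except subprocess.TimeoutExpired:
            continue
    return False


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    if not stop_with_deadline(child, grace=2.0):
        print("Unable to stop child; giving up instead of hanging")
```

The point of the design is that the caller gives up and reports the failure after the deadline instead of waiting on the child forever, since a process in D state does not act on SIGKILL until its I/O returns.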

Local test worked fine. So apparently this is due to some interaction between this version of Ubuntu+Samba and/or the QNAP version of Samba, SAB, and unrar. The mystery is why it consistently (as in pretty much 100% of the time) does this with DirectUnpack enabled, but never does this when DirectUnpack is disabled. Or when unrar is run manually with the same files and command line. So while the network share is apparently enabling this issue, it's still hard to point the finger at the NAS or Samba directly given that we can't reproduce it outside of DirectUnpack. Something about DirectUnpack is either triggering, or failing to recover from, this issue. And it never occurred with my prior Linux distro (Mint 18.2) and the version of SAB I ran then (don't recall which version, but several releases back).

Just in case it's useful to know:
Server: Ubuntu 18.04.01 fully updated with Samba 4.7.6+dfsg~ubuntu-0ubuntu2.5 (standard Ubuntu repo version)
NAS: QNAP TS-851 running QTS 4.3.6 with its internal Samba 4.4.16, with SMB v3 enabled
Network: Gigabit Ethernet hardwired to switch between server and NAS.

Whew. That's been a lot of time and effort spent by both of us trying to figure this out. I appreciate the help, but we're coming up dry. For now I'm going to disable DirectUnpack again so maybe my server can go more than a couple days without a reboot. :) Before this my Linux servers could go a year or two between reboots!

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 10th, 2019, 5:38 pm
by Veteran68
Guess we ran out of ideas, eh? :)

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 11th, 2019, 12:47 am
by safihre
Seems so. It's a problem in Unrar together with this filesystem that we can't get it out of this state.
At least now I've built in a warning for when it happens.

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 11th, 2019, 5:29 pm
by Veteran68
Well, as I mentioned, Direct Unpack is not a must-have for me. It's nice to get a completed unpack almost immediately after the download is finished, but I'm fine with waiting to unpack serially after the download. It's generally not adding more than a minute or two to the whole process anyway. And it's far less painful than when this issue occurs with the frequency that it has been occurring.

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 12th, 2019, 4:54 am
by safihre
In the new 2.3.7 that I just released it will no longer freeze up and will display a warning if it can't kill unrar, for these special situations.
Maybe if we get enough users with this problem we can find out the exact cause.

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: February 21st, 2019, 2:14 am
by Pack3tL0ss
FYI I have the same issue running 2.3.7

I don't have anything more useful at this point than what you've dug up in this chain, but I can re-create the issue. I've seen the errors in the past but didn't realize they were having the impact they were having.

What I can gather so far... When one of the kill reasons is found (CRC error, etc.), it tries to kill the process but can't. I see the "Unable to stop unrar process" message.

Found 2 negative impacts:
  • Post Processing hangs, waiting for the processing to complete on the download impacted by the failed unrar kill
  • I suspect this has a negative effect on my post-process script (NZB2Media) for other download jobs that are complete and waiting to be processed for an extended period of time. So I was ending up with a number of:

    CouchPotato: Failed to post-process - No change in status

    This is only a theory, but the logs for NZB2Media lead me to believe there may be a time component (No status change after 5 minutes...). I ended up with a number of successfully downloaded and verified files in the complete directory, but it didn't rename or process them... Again, may be unrelated. Just noticed a ton of completed downloads in the "waiting" status, and one process that hit the error just spinning doing nothing.

SAB is running on a Linux VM; all the directories it uses (incomplete, complete, and final, i.e. renamed etc.) are on an external NAS mounted over NFS.

I can try to grab more data as I have time, for now just wanted to chime in to confirm others are seeing this.

UPDATE
Just looked at SAB and it had a download hung for 45 minutes; I could not stop it from the GUI. Went to the shell and successfully killed the hung child unrar process (no sudo, but SAB and the others all run under the same user I log in with). After that the other download jobs were processed.

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: March 4th, 2019, 3:39 pm
by Noppes123
Just adding that I witnessed the 'Unable to stop the unrar process.' issue as well. Probably related to the fact that it was handling a password protected archive?
Running SABnzbd v2.3.7 in a 'freshly' installed Docker container on Debian 10. It is the second NZB downloaded.

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: March 6th, 2019, 2:36 pm
by safihre
And can you also see that the unrar process is still running inside the Docker container? Or is it a false report?

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: May 21st, 2019, 12:34 pm
by sfenech
Hello,

I'm also having this same issue with version 2.3.8 [0dd1f64]. I just moved my data to a new server running Ubuntu 18.04.2 LTS. Do you have any updates on a fix for the issue?

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: May 22nd, 2019, 1:26 pm
by safihre
The unrar that never goes away or the warning messages?
It's a bug in Unrar, try to disable Direct Unpack if you want to prevent it. The warning will be less active in 2.3.9.