
Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 3rd, 2019, 8:34 pm
by Veteran68
Just tried your test NZB. It produced a my_1GB.bin file of 1,048,576,000 bytes containing only 0 (null) bytes. No errors were reported, although it did hitch and re-download the last few percent before going into repair mode. But no hung unrar process was left behind.
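
For anyone repeating this test, one quick way to confirm the result matches what is described above (the right size and nothing but null bytes) is to stream the file in chunks. This is a minimal Python 3 sketch; the `is_all_null` helper is hypothetical and not part of SABnzbd:

```python
import os

def is_all_null(path, expected_size=None, chunk_size=1 << 20):
    """Return True if `path` contains only 0x00 bytes and, when
    `expected_size` is given, is exactly that many bytes long."""
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # 1 MiB per read by default
            if not chunk:
                return True  # reached EOF without finding a non-zero byte
            # A chunk is all-null only if every byte in it is b"\x00".
            if chunk.count(b"\x00") != len(chunk):
                return False
```

For the file above it would be called as `is_all_null("my_1GB.bin", expected_size=1048576000)`. Reading in fixed-size chunks keeps memory use flat no matter how large the file is.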

Will try the new directunpacker.py now with the "problem child" NZB from earlier.

EDIT:

OK, I tried the problem NZB again with the latest directunpacker.py. Similar result. I did notice that while it completed unpacking and repairing (like last time), this time the SAB process hung at "Moving:...". When I then attempted to restart SAB, it became a defunct zombie process itself, with a state of Zl, as opposed to the D state of the unrar process. Since we're direct unpacking to the completion folder, I'm assuming the "move" should be no more than a folder rename operation, right?
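
For reference, D and Z are process-state codes reported by `ps`. D is uninterruptible sleep, normally a process blocked inside the kernel waiting on I/O, which is why kill -9 does not take effect while it stays there. Z is a zombie that has exited but has not yet been reaped by its parent. The trailing l in Zl marks a multi-threaded process. The following is a minimal Python 3 sketch, assuming a Linux /proc filesystem, for listing processes stuck in either state. The helper names are hypothetical:

```python
import os

STUCK_STATES = {"D": "uninterruptible sleep", "Z": "zombie"}

def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).
    comm may itself contain spaces or ')', so split on the LAST ')'."""
    head, _, rest = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = rest.split()[0]
    return int(pid_str), comm, state

def find_stuck_processes(proc_root="/proc"):
    """Yield (pid, comm, state) for every process currently in D or Z state."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state in STUCK_STATES:
            yield pid, comm, state

if __name__ == "__main__":
    for pid, comm, state in find_stuck_processes():
        print(pid, comm, state, STUCK_STATES[state])
```

Run as a script, it prints one line per D or Z process, so a hung unrar would show up with state D while the hang is in progress. Splitting on the last `)` matters because the command name inside the parentheses may itself contain spaces or parentheses.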


I also tried to strace both the sabnzbdplus and the unrar processes. The unrar process can't (easily) be straced because of the disk-wait situation: strace can't be exited in that state, nor killed in the normal fashion. I had to press Ctrl-Z and use bg to send it to the background, then kill -9 the background PID. When that happens, strace doesn't flush its output buffers to disk, so my output file was empty. :(

The sabnzbdplus Python process did produce a trace file, but I don't see anything useful in there. Just a bunch of select and futex calls, with some futex calls returning "EAGAIN (Resource Temporarily Unavailable)" from time to time, which may be expected given the nature of futexes/mutexes.

One difference from the above attempt: instead of hanging at "Moving", this time it aborted with a timeout exception, which was also written to sabnzbd.error.log. This is presumably from the new addition to directunpacker.py:

Code:

Exception in thread Thread-53:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/share/sabnzbdplus/sabnzbd/directunpacker.py", line 185, in run
    self.abort()
  File "/usr/share/sabnzbdplus/sabnzbd/decorators.py", line 36, in call_func
    return f(*args, **kw)
  File "/usr/share/sabnzbdplus/sabnzbd/directunpacker.py", line 403, in abort
    timed_communicate(self.active_instance)
  File "/usr/share/sabnzbdplus/sabnzbd/getipaddress.py", line 40, in func_wrapper
    return async_result.get(max_timeout)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 568, in get
    raise TimeoutError
TimeoutError

It still left the blocked unrar process behind, though. I should also note that in every iteration of this test tonight with the new directunpacker.py, the SAB process went zombie and refused to be killed. The stop command for the service took quite a while to return, and didn't stop the process. I'm assuming this could be related to the latest changes where you're trying to implement a timeout and handle things cleanly. Maybe SAB is now insisting on waiting for the hung unrar child process, so it hangs too?
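
The traceback shows the general shape of that timeout. The call is handed to a `multiprocessing` pool, and the caller waits with `async_result.get(max_timeout)`, which raises `TimeoutError`. One property of this pattern fits the blocked unrar being left behind: the timeout only stops the caller from waiting. The worker thread stays inside the wrapped call (here, communicating with unrar), and the unrar process itself is untouched. The following is a minimal Python 3 sketch of such a decorator. The names `timeout` and `stuck_call` are hypothetical, and this is not the actual code in getipaddress.py:

```python
import functools
import multiprocessing
import time
from multiprocessing.pool import ThreadPool

def timeout(max_timeout):
    """Run the decorated function in a worker thread and stop *waiting*
    for it after max_timeout seconds (raises multiprocessing.TimeoutError)."""
    def decorator(func):
        @functools.wraps(func)
        def func_wrapper(*args, **kwargs):
            pool = ThreadPool(processes=1)
            async_result = pool.apply_async(func, args, kwargs)
            pool.close()  # accept no new tasks; the worker exits once func returns
            # Raises multiprocessing.TimeoutError if func is still running.
            # func itself is NOT interrupted and keeps its worker thread busy.
            return async_result.get(max_timeout)
        return func_wrapper
    return decorator

@timeout(0.5)
def stuck_call():
    # Stand-in for communicating with a child process stuck in D state.
    time.sleep(2)
    return "finished"

if __name__ == "__main__":
    try:
        stuck_call()
    except multiprocessing.TimeoutError:
        print("gave up waiting; the worker thread is still sleeping")
```

A wrapper like this lets the waiting side give up, but it cannot free a process the kernel is holding in D state. Anything that later waits on that child without a bound will block again.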

The local test worked fine. So apparently this is due to some interaction between this version of Ubuntu+Samba and/or the QNAP version of Samba, SAB, and unrar. The mystery is why it consistently (as in pretty much 100% of the time) does this with DirectUnpack enabled, but never when DirectUnpack is disabled, nor when unrar is run manually with the same files and command line. So while the network share is apparently enabling this issue, it's still hard to point the finger at the NAS or Samba directly, given that we can't reproduce it outside of DirectUnpack. Something about DirectUnpack is either triggering this issue or failing to recover from it. And it never occurred with my prior Linux distro (Mint 18.2) and the version of SAB I ran then (I don't recall which version, but it was several releases back).

Just in case it's useful to know:
Server: Ubuntu 18.04.1 fully updated with Samba 4.7.6+dfsg~ubuntu-0ubuntu2.5 (standard Ubuntu repo version)
NAS: QNAP TS-851 running QTS 4.3.6 with its internal Samba 4.4.16, with SMB v3 enabled
Network: Gigabit Ethernet hardwired to switch between server and NAS.

Whew. That's been a lot of time and effort spent by both of us trying to figure this out. I appreciate the help, but we're coming up empty. For now I'm going to disable DirectUnpack again so maybe my server can go more than a couple of days without a reboot. :) Before this, my Linux servers could go a year or two between reboots!

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 10th, 2019, 5:38 pm
by Veteran68
Guess we ran out of ideas, eh? :)

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 11th, 2019, 12:47 am
by safihre
Seems so. It's a problem in unrar combined with this filesystem, and we can't get it out of that state.
At least I've now built in a warning for when it happens.

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 11th, 2019, 5:29 pm
by Veteran68
Well, as I mentioned, Direct Unpack is not a must-have for me. It's nice to get a completed unpack almost immediately after the download finishes, but I'm fine with waiting for it to unpack serially after the download. It generally doesn't add more than a minute or two to the whole process anyway. And that's far less painful than this issue, given how frequently it has been occurring.

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Posted: January 12th, 2019, 4:54 am
by safihre
In 2.3.7, which I just released, it will no longer freeze up in these special situations, and will instead display a warning if it can't kill unrar.
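
The behaviour described for 2.3.7 (stop blocking and warn when unrar can't be killed) follows a general pattern: send SIGKILL, wait for the child with a time bound, and log a warning instead of waiting forever. The following is a minimal Python 3 sketch using the standard `subprocess` module. `kill_or_warn` is a hypothetical helper for illustration, not the code that ships in 2.3.7:

```python
import logging
import subprocess

def kill_or_warn(proc, grace=5.0):
    """Kill `proc` (a subprocess.Popen) and wait at most `grace` seconds.

    Returns True if the child exited and was reaped, False if it is still
    stuck (e.g. in uninterruptible D state, where SIGKILL is deferred).
    """
    try:
        proc.kill()  # SIGKILL on POSIX
    except OSError:
        pass  # the process is already gone
    try:
        proc.wait(timeout=grace)  # also reaps the child so it doesn't linger as a zombie
        return True
    except subprocess.TimeoutExpired:
        logging.warning(
            "Could not kill process %d within %.1f s; it may be stuck in "
            "uninterruptible I/O", proc.pid, grace)
        return False
```

Either way the caller gets control back, so a child stuck in D state no longer freezes the parent. The stuck process is simply left alone until its blocked I/O completes.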
Maybe if we get enough users with this problem we can find out the exact cause.