Will try the new directunpacker.py now with the "problem child" NZB from earlier.
EDIT:
Ok, tried the problem NZB again with the latest directunpacker.py. Similar result. I did notice that while it completed unpacking and repairing (like last time), this time the SAB process hung at "Moving:...". When I then attempted to restart SAB, it became a defunct zombie process itself with a state of Zl, as opposed to the D state of the unrar process. Now I'm assuming that since we're direct unpacking to the completion folder, the "move" should be no more than a folder rename operation, right?
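To make the rename assumption concrete: a move is only a cheap rename when source and destination sit on the same filesystem. Otherwise Python's shutil.move falls back to copy-then-delete. Here is a minimal sketch of that check. I don't know how SAB actually performs its move, and the function names are made up for illustration.

```python
import os


def same_filesystem(source_dir, target_dir):
    """Return True if both existing directories live on the same device."""
    return os.stat(source_dir).st_dev == os.stat(target_dir).st_dev


def plan_move(source_dir, target_dir):
    """Guess whether a move could be a plain rename or needs a full copy.

    Assumption: the mover tries os.rename() first, as shutil.move does,
    and only copies when the rename cannot cross filesystems.
    """
    if same_filesystem(source_dir, target_dir):
        return "rename"
    return "copy"
```

With both folders on the same SMB mount this should report "rename", which would mean the hang at "Moving" is not explained by a large copy.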
Also tried to strace both the sabnzbdplus and the unrar process. The unrar process can't (easily) be straced because of the disk wait situation. strace can't be exited in that state, nor killed in the normal fashion. I had to Ctrl-Z and bg to send to background, then kill -9 the background pid. When this happens strace doesn't flush its output buffers to disk, so my output file was empty.
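Since strace is awkward against a process stuck in disk wait, one alternative is to read the state straight out of /proc, which doesn't require attaching to the process. This is a rough sketch, assuming a Linux /proc layout. The function names are my own, and /proc/<pid>/wchan may just show 0 on some kernels. Note that /proc reports only the single state letter; the extra "l" in the Zl that ps shows means multi-threaded.

```python
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually I/O)",
    "Z": "zombie (defunct)",
    "T": "stopped",
}


def parse_state(stat_line):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def describe_pid(pid):
    """Return (state letter, description, wait channel) for a pid."""
    with open("/proc/%d/stat" % pid) as handle:
        state = parse_state(handle.read())
    try:
        # Kernel function the task is sleeping in, when the kernel exposes it
        with open("/proc/%d/wchan" % pid) as handle:
            wchan = handle.read().strip()
    except (IOError, OSError):
        wchan = ""
    return state, STATE_NAMES.get(state, "other"), wchan
```

Calling describe_pid() with the unrar pid while it is hung might at least show which kernel wait it is parked in, without the unkillable strace problem.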
The sabnzbdplus Python process did produce a trace file, but I don't see anything useful in there. Just a bunch of select and futex calls, with some futex calls returning "EAGAIN (Resource temporarily unavailable)" from time to time, which may be expected given the nature of futexes/mutexes.
One difference from the above attempt: instead of hanging at "Moving," this time it aborted with a timeout exception, which was also written to sabnzbd.error.log. This is presumably from the new addition to directunpacker.py:
Exception in thread Thread-53:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/share/sabnzbdplus/sabnzbd/directunpacker.py", line 185, in run
    self.abort()
  File "/usr/share/sabnzbdplus/sabnzbd/decorators.py", line 36, in call_func
    return f(*args, **kw)
  File "/usr/share/sabnzbdplus/sabnzbd/directunpacker.py", line 403, in abort
    timed_communicate(self.active_instance)
  File "/usr/share/sabnzbdplus/sabnzbd/getipaddress.py", line 40, in func_wrapper
    return async_result.get(max_timeout)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 568, in get
    raise TimeoutError
TimeoutError
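Going only by the frames in that traceback (func_wrapper calling async_result.get(max_timeout) inside multiprocessing/pool.py), the timeout looks like the usual "run it in a worker thread and stop waiting" pattern. Below is my own sketch of that pattern, not the actual code from getipaddress.py. The names timeout, slow_call and try_call are hypothetical.

```python
import time
from functools import wraps
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool


def timeout(max_timeout):
    """Run the wrapped function in a worker thread; stop waiting after max_timeout seconds."""
    def decorator(func):
        @wraps(func)
        def func_wrapper(*args, **kwargs):
            pool = ThreadPool(processes=1)
            async_result = pool.apply_async(func, args, kwargs)
            try:
                # Raises multiprocessing.TimeoutError if no result arrives in time
                return async_result.get(max_timeout)
            finally:
                # Only stops new work; a worker that is already blocked keeps blocking
                pool.close()
        return func_wrapper
    return decorator


@timeout(0.5)
def slow_call(seconds):
    # Stand-in for a call that may block, such as waiting on a child process
    time.sleep(seconds)
    return "done"


def try_call(seconds):
    try:
        return slow_call(seconds)
    except TimeoutError:
        return "timed out"
```

If that guess is right, the timeout only stops the caller from waiting. The worker thread, and the unrar process it is waiting on, would stay stuck in the background, which fits unrar still sitting in D state afterwards.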
Local test worked fine. So apparently this is due to some interaction between this version of Ubuntu+Samba and/or the QNAP version of Samba, SAB, and unrar. The mystery is why it consistently (as in pretty much 100% of the time) does this with DirectUnpack enabled, but never when DirectUnpack is disabled or when unrar is run manually with the same files and command line. So while the network share is apparently enabling this issue, it's still hard to point the finger at the NAS or Samba directly, given that we can't reproduce it outside of DirectUnpack. Something about DirectUnpack is either triggering, or failing to recover from, this issue. And it never occurred with my prior Linux distro (Mint 18.2) and the version of SAB I ran then (don't recall which version, but several releases back).
Just in case it's useful to know:
Server: Ubuntu 18.04.01 fully updated with Samba 4.7.6+dfsg~ubuntu-0ubuntu2.5 (standard Ubuntu repo version)
NAS: QNAP TS-851 running QTS 4.3.6 with its internal Samba 4.4.16, with SMB v3 enabled
Network: Gigabit Ethernet hardwired to switch between server and NAS.
Whew. That's been a lot of time and effort spent by both of us trying to figure this out. I appreciate the help, but we're coming up dry. For now I'm going to disable DirectUnpack again so maybe my server can go more than a couple days without a reboot. Before this my Linux servers could go a year or two between reboots!