Thanks, and Happy New Year to you too!
First, since I have Debug logging enabled and DirectUnpack disabled, I attempted the problematic download (checksum error) again, watching SAB's behavior closely. After reaching 100%, the job very quickly jumped down to History, then before I could see a status, it jumped back up to the Download queue to fetch the last 8% or so. I assume this was after it discovered the checksum error in the last few volumes. Presumably it connected to my secondary service (since I saw a bunch of FrugalUsenet disconnect notices in the log afterwards), finished the download, then dropped to History again, repaired, and unpacked to the NAS. All as expected: no UI errors, no hangs, and sysload settled back down to at or near 0.00.
safihre wrote: December 31st, 2018, 5:55 am
I read and played around and read more about this.
It seems we need to collect the return-code to avoid zombies, but using wait() might deadlock so we need to check if the program is still alive.
Could you give this one a try?
https://raw.githubusercontent.com/sabnz ... npacker.py
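For anyone following along, here is roughly how I understand the idea in the quote: collect the child's return code so it does not linger as a zombie, but do it with a non-blocking check instead of a bare wait(). This is only my own minimal sketch using Python's standard subprocess module, not the actual directunpacker.py code. The function name reap_without_blocking and its timeout and interval parameters are made up for illustration.

```python
import subprocess
import sys
import time


def reap_without_blocking(proc, timeout=5.0, interval=0.1):
    """Poll a child process until it exits or the timeout passes.

    Hypothetical helper, not part of SABnzbd.
    Returns the exit code, or None if the child is still running.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # poll() never blocks; once the child has exited it also
        # collects the return code, so no zombie is left behind.
        code = proc.poll()
        if code is not None:
            return code
        time.sleep(interval)
    # One last non-blocking check before giving up.
    return proc.poll()


if __name__ == "__main__":
    # Stand-in for unrar: a short-lived child that exits with code 3.
    child = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])
    print(reap_without_blocking(child))
```

If the child never exits (for example because it is stuck in uninterruptible disk wait), the helper simply returns None after the timeout, so the caller can carry on instead of hanging. It does nothing to unstick the child itself.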
So I then copied this latest directunpacker.py, and re-enabled DirectUnpack. Removed the previous download from the queue and the NAS, restarted SAB just to be sure it picked up all the changes, and tried yet again.
It looks like we have progress, although it's not completely fixed! This time SAB was able to repair and unpack. However, it STILL leaves a hung unrar process in disk wait, which continues to count toward system load (sysload sits just above 1.00 afterwards). So this change appears to let directunpacker recover, repair, and finish unpacking, but the stuck unrar processes remain.
However, this won't fix that unrar is stuck on disk I/O. It is only a way to prevent it from locking up SABnzbd. The lockup of unrar must be a bug inside unrar(?).
Maybe, but why does this never happen outside of direct unpacking, either when I run unrar manually from the cmdline or when direct unpack is disabled? It's still unpacking to the same place on the NAS. It apparently has something to do with the interaction between SAB and unrar when an error is hit during direct unpacking, since it doesn't happen with direct unpacking disabled. And if no errors are encountered, direct unpacking seems to work fine.
BTW I should also note a couple of other inconvenient side effects of this behavior. First, the hung unrar processes generate kernel errors that get printed to the console (which I hate, as it corrupts the text console). Second, rebooting (necessary to clear the hung unrar processes, which wind up as children of the system init process) takes forever, because during shutdown the system tries to kill the stuck unrar process(es) and waits a really long time before timing out and giving up on them.
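In case it helps anyone reproduce this, below is a small sketch for spotting processes stuck in disk wait (state "D") on Linux by reading /proc/<pid>/status. It is just my own illustration and assumes a Linux-style /proc. The function names parse_state and find_disk_wait are hypothetical.

```python
import os


def parse_state(status_text):
    """Return (name, state_letter) from the text of /proc/<pid>/status."""
    name, state = None, None
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("State:"):
            # e.g. "State:\tD (disk sleep)" -> "D"
            state = line.split(":", 1)[1].strip()[:1]
    return name, state


def find_disk_wait(proc_root="/proc"):
    """List (pid, name) for every process currently in state D."""
    hits = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, state = parse_state(fh.read())
        except OSError:
            continue  # process went away while we were scanning
        if state == "D":
            hits.append((int(entry), name))
    return hits
```

Calling find_disk_wait() after a failed direct unpack should list any leftover unrar processes along with their PIDs, assuming they really are in state D as they appear to be here.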
On the plus side, having gigabit internet bandwidth really makes the download troubleshooting a lot easier to repeat over and over, LOL.