Frequent hangs at Quick Repair, and zombie unrar processes

Support for the Debian/Ubuntu package, created by JCFP.
Forum rules
Help us help you:
  • Are you using the latest stable version of SABnzbd? Downloads page.
  • Tell us what system you run SABnzbd on.
  • Adhere to the forum rules.
  • Do you experience problems during downloading?
    Check your connection in Status and Interface settings window.
    Use Test Server in Config > Servers.
    We will probably ask you to do a test using only basic settings.
  • Do you experience problems during repair or unpacking?
    Enable +Debug logging in the Status and Interface settings window and share the relevant parts of the log here using [ code ] sections.
Veteran68
Newbie
Newbie
Posts: 18
Joined: December 26th, 2018, 12:36 am

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by Veteran68 »

I can add it. But I'd suggest adding it after the conditional that finds the error. In fact I'd add it to the existing logging line, like so:

Code: Select all

logging.info('Error in DirectUnpack of %s: %s',  self.cur_setname, linebuf)
That way it's only logging linebuf when the error occurs, and not on every character loop iteration.

BTW I just noticed this same thing happened on the last download. I had walked away and didn't see it until now. I will see if I can reproduce it with this extra logging enabled. It's definitely hanging the unrar process due to a kernel disk wait state, as evidenced by the D state flag from ps aux. I'd like to get an idea of what specific error is reported, but looks like I'll have to move the completed folder off the NAS.

However I should point out that I've had this same NAS Samba mount setup for years now with SAB, and it's only started with this behavior since I installed Ubuntu 18.04 and SABnzbdplus 2.3.2+, so something is either different with SAB behavior, or maybe with the CIFS support in 18.04 that I haven't run into before.
Veteran68
Newbie
Newbie
Posts: 18
Joined: December 26th, 2018, 12:36 am

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by Veteran68 »

Here we go. Reproduced the issue with the download I had emailed about. Here's the new log entry:

Code: Select all

2018-12-29 16:05:40,896::INFO::[directunpacker:184] Error in DirectUnpack of <download>: <path>/VIDEO_TS/VTS_19_1.VOB - checksum error
So now we at least know it's a checksum error in a specific file. But why the I/O block? Is this an unrar bug, a CIFS/Samba bug, or what? That's the mystery now I guess.

BTW I'd suggest this logging change be applied to the main repo the next time a maintainer is in there. It's just a change to a single existing line, so I'd rather not clone the repo to make the change and submit a pull request just for that.
User avatar
sander
Release Testers
Release Testers
Posts: 8812
Joined: January 22nd, 2008, 2:22 pm

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by sander »

OK, nice.

Are the files still there afterwards? If so, can you manually run unrar on those file? ... do you get the same "checksum error"?

FWIW:
sabnzbd.log has DEBUG lines "DirectUnpack Unrar output", followed by the unrar output. I'm not sure it's filled out when things go wrong, but can you check anyway?
Veteran68
Newbie
Newbie
Posts: 18
Joined: December 26th, 2018, 12:36 am

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by Veteran68 »

Yes the files are still in the incomplete folder. By running unrar t to test the archives, it starts reporting checksum error when it gets to the 38th module. I've copied the latter part of the unrar test results below.

Code: Select all

Testing archive <download>.part34.rar

...         <path>/VIDEO_TS/VTS_17_1.VOB             OK
Testing     <path>/VIDEO_TS/VTS_18_0.BUP             OK
Testing     <path>/VIDEO_TS/VTS_18_0.IFO             OK
Testing     <path>/VIDEO_TS/VTS_18_1.VOB             75%

Testing archive <download>.part35.rar

...         <path>/VIDEO_TS/VTS_18_1.VOB             77%

Testing archive <download>.part36.rar

...         <path>/VIDEO_TS/VTS_18_1.VOB             80%

Testing archive <download>.part37.rar

...         <path>/VIDEO_TS/VTS_18_1.VOB             82%

Testing archive <download>.part38.rar

...         <path>/VIDEO_TS/VTS_18_1.VOB             OK
Testing     <path>/VIDEO_TS/VTS_19_0.BUP             OK
Testing     <path>/VIDEO_TS/VTS_19_0.IFO             OK
Testing     <path>/VIDEO_TS/VTS_19_1.VOB             84%
<path>/VIDEO_TS/VTS_19_1.VOB - checksum error
Unexpected end of archive
<path>/VIDEO_TS/VTS_19_1.VOB : packed data checksum error in volume <download>.part38.rar

Testing archive <download>.part39.rar

Unexpected end of archive
<path>/VIDEO_TS/VTS_19_1.VOB : packed data checksum error in volume <download>.part39.rar

Testing archive <download>.part40.rar


Testing archive <download>.part41.rar

Unexpected end of archive
<path>/VIDEO_TS/VTS_19_1.VOB : packed data checksum error in volume <download>.part41.rar

Testing archive <download>.part42.rar

Testing     <path>/VIDEO_TS/VTS_20_0.BUP             OK
Testing     <path>/VIDEO_TS/VTS_20_0.IFO             OK
Testing     <path>/VIDEO_TS/VTS_20_1.VOB             93%
<path>/VIDEO_TS/VTS_20_1.VOB - checksum error
Unexpected end of archive
<path>/VIDEO_TS/VTS_20_1.VOB : packed data checksum error in volume <download>.part42.rar

Testing archive <download>.part43.rar

Unexpected end of archive
<path>/VIDEO_TS/VTS_20_1.VOB : packed data checksum error in volume <download>.part43.rar

Testing archive <download>.part44.rar

Unexpected end of archive
<path>/VIDEO_TS/VTS_20_1.VOB : packed data checksum error in volume <download>.part44.rar

Testing archive <download>.part45.rar

Testing     <path>/VIDEO_TS/VTS_21_0.BUP             OK
Testing     <path>/VIDEO_TS/VTS_21_0.IFO             OK
Testing     <path>/VIDEO_TS/VTS_21_1.VOB             99%
Cannot find volume <download>.part46.rar
<path>/VIDEO_TS/VTS_21_1.VOB - checksum error
Total errors: 9
My next trial was to manually unpack everything to the NAS mount. Surprisingly, that worked as expected, e.g. unrar x <download.part01> <nas_path> proceeded until it hit the checksum error in the 38th part. It then reported errors as above but returned and did not hang. So it appears that unrar can recover from an error when unpacking to the NAS,, as one would expect. Why is the behavior different when SAB calls unrar? Could it be a result of SAB responding to the error with an immediate SIGKILL before unrar can finish its processing?

I tried the manual unpack again using the same cmdline that SAB uses, e.g.

Code: Select all

/usr/bin/unrar x -vp -idp -o+ -p- /data/sabnzbd/incomplete/<download.part01> <nas_path>
Here's where I thought I may be on to something, because for some reason SAB passes the -vp argument to unrar, which causes unrar to prompt for the next volume of the archive. Why, when it's obviously not being run in interactive mode? [EDIT: I looked at the directunpacker code and see that you're doing this so you can process each volume individually as it arrives, then wait for the next one, and so on. Which I can understand given the point of Direct Unpacking. So that answers the "why?" question] When I run this manually I have to press "C<enter>" for the next volume to be processed. When I get to volume 38, I start getting the errors shown above. However when it gets to part 46, which was reported as not found above, it CONTINUALLY prompts me for it. So on a whim I thought I'd try what SAB is reportedly doing after it gets an error from unrar: kill the unrar process. So while it was still prompting me for part 46, I opened another session and sent kill -KILL <pid>. It killed it just fine. So much for that idea...

Bottom line after all this, I cannot reproduce the I/O block by running unrar manually from the cmdline, even with the exact same cmdline that SAB is running.

And yes, I did confirm that the "DirectUnpack Unrar output" log lines are only generated after successful unpacks. It doesn't get output for these failed ones.
Last edited by Veteran68 on December 30th, 2018, 1:19 am, edited 1 time in total.
Veteran68
Newbie
Newbie
Posts: 18
Joined: December 26th, 2018, 12:36 am

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by Veteran68 »

Happened again with another download. :(

This one was slightly different. 90 volumes to unpack, and when I run unrar t against the incomplete files, they all check out with no errors. I did not have debug enabled this time, but I'm going to leave it enabled for awhile from now on since this is happening every several downloads. Still, the logs do show missing files after the last volume is processed, but no other errors prior to that (and again my unrar test showed no errors):

Code: Select all

...
2018-12-30 00:12:16,950::INFO::[directunpacker:272] DirectUnpacked volume 89 for 6acb3224badd4058bfae3fdc8f8b42dc
2018-12-30 00:12:17,071::INFO::[directunpacker:272] DirectUnpacked volume 90 for 6acb3224badd4058bfae3fdc8f8b42dc
2018-12-30 00:12:17,173::INFO::[directunpacker:277] DirectUnpack failed due to missing files 6acb3224badd4058bfae3fdc8f8b42dc
2018-12-30 00:12:17,174::INFO::[directunpacker:381] Aborting DirectUnpack for 6acb3224badd4058bfae3fdc8f8b42dc
...
Also, another slight difference is that the hung unrar process left behind was apparently detached from its cmdline, as it's now represented as [unrar] in the process list, rather than a complete command line.

I've never had this sort of instability with SAB before. I suppose we can't rule out Ubuntu 18.04 as being the culprit. It's becoming quite annoying and making me wonder if I should revert to a vanilla Stretch.

For now I'm going to disable Direct Unpack entirely and see if that resolves it. I don't really care about the longer post-processing time, it's not like I'm sitting and waiting for downloads to finish. I may also look at completing locally and then use a post-processing script to move the files to the NAS, but I feel like I shouldn't have to. It has always worked before without it.

EDIT: Ok, so I disabled Direct Unpack, enabled Debug logs, and rebooted to clear the I/O blocked unrar processes. Let it resume the same download again that hung before (it was still paused in download queue at 100%). It verified and unpacked fine this time and unpacked to the NAS (albeit slower than before since Direct Unpack was off, so it had to do it all at the end). Sysload immediately dropped to 0.00. No missing files despite the error thrown above. It's definitely looking like all of these issues are Direct Unpack related, but it's probably too early to tell for sure. I'm going to watch it over the next several downloads. If I get through 10 or so without issue, that will point to it being DU related, because since this started I don't think I ever got through more than 3-4 downloads without it happening again.
User avatar
safihre
Administrator
Administrator
Posts: 5338
Joined: April 30th, 2015, 7:35 am
Contact:

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by safihre »

Great analysis! Some clues in here.

So it seems your latest failure was a similar thing, because the line "DirectUnpack failed due to missing file" is reported when unrar gives the same prompt for the next files multiple times while we don't expect it.
I can imagine the interactive mode to be a problem here, although I would have expected to get this problem to manifest at other users too.
If it's the interactive-mode, maybe you can try this version?
Change-set: https://github.com/sabnzbd/sabnzbd/comm ... 5a7a549855
File: https://raw.githubusercontent.com/sabnz ... npacker.py

Just to be sure, can you check if there's a sabnzbd.error (or whatever it is called, I just know sometimes Ubuntu users send me this file :) ) present? Maybe there's an uncaught thread-crash that we are overlooking.
If you like our support, check our special newsserver deal or donate at: https://sabnzbd.org/donate
User avatar
safihre
Administrator
Administrator
Posts: 5338
Joined: April 30th, 2015, 7:35 am
Contact:

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by safihre »

Just same the big red note that's listed on the documenting of the wait() method...
https://docs.python.org/2/library/subpr ... Popen.wait

Oops.. Will make a version that uses communicate() instead..
If you like our support, check our special newsserver deal or donate at: https://sabnzbd.org/donate
Veteran68
Newbie
Newbie
Posts: 18
Joined: December 26th, 2018, 12:36 am

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by Veteran68 »

safihre wrote: December 30th, 2018, 4:51 am If it's the interactive-mode, maybe you can try this version?
Change-set: https://github.com/sabnzbd/sabnzbd/comm ... 5a7a549855
File: https://raw.githubusercontent.com/sabnz ... npacker.py
I applied those changes, re-enabled Direct Unpack, and tried the problem download with the checksum error again (the one I originally emailed about). No change, still aborted Direct Unpack due to checksum error, still left the download stuck in the download queue showing 100%, and still left a hung unrar process. Filtering the log for the last several directunpacker entries shows same as before:

Code: Select all

2018-12-30 21:54:33,392::INFO::[directunpacker:272] DirectUnpacked volume 36 for <download>
2018-12-30 21:54:33,547::INFO::[directunpacker:272] DirectUnpacked volume 37 for <download>
2018-12-30 21:54:33,702::INFO::[directunpacker:272] DirectUnpacked volume 38 for <download>
2018-12-30 21:54:34,891::DEBUG::[directunpacker:138] DirectUnpack queued <download>.part40.rar for <download>
2018-12-30 21:54:36,320::DEBUG::[directunpacker:138] DirectUnpack queued <download>.part41.rar for <download>
2018-12-30 21:54:37,827::DEBUG::[directunpacker:138] DirectUnpack queued <download>.part42.rar for <download>
2018-12-30 21:54:39,599::INFO::[directunpacker:184] Error in DirectUnpack of <download>: <path>/VIDEO_TS/VTS_19_1.VOB - checksum error
2018-12-30 21:54:39,599::INFO::[directunpacker:381] Aborting DirectUnpack for <download>
Also, there appears to be an endless loop where SAB forever tries to do something with the download about every 30 seconds or so:

Code: Select all

2018-12-30 22:04:51,533::INFO::[nzbqueue:876] Found idle job <download>
2018-12-30 22:04:51,534::INFO::[nzbqueue:890] Resetting bad trylist for job <download>
2018-12-30 22:04:51,534::INFO::[nzbqueue:782] [sabnzbd.nzbqueue.stop_idle_jobs] Ending job <download>
2018-12-30 22:05:24,566::INFO::[nzbqueue:876] Found idle job <download>
2018-12-30 22:05:24,566::INFO::[nzbqueue:890] Resetting bad trylist for job <download>
2018-12-30 22:05:24,567::INFO::[nzbqueue:782] [sabnzbd.nzbqueue.stop_idle_jobs] Ending job <download>
2018-12-30 22:05:57,598::INFO::[nzbqueue:876] Found idle job <download>
2018-12-30 22:05:57,599::INFO::[nzbqueue:890] Resetting bad trylist for job <download>
2018-12-30 22:05:57,599::INFO::[nzbqueue:782] [sabnzbd.nzbqueue.stop_idle_jobs] Ending job <download>
... ad infinitum ...
I've successfully downloaded probably 8-10 things with DirectUnpack disabled before trying this again, and had seen no issues. So that's a workaround for me at least. It definitely looks like DirectUnpack is buggy or at least fragile in some environments, and could use some bullet-proofing.
Just to be sure, can you check if there's a sabnzbd.error (or whatever it is called, I just know sometimes Ubuntu users send me this file :) ) present? Maybe there's an uncaught thread-crash that we are overlooking.
There was a very old one where it looked like I deleted an incomplete download from underneath SAB, so it threw a file not found exception. Nothing recent or related to this issue though.
User avatar
safihre
Administrator
Administrator
Posts: 5338
Joined: April 30th, 2015, 7:35 am
Contact:

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by safihre »

Thanks for sticking with us trying to figure things out :) Already happy new year! Depending when you read this ;)

I read and played around and read more about this.
It seems we need to collect the return-code to avoid zombies, but using wait() might deadlock so we need to check if the program is still alive.
Could you give this one a try?
https://raw.githubusercontent.com/sabnz ... npacker.py

However, this won't fix that unrar is stuck on disk-IO.. Only a way to prevent it locking up SABnzbd. The lockup of unrar must be a bug inside unrar(?).
If you like our support, check our special newsserver deal or donate at: https://sabnzbd.org/donate
Veteran68
Newbie
Newbie
Posts: 18
Joined: December 26th, 2018, 12:36 am

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by Veteran68 »

Thanks, and Happy New Year to you too!

First, since I have Debug logging enabled and DirectUnpack disabled, I attempted the problematic download (checksum error) again, watching SAB's behavior closely. After 100% download, it very quickly jumped down to History, then before I could see a status, it jumped back up to the Download queue to get the last 8% or so. I assumed this was after it discovered the checksum error in the last few volumes. Presumably it connected to my secondary service (since I saw a bunch of FrugalUsenet disconnect notices in the log after), finished the download, then it again dropped to History, repaired, and unpacked itself to the NAS. All as expected, no UI errors, no hangs, and sysload settled back down to at or near 0.00.
safihre wrote: December 31st, 2018, 5:55 am I read and played around and read more about this.
It seems we need to collect the return-code to avoid zombies, but using wait() might deadlock so we need to check if the program is still alive.
Could you give this one a try?
https://raw.githubusercontent.com/sabnz ... npacker.py
So I then copied this latest directunpacker.py, and re-enabled DirectUnpack. Removed the previous download from the queue and the NAS, restarted SAB just to be sure it picked up all the changes, and tried yet again.

It looks like we have progress! Although not completely fixed. This time SAB was able to repair and unpack. However it STILL leaves a hung unrar process in disk wait, which continues to consume CPU load (syslog sits just above 1.00 afterwards). So this change appears to let directunpacker recover and repair & finish unpacking. But it's still leaving these unrar processes stuck in disk wait.
However, this won't fix that unrar is stuck on disk-IO.. Only a way to prevent it locking up SABnzbd. The lockup of unrar must be a bug inside unrar(?).
Maybe, but why does this never happen outside of direct unpacking, either manually from the cmdline or when direct unpack is disabled? It's still unpacking to the same place on the NAS. It apparently has something to do with the interaction between SAB and unrar running into an error while direct unpacking. But not with direct unpacking disabled. And if no errors are encountered, direct unpacking seems to work fine.

BTW I should also note a couple of other inconvenient side-effect of this behavior. First, the hung unrar processes generate kernel console errors that appear over the console (which I hate, as it corrupts the text console). Another is that rebooting (necessary to clear up the hung unrar processes that wind up a child of the system init parent) takes forever, because during shutdown it tries to kill the stuck unrar process(es) and waits a really long time before timing out and giving up on them.

On the plus side, having gigabit internet bandwidth really makes the download troubleshooting a lot easier to repeat over and over, LOL.
User avatar
safihre
Administrator
Administrator
Posts: 5338
Joined: April 30th, 2015, 7:35 am
Contact:

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by safihre »

Well, that's not really a solution then.
So to be complete:
- Does it also happen when the Complete is local?
- Manually running the same command SAB runs with the interactive mode, you are able to kill the process after it hangs on the checksum errors?
- Are you able to do an strace to see what it is actually hang on? Is it disk-IO or maybe user-input?
If you like our support, check our special newsserver deal or donate at: https://sabnzbd.org/donate
User avatar
sander
Release Testers
Release Testers
Posts: 8812
Joined: January 22nd, 2008, 2:22 pm

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by sander »

Veteran68 wrote: December 31st, 2018, 2:28 pm
However, this won't fix that unrar is stuck on disk-IO.. Only a way to prevent it locking up SABnzbd. The lockup of unrar must be a bug inside unrar(?).
Maybe, but why does this never happen outside of direct unpacking, either manually from the cmdline or when direct unpack is disabled? It's still unpacking to the same place on the NAS. It apparently has something to do with the interaction between SAB and unrar running into an error while direct unpacking. But not with direct unpacking disabled. And if no errors are encountered, direct unpacking seems to work fine.
That's weird indeed. So we have this:

Working OK:
* regular SAB-unpack/unrar afterwards with NAS
* manual unrar with NAS

UNCLEAR ... suggested by Safihre and me, but not tested by you?
* Direct Unpack with Complete on local disk

Not working:
* Direct Unpack with NAS

Correct?

If so ... what could be the difference? Some shots:
* different unrar call?
* file not yet finished (or locked?) on NAS at the moment DirectUnpack starts?
* handling of the unrar output different?
* ... ?
User avatar
sander
Release Testers
Release Testers
Posts: 8812
Joined: January 22nd, 2008, 2:22 pm

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by sander »

Hi Veteran68 ... are you still there?

On top of the question above, are you willing & able to test a 1GB post NZB, with DirectUnpack enabled.
The third rar will go wrong, so DirectUnpack should detect and abort ... so I wonder how your setup will behave.

NZB is here: https://raw.githubusercontent.com/sande ... %20rar.nzb

FWIW: In my setup (Ubuntu, Synology NAS) I can not reproduce your problem (Zombie process).
Veteran68
Newbie
Newbie
Posts: 18
Joined: December 26th, 2018, 12:36 am

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by Veteran68 »

Sorry, yes I'm here. Just really busy the past couple days and returned to work today from an extended 2.5 week holiday vacation, so I've got a lot of catching up to do. :)

I think you all have it correct. To recap:
  • Regular unpack works fine with NAS
  • Manual unrar using same cmdline as SAB works fine with NAS
  • When manually unpacking in interactive mode, while it waits for me to (C)ontinue or (Q)uit, I can kill the unrar process from another session and it kills cleanly, leaving no zombie/blocked process
  • Have not run strace yet, but I can. I am almost positive it's in disk wait as user input from stdin shouldn't cause an uninterruptible sleep state, while disk wait always will. The previous point about killing unrar while it's waiting for my input should prove this out, but I can attempt to confirm with strace if it helps.
I do think I tried local unpacking at one point but don't recall the result. I was so focused on getting it to work again with the NAS and not having to script a post-process move that I didn't go much further with it. It's working "well enough" with regular unpacking that it wasn't worth the effort, but to help troubleshoot this further I can certainly try again (because when DU works it does make the process seem snappier). It's entirely possible (even probable?) that the Samba mount is contributing to the issue somehow, but it's still a mystery as to why it works fine without DirectUnpack enabled (even in manual interactive mode). Not being able to reproduce the issue outside of a DirectUnpack scenario leaves me scratching my head.
sander wrote: January 2nd, 2019, 1:12 pm Hi Veteran68 ... are you still there?

On top of the question above, are you willing & able to test a 1GB post NZB, with DirectUnpack enabled.
The third rar will go wrong, so DirectUnpack should detect and abort ... so I wonder how your setup will behave.

NZB is here: https://raw.githubusercontent.com/sande ... %20rar.nzb

FWIW: In my setup (Ubuntu, Synology NAS) I can not reproduce your problem (Zombie process).
When I get home this evening I will try this and post back.
User avatar
safihre
Administrator
Administrator
Posts: 5338
Joined: April 30th, 2015, 7:35 am
Contact:

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by safihre »

I have another version to try:
https://github.com/sabnzbd/sabnzbd/blob ... npacker.py

And of course the questions about strace and if it works locally ;)
If you like our support, check our special newsserver deal or donate at: https://sabnzbd.org/donate
Post Reply