Frequent hangs at Quick Repair, and zombie unrar processes

Support for the Debian/Ubuntu package, created by JCFP.
Veteran68
Newbie
Posts: 18
Joined: December 26th, 2018, 12:36 am

Post by Veteran68 » December 26th, 2018, 1:08 am

I've been a SAB user for years, running it on an internal Linux server, across a couple of different distributions and package variations. I recently upgraded the server hardware, installed Ubuntu 18.04.1 LTS, and installed SAB from the official Ubuntu packages (currently at version 2.3.2+dfsg-1).

Shortly afterward I started noticing that after every few downloads, it would hang on Repair: Quick Checking and the CPU would spike. Anything after it in the queue was also hung, in either an unpacking or a waiting state. My sysload would stay above 3.00 and sometimes exceed 4.00 with nothing else running on the box. When that happens I can stop SAB with service sabnzbdplus stop, but its child unrar processes are not terminated; they're just left in a zombie state. However, sysload doesn't seem to recover to an idle state. If I restart the service it will sometimes recover, repair, and complete the download; other times it won't until I reboot the entire system. Sysload typically doesn't settle back down until I reboot, regardless. I then have to go clean up all the extra __UNPACK__ folders, which btw are in my complete folder on my NAS, not in my local incomplete folder (which seems odd).

I saw this similar issue reported in topic #23353 (sorry, as a new member I can't post a direct link yet) and the OP mentioned disabling Direct Unpack to relieve his issue. I've done so and will continue to monitor, but wanted to post this while I was in the middle of troubleshooting it. There's obviously something going on with some users or versions or combinations of settings. I can't remember when I first enabled Direct Unpack, or if it was always enabled, but I wasn't having these recurring issues until Ubuntu 18.04 and this distribution of SAB. I did restore a backup of my configuration from the previous server build, so it would have brought over my existing config that had been running fine before. Unfortunately I don't recall the previous version of SAB that I was running.

Any known issues (hopefully already fixed?) with the version I'm running that could explain this behavior?

Relevant System Info
Ubuntu 18.04.1 LTS (all updates applied)
Intel Core i7 6700K
16GB RAM
Gigabit fiber internet connection
Primary Usenet service: Usenetserver (20 connections)
Secondary Usenet service: FrugalUsenet (50 connections - recent addition, after this issue started, and never have seen it used yet)
Incomplete downloads stored on local HDD
Complete downloads stored on QNAP TS-851 NAS mounted via Samba

safihre
Administrator
Posts: 3088
Joined: April 30th, 2015, 7:35 am
Location: Switzerland

Re: Frequent hangs at Quick Repair, and zombie unrar processes

Post by safihre » December 26th, 2018, 3:14 am

Can you switch to our PPA? This behavior is fixed in later versions (or at least a couple of the causes of it are).
https://sabnzbd.org/wiki/installation/i ... buntu-repo

If it still happens on 2.3.5 or 2.3.6, please enable Debug logging in the Status window and then after it happens again send me the log at [email protected]!

Post by Veteran68 » December 26th, 2018, 4:25 pm

Will do, thanks!

Post by Veteran68 » December 26th, 2018, 11:00 pm

So I installed 2.3.6 from the PPA. I did not reboot. I re-enabled Direct Unpack like I had it before just to compare apples to apples. Had to whitelist my hostname to get past the DNS check, but once I got that sorted I attempted a download. Same thing happened with the hang at Quick Checking. I stopped and restarted the SAB service and it then completed.

I still have a zombie unrar process, stuck on one of the unpack operations, just like before. I'm going to reboot to clear everything out.

This left me with 2 _UNPACK_ folders in my completed folder, along with the actual completed job. I guess I'm not understanding what the point of an incomplete folder is if it's just going to unpack in the completed folder as it downloads?

When I try again I'll enable debug logs and hope to catch it in the act.

Oh, one difference is that I did enable multi-core par this time, and set the priority to low (-pL switch) hoping that would help with sysload.

Post by safihre » December 27th, 2018, 3:03 am

OK, let's get some logs then. They should be able to tell us more.
Incomplete is where the RAR files are downloaded to, from which they are directly unpacked to the Complete folder.

Post by Veteran68 » December 27th, 2018, 6:50 pm

I'm working on gathering logs. Behavior is even worse now than before the update. Now one download might work, then the next just hangs at 100% progress bar and never moves into the history queue. Again, restarting the SAB service does not terminate child unrar processes, which can only be cleared up by a reboot, and I'm left with partial _UNPACK_ folders.

I will email the logs and post here when I get it all sorted.

Post by Veteran68 » December 27th, 2018, 8:35 pm

Ok, I think this issue is different than my original one. This one looks like maybe a bad NZB/archive, but also related to Direct Unpack not handling things well. The logs show a DirectUnpack error at volume 39 of 49, but nothing is reported in the UI, it just freezes at 100% and 38/49 showing next to the filename. And again it leaves a hung unrar process that doesn't die when SAB is restarted.

So my original problem may actually be fixed, but this one still looks related to DirectUnpack not handling something properly.

I'm emailing you the logs and more details now.

EDIT: Email sent!

sander
Release Testers
Posts: 6556
Joined: January 22nd, 2008, 2:22 pm

Post by sander » December 28th, 2018, 2:24 am

Veteran68 wrote:
December 27th, 2018, 6:50 pm
Again, restarting the SAB service does not terminate child unrar processes, which can only be cleared up by a reboot
You can kill a Zombie process by killing its parent. So ... what is the PPID of the Zombie process?

AFAIK you can find the Zombie and the parent like this:

Code:

$ ps  xao pid,ppid,pgid,sid,comm,stat | grep Z
 3519  3510  2117  2117 livep <defunct> Z+

$ ps  xao pid,ppid,pgid,sid,comm,stat | grep 3510
 3510  2126  2117  2117 update-notifier Sl+
 3519  3510  2117  2117 livep <defunct> Z+

Post by Veteran68 » December 28th, 2018, 8:25 pm

When the parent dies, the zombies are re-parented to PID 1 (init). Killing init kills your entire session, so you may as well reboot.

Post by sander » December 29th, 2018, 3:46 am

Veteran68 wrote:
December 26th, 2018, 1:08 am
Incomplete downloads stored on local HDD
Complete downloads stored on QNAP TS-851 NAS mounted via Samba
What if you put Complete download on the local HDD too? (And move the downloads afterwards manually)

Post by safihre » December 29th, 2018, 4:20 am

I read the logs and also sent this reply via email:

It took me a bit to understand what I think is going on here:
In the output of UnRar, SABnzbd detects one of the kill reasons, so it wants to kill the unrar process.
It seems, however, that it's unable to kill the process, which we do using the kill() command that sends SIGKILL to the unrar process.
It then gets locked in this state: the queue tries to push the job to history, but the direct unpacker blocks it.
We don't know why it wants to kill unrar, because it never gets as far as printing the log line; it's stuck at the kill() command.

I am not sure why it cannot kill unrar. We never had problems with this before and it sort of makes me think it's something specific to your setup.
The answers online indicate that kill() always works, but for your system it doesn't seem to work.
I have seen something similar on Windows, where the Unrar process locks up the whole PC when the network drive has a timeout.
Is there a way for you to run SABnzbd in a more "vanilla" way without any load-balancing and with the output folder locally instead of on the NAS?
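
For illustration only, here is a minimal Python 3 sketch of sending SIGKILL to a child process and bounding how long the caller waits for it to exit, so that a child stuck in the kernel cannot block the caller indefinitely. It is not SABnzbd's code; `kill_with_timeout` and the 5-second default are hypothetical names and values chosen for the example.

```python
import subprocess
import sys


def kill_with_timeout(proc, timeout=5.0):
    """Send SIGKILL to `proc` and wait at most `timeout` seconds for it to exit.

    Returns True if the process was reaped, False if it was still alive
    (for example, blocked in the kernel) when the timeout expired.
    """
    proc.kill()  # SIGKILL on POSIX
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    # Stand-in for unrar: a child that would otherwise sleep for 30 seconds.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print("reaped:", kill_with_timeout(child))
```

A False return would at least let the caller log the problem and carry on instead of hanging with the job.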

Post by sander » December 29th, 2018, 9:10 am

Would it be helpful to print the reason / linebuf in a logging.debug()? Good for debugging and statistics.

So right after https://github.com/sabnzbd/sabnzbd/blob ... er.py#L183 and before the self.abort()

Even I have these errors:

Code:

$ grep 'Error in DirectUnpack' ~/.sabnzbd/logs/sabnzbd.log*  | cut -c-127
/home/sander/.sabnzbd/logs/sabnzbd.log.4:2018-06-04 21:37:09,786::INFO::[directunpacker:177] Error in DirectUnpack of Brooklyn.
/home/sander/.sabnzbd/logs/sabnzbd.log.4:2018-07-01 18:43:45,176::INFO::[directunpacker:177] Error in DirectUnpack of 0d8038f7a
/home/sander/.sabnzbd/logs/sabnzbd.log.5:2018-05-19 17:45:46,719::INFO::[directunpacker:177] Error in DirectUnpack of test-with
/home/sander/.sabnzbd/logs/sabnzbd.log.OUD:2017-11-10 18:21:10,028::INFO::[directunpacker:177] Error in DirectUnpack of Best.Op

As background, these strings in the unrar output trigger Direct Unpack to abort:

Code:

'ERROR: ', 'Cannot create', 'in the encrypted file', 'CRC failed',
                                 'checksum failed', 'You need to start extraction from a previous volume',
                                 'password is incorrect', 'Write error', 'checksum error',
                                 'start extraction from a previous volume'

Post by Veteran68 » December 29th, 2018, 1:01 pm

I replied in the email, but will copy here for full disclosure.

I can try with a local completion folder. I'm not running load balanced -- if you're referring to the multiple host names, those are just various DNS names for the same server. I've been reworking my internal+external DNS hostnames to get everything behind LetsEncrypt SSL certs, but haven't been cleaning up the old ones yet.

Yeah I've never been able to kill those unrar processes from the cmdline either, even before I restarted SAB and they became zombies. The only way I know of a process not responding to SIGKILL is if the process is blocked in a kernel disk wait state, which I did not think to check for. It's possible there's an I/O block going on -- but why only with THIS particular download? I've downloaded several since this one and have not reproduced the error. I'm currently sitting at a sysload of 0.01,0.01,0.00 after downloading a half dozen other nzbs, so haven't been able to recreate the issue with any other download so far.
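
As a rough aid for checking this, the Python sketch below reads the `State:` and `PPid:` fields that Linux exposes in `/proc/<pid>/status` and reports processes that are in uninterruptible sleep (`D`) or are zombies (`Z`). The function names are made up for this example, and it assumes a Linux `/proc` filesystem.

```python
import os


def parse_status(text):
    """Extract the state letter and parent PID from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    state = fields.get("State", "?")[:1]  # e.g. "D (disk sleep)" -> "D"
    ppid = int(fields.get("PPid", "0"))
    return state, ppid


def explain(state):
    """Describe what a state letter generally means for signal delivery."""
    if state == "D":
        return "uninterruptible sleep: SIGKILL is generally acted on only after the blocking I/O returns"
    if state == "Z":
        return "zombie: already dead, waiting for its parent to reap it"
    return "can act on SIGKILL normally"


def stuck_processes(proc_root="/proc"):
    """Yield (pid, state, ppid) for every process currently in state D or Z."""
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            path = os.path.join(proc_root, name, "status")
            with open(path, errors="replace") as handle:
                state, ppid = parse_status(handle.read())
        except OSError:
            continue  # process exited while we were scanning
        if state in ("D", "Z"):
            yield int(name), state, ppid
```

Running `for pid, state, ppid in stuck_processes(): print(pid, state, ppid, explain(state))` while a download is hung would show whether the unrar process is in state `D` rather than `Z`.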

I guess if it continues I could always complete downloads locally and then script the move to the NAS. Which was kinda how I assumed SAB would do it, I didn't realize it unpacked and assembled pieces directly to the completion folder. Is there a switch to control this behavior? I thought it might be a side effect of Direct Unpack, but noticed even with DU disabled it still creates output in the completion folder while it's still downloading. I think I'd prefer to see all work done in the Incomplete folder, and only the final result moved to the complete folder.
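
One way to script that move is sketched below in Python: a small helper that moves a finished job folder from a local directory into a folder on the NAS mount. The destination path `/mnt/nas/complete` is a placeholder, and running this as a SABnzbd post-processing script that receives the job's final folder as its first argument is an assumption to check against the SABnzbd scripts documentation.

```python
import os
import shutil
import sys

NAS_ROOT = "/mnt/nas/complete"  # placeholder: adjust to your own mount point


def move_job(job_dir, dest_root):
    """Move a finished job folder into dest_root and return its new path.

    Refuses to overwrite an existing folder of the same name.
    """
    job_dir = os.path.abspath(job_dir)
    target = os.path.join(dest_root, os.path.basename(job_dir))
    if os.path.exists(target):
        raise FileExistsError(target)
    os.makedirs(dest_root, exist_ok=True)
    # shutil.move copies and then deletes when source and destination
    # are on different filesystems, as with a local disk and a Samba mount.
    return shutil.move(job_dir, target)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(move_job(sys.argv[1], NAS_ROOT))
```

With this approach all unpacking happens on the local disk, and the NAS only sees one copy operation per finished job.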

Thanks for the help, this has been an interesting if frustrating issue to troubleshoot!

Post by Veteran68 » December 29th, 2018, 1:07 pm

sander wrote:
December 29th, 2018, 3:46 am
Veteran68 wrote:
December 26th, 2018, 1:08 am
Incomplete downloads stored on local HDD
Complete downloads stored on QNAP TS-851 NAS mounted via Samba
What if you put Complete download on the local HDD too? (And move the downloads afterwards manually)
Yeah, that's what I proposed to do if this behavior keeps occurring. So far it seems to have been limited to this one download, which makes me question whether it's related to I/O blocking when unpacking to the NAS (the only explanation I know of for SIGKILL not working). Why would one volume of one download result in an I/O block every time, but not any others (so far)? I'm guessing some sort of file corruption that's confusing Samba or the NAS so it just hangs.

I do agree that it would be helpful if SAB could log the actual error returned from unrar. It could at least answer some of these questions about what actually happened. It's obvious the code recognized the error. Which also raises the question: if unrar can report an error, why does it then become I/O blocked? Maybe it's trying to clean up after itself? Who knows.

Post by sander » December 29th, 2018, 1:22 pm

In directunpacker.py, I've now put this line (see the "SJ:", my initials)

Code:

            # Error? Let PP-handle it
            logging.debug('SJ: linebuf is now %s', linebuf)
            if linebuf.endswith(('ERROR: ', 'Cannot create', 'in the encrypted file', 'CRC failed',
                                 'checksum failed', 'You need to start extraction from a previous volume',
                                 'password is incorrect', 'Write error', 'checksum error',
                                 'start extraction from a previous volume')):
                logging.info('Error in DirectUnpack of %s', self.cur_setname)
                self.abort()

And that gives a lot of logging ... a new line for each new character.

Are you willing and able to add that line too?
If so, try again with the NAS.
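
A lower-volume variant, sketched here as an assumption rather than as SABnzbd's actual code, is to log only when one of the markers matches and to include which marker it was. The marker tuple is copied from the snippet above; `matched_marker` and `check_line` are hypothetical helper names.

```python
import logging

# Markers copied from the thread; a match at the end of the buffer aborts Direct Unpack.
ERROR_MARKERS = (
    'ERROR: ', 'Cannot create', 'in the encrypted file', 'CRC failed',
    'checksum failed', 'You need to start extraction from a previous volume',
    'password is incorrect', 'Write error', 'checksum error',
    'start extraction from a previous volume',
)


def matched_marker(linebuf, markers=ERROR_MARKERS):
    """Return the first marker that linebuf ends with, or None if none match."""
    for marker in markers:
        if linebuf.endswith(marker):
            return marker
    return None


def check_line(linebuf, setname):
    """Log the reason once, only when an error marker is found."""
    reason = matched_marker(linebuf)
    if reason is not None:
        logging.info('Error in DirectUnpack of %s: %r', setname, reason)
        return True  # the caller would abort here
    return False
```

This keeps the log to one line per failed set while still recording which unrar message triggered the abort.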
