Page 1 of 1

High wakeups/second count (0.7.16)

Posted: October 4th, 2013, 3:11 pm
by doctorow
Hi,

I noticed very high wakeups per second for sabnzbd running in background. The process is completely idle (no downloads or whatsoever) and no client is connected to the webbrowser. Yet, the number of wakeups is around 50 per second on my system (for any other idle process on this machine the number is either 0.05 or 0.10).

I did a quick strace on the idle sabnzd process and noticed a crazy amount of polling:
[pid 3474] 20:00:05 poll([{fd=8, events=POLLIN}], 1, 1000 <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 9052}) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
[pid 3472] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 8000} <unfinished ...>
[pid 3472] 20:00:05 select(0, NULL, NULL, NULL, {3, 0} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3509] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3509] 20:00:05 select(0, NULL, NULL, NULL, {1, 0} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 32000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3504] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3504] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
[pid 3473] 20:00:05 <... select resumed> ) = 0 (Timeout)
[pid 3473] 20:00:05 select(0, NULL, NULL, NULL, {0, 50000} <unfinished ...>
etc.

I even tried to limit various options that could be the culprit here, including minimizing logging output, and making sure that any configurable polling number is at least 60s (such as queue file detecting) without any effect.

Any ideas? ???

Doc

Re: High wakeups/second count (0.7.16)

Posted: October 4th, 2013, 4:04 pm
by doctorow
Btw, a "select(0, NULL, NULL, NULL,..." in python probably means it's in time.sleep(). It seems like there is a particular loop here responsible for calling the sleep function repeatedly within one second. I understand why that would be needed if sabnzbd was busy (for example the decoder has a sleep(0.001) within its decode class), but why should it be the case whensabnzbd is completely idle? Looks like a bug to me.

Re: High wakeups/second count (0.7.16)

Posted: October 5th, 2013, 10:00 am
by shypike
I haven't seen that behaviour on any system yet.
There is some polling going on, but idle time should be below 5% cpu load.
Does restart solve this issue?

Re: High wakeups/second count (0.7.16)

Posted: October 5th, 2013, 11:42 am
by doctorow
Hi shypike,

it doesn't affect CPU usage as much; rather, it prevents the CPU from going into idle. Restarting wouldn't solve this issue.

You can use the powertop command to see the number of wakeups per second for every running process. For instance, I have nginx on the same box, and the number of wakeups for its process is zero (nginx is configured to use epoll). Output:

Code: Select all

**Overview of Software Power Consumers**, 

Usage,  Wakeups, GPU ops, Disk IO, GFX Wakeups, Category, Description, 
  1.2 ms/s,  45.2, , , , Process, /usr/bin/python -OO /usr/bin/sabnzbdplus --daemon --server 192.168.0.212:80 , 
134.0 us/s,   8.6, , , , Interrupt, [1] timer(softirq), 
120.3 us/s,   2.7, , , , Interrupt, [3] net_rx(softirq), 
  0.7 ms/s,   1.8, , , , Process, /usr/sbin/smbd -D , 
 53.0 us/s,   1.0, , , , Process, [events/1], 
 30.0 us/s,   1.0, , , , Process, [events/0], 
 22.7 us/s,   0.6, , , , Process, init [2] , 
437.8 us/s,  0.10, , , , Process, powertop --csv test , 
438.3 us/s, , , , , Interrupt, [0] timer/0, 
 14.7 us/s,  0.15, , , , Process, qmgr -l -t fifo -u , 
  1.7 us/s,  0.15, , , , Process, [sync_supers], 
  1.0 us/s,  0.15, , , , Process, [ksoftirqd/1], 
 59.7 us/s,  0.10, , , , Process, [kswapd0], 
 16.1 us/s,  0.10, , , , Process, /usr/sbin/nmbd -D , 
  3.4 us/s,  0.10, , , , Process, /usr/sbin/openvpn --writepid /var/run/openvpn.server.pid --daemon ovpn-server --cd /etc/openvpn --config /etc/openvpn/server.co, 
  1.4 us/s,  0.10, , , , Process, [flush-8:0], 
  9.8 us/s,  0.05, , , , Process, /usr/lib/postfix/master , 
  2.0 us/s,  0.05, , , , Process, [khungtaskd], 
  1.5 us/s,  0.05, , , , Process, [bdi-default], 
 47.1 us/s, , , , , Interrupt, [9] RCU(softirq), 
 38.5 us/s, , , , , Process, [kondemand/1], 
 34.6 us/s, , , , , Process, [kondemand/0], 
 15.8 us/s, , , , , Interrupt, [7] sched(softirq), 
 12.2 us/s, , , , , Interrupt, [27] eth1, 
  0.8 us/s, , , , , Interrupt, [8] hrtimer(softirq), 
  0.3 us/s, , , , , Interrupt, [2] net tx(softirq) 
You can see above how sabnzbd is the only process on this idle machine that is causing CPU wakeups although it is completely idle.

If you strace the sabnzbd process (including its children processes), you'll see the how active sabnzbd is when doing nothing. Hook into an existing process like this:

strace -v -f -t -s 500 -p <pid>

where <pid> is the process id of the main sabnzbd process.

Re: High wakeups/second count (0.7.16)

Posted: October 5th, 2013, 12:29 pm
by shypike
I see, haven't really looked at this before.
There is a pause_all scheduler event that should stop most activity.
I've put this on the long todo list for now.

Re: High wakeups/second count (0.7.16)

Posted: October 5th, 2013, 12:45 pm
by doctorow
Thanks for looking into this.

Re: High wakeups/second count (0.7.16)

Posted: January 18th, 2014, 7:46 am
by idontknow
This is a known problem with python 2.x: http://bugs.python.org/issue17748

Since locks do not support timeouts, when a thread is asked to wait with a timeout it will just repeatedly sleep() with a maximum time of 50 ms and check the lock manually.

In the PostProcessor thread there is a call to Queue.get() with a timeout:

Code: Select all

nzo = self.queue.get(timeout=3)
which causes the thread to wait with a timeout and hence the high wakeup count.

This is fixed in python 3.x but sabnzbdplus does not support it. In 2.x it can be worked around by not specifying a timeout when calling Queue.get() and doing the timeout in another thread.

Re: High wakeups/second count (0.7.16)

Posted: January 18th, 2014, 8:28 am
by shypike
Python 3 isn't supported because it would require making the code Python 2.7/3.x only.
This would exclude quite a few embedded systems.
(Also it's a lot of work of course.)
I'll look into using your suggestion.
Thanks for the tip!

Re: High wakeups/second count (0.7.16)

Posted: January 23rd, 2014, 4:15 pm
by doctorow
idontknow, thanks for the pointer!!

Re: High wakeups/second count (0.7.16)

Posted: January 23rd, 2014, 4:19 pm
by shypike
I've implemented a fix, well hopefully anyway.