Optimizing SABnzbd


Optimizing SABnzbd

Post by Puzzled »

Sorry about the tables; I couldn't find a way to get monospaced fonts.

I have been trying to understand SABnzbd in order to find ways to reduce its CPU consumption and, if possible, make it run more smoothly. I haven't done any Python before, but I've done lots of programming in other languages. My setup:
4x core i5
SSD drives
Windows 10
60 Mbit broadband
SABnzbd from the git Py3 branch
yappi for profiling

What I found is that NzbQueue.get_article uses most, or in some cases almost all, of the CPU time under high load. First I let a single nzb download for 1 minute with no bandwidth limit: get_article was called 1.74M times and CPU consumption was 15-20%.

I then added a check for whether an article was found for any of the servers in the loop starting at
https://github.com/sabnzbd/sabnzbd/blob ... er.py#L424

If none was found, I did a time.sleep(0.001) after the loop. With this change get_article was called only 31,193 times while the download speed seemingly remained the same, and CPU consumption dropped to ~3%.
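The idea in code, as a minimal sketch only; get_article_for_server and assign_to_connection are stand-ins I made up for the real calls, not SABnzbd functions:

Code:

import time

def downloader_loop(servers):
    while True:
        found_article = False
        for server in servers:
            # Stand-in for the real NzbQueue.get_article() call
            article = get_article_for_server(server)
            if article:
                found_article = True
                assign_to_connection(server, article)
        if not found_article:
            # No server had anything to do: yield the CPU briefly
            # instead of immediately spinning through the loop again
            time.sleep(0.001)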

I then loaded up about 350 nzbs totaling 400 GB, let it finish its indexing and so on, and ran the same test. The CPU usage was now spread over more functions:

Code:

name                                          ncall    tsub      ttot      tavg      
downloader.py:407 Downloader.run              1        0.656250  61.68750  61.68750
nzbqueue.py:661 NzbQueue.get_article          32636    21.45312  56.34375  0.001726
config.py:80 OptionStr.__call__               8656608  15.06250  22.68750  0.000003
nzbstuff.py:74 NzbObject.server_in_try_list   9010168  11.09375  11.09375  0.000001
config.py:84 OptionStr.get                    8656616  7.625000  7.625000  0.000001
See https://github.com/sumerc/yappi/blob/ma ... #yfuncstat for an explanation of the data.

With the big queue, get_article is called about the same number of times with and without the sleep. To avoid oversleeping, the change could be made to sleep only when no article is found several times in a row; that would still give a significant reduction in CPU usage with few NZBs.

The OptionStr calls create a lot of overhead when there are many nzbs, so I added
propagation_delay = float(cfg.propagation_delay() * 60)
before https://github.com/sabnzbd/sabnzbd/blob ... ue.py#L665 and replaced the call inside the loop accordingly.
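In other words, the config option is now read once per get_article call instead of once per nzb inside the loop. Roughly like this (a sketch; the loop body is illustrative, not the exact get_article code):

Code:

# Before: the OptionStr object is called on every iteration
for nzo in nzo_list:
    if nzo.avg_stamp > time.time() - float(cfg.propagation_delay() * 60):
        continue

# After: hoist the config lookup out of the hot loop
propagation_delay = float(cfg.propagation_delay() * 60)
for nzo in nzo_list:
    if nzo.avg_stamp > time.time() - propagation_delay:
        continue

Result after the change: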

Code:

name                                          ncall    tsub      ttot      tavg      
downloader.py:407 Downloader.run              1        0.578125  54.48438  54.48438
nzbqueue.py:661 NzbQueue.get_article          81198    26.56250  52.85938  0.000651
nzbstuff.py:74 NzbObject.server_in_try_list   21624185 25.14062  25.14062  0.000001
config.py:80 OptionStr.__call__               81217    0.109375  0.171875  0.000002
config.py:84 OptionStr.get                    81222    0.062500  0.062500  0.000001
Now the OptionStr calls use only a fraction of the CPU time, leaving more for get_article and server_in_try_list. I assume this could lead to faster download times on really fast connections. I have been unable to reduce the overall CPU usage, though; the sleep fix does not seem to help here. Maybe the articles in the nzbs I used were not all the same size, forcing it to fetch new ones more often. I haven't checked.

I am not sure where to go from here. As you can see, those two functions occupy the CPU almost all the time, and because of the GIL only one Python thread can run at a time unless it's doing I/O, so very little CPU time is left for the other threads. This makes SABnzbd very sluggish when the nzb queue is big, so it would be an advantage to speed it up further.

I have been thinking about creating an article list for each server, which the loop would refill regularly. Idle threads could then take the first article from the list, and when a queue needs refilling, several articles could be fetched and tested against the try_list in one batch. My hope is that this would be more efficient than getting one article at a time. Unfortunately I don't know enough Python yet to implement it, but I am trying to learn.
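Roughly what I have in mind (a sketch only; fetch_articles_batch is a hypothetical batched variant of get_article that doesn't exist yet):

Code:

from collections import deque

class ServerArticleQueue:
    """Per-server buffer of articles, refilled in batches (sketch)."""

    def __init__(self, server, batch_size=50):
        self.server = server
        self.batch_size = batch_size
        self.articles = deque()

    def refill(self, nzb_queue):
        # Hypothetical batched fetch: test a whole batch against the
        # try lists under one lock instead of one article at a time
        self.articles.extend(nzb_queue.fetch_articles_batch(self.server, self.batch_size))

    def get_article(self, nzb_queue):
        # Idle threads pop the first article; refill when empty
        if not self.articles:
            self.refill(nzb_queue)
        return self.articles.popleft() if self.articles else None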

Any thoughts about this?

Re: Optimizing SABnzbd

Post by safihre »

Makes me very happy somebody is looking into this with me :)

Clearly something is wrong if it gets called that often.
It should only loop over the available connections within a server: https://github.com/sabnzbd/sabnzbd/blob ... er.py#L454
So maybe these are servers that are not being used yet? There is definitely room for improvement here.
Indeed, it should only try to get articles when the server is needed and has a free connection.
There used to be a TryList on the NzbQueue long ago, but it wasn't used. Maybe it can be used for this, so that servers get skipped when they are not needed?

In previous CPU profiling I also found that the Downloader thread would loop like crazy, so I introduced the Downloader slowdown:
https://github.com/sabnzbd/sabnzbd/blob ... #L533-L548
It turned out that simply adding a sleep() behaved very strangely on VPNs and other unusual network I/O, but that's a whole different story! That's why the code looks so overly complex.

I also noticed the call to cfg.propagation_delay last week, so I already made the same change in develop for 2.3.9 :)
https://github.com/sabnzbd/sabnzbd/comm ... 6991670b37
We could even calculate this once when SABnzbd starts, as an NzbQueue property, and add a watcher to the setting so it updates the NzbQueue property whenever the value changes.
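Something like this, as a sketch (the callback registration at the end is hypothetical; the real hook would have to live in sabnzbd.config):

Code:

class NzbQueue:
    def __init__(self):
        # Computed once at startup instead of on every get_article() call
        self.propagation_delay = float(cfg.propagation_delay() * 60)

    def update_propagation_delay(self):
        # Invoked by the config system whenever the setting changes
        self.propagation_delay = float(cfg.propagation_delay() * 60)

# Hypothetical watcher registration; SABnzbd's config module
# may expose this differently
cfg.propagation_delay.set_callback(nzb_queue.update_propagation_delay)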

Re: Optimizing SABnzbd

Post by Puzzled »

I have 11 servers with various priority levels. When I deactivate all but 1, the number of calls to NzbQueue.get_article is reduced to ~600/minute. The CPU usage is still ~10%: the loop iterates ~430 times/second, but only 12-13 articles are added each second. Sleeping 0.001 seconds every third time no article is found reduces it to ~3%, or ~2% if done every second time.

Is your sleep system there to reduce CPU usage? I wasn't sure; I thought it might be a throttling system. I added mine purely to lower CPU usage. It might still be a good idea to let the download get up to speed before the sleeping starts, though.

Re: Optimizing SABnzbd

Post by Puzzled »

Is the develop branch or the py3 branch going to be the new main branch when you switch to Python 3? I hope you will merge all the fixes from develop into py3; otherwise it's hard to know whether what I'm testing has already been fixed in the other branch. If not, it's probably better if I use the develop branch.

Re: Optimizing SABnzbd

Post by safihre »

I will merge Py3 into Develop; so far this is the only commit that wasn't merged back into Py3. As you can see, I try to merge them from time to time :)

Re: Optimizing SABnzbd

Post by safihre »

I would suggest testing with at most 3 servers; 11 is a bit out of scope for 95% of regular users :)

Re: Optimizing SABnzbd

Post by Puzzled »

Only the 2 top-priority servers are used at least 99.999% of the time, so it must be possible to ignore the other servers when there is nothing for them to do. Unless a top-priority server has missed an article since the last time the lower-priority servers checked, I don't see why they need to check again. Perhaps some kind of cache of the articles that still need checking would work, or even just a simple count of remaining articles per server/priority level. If there are aspects of the problem that make this hard for reasons I haven't thought of, please let me know; I am still trying to understand how it all works.
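The counting idea could look something like this (a hypothetical sketch; none of these names exist in SABnzbd):

Code:

def build_pending_counts(servers):
    # How many articles each server could still usefully be asked for
    return {server.id: 0 for server in servers}

def register_new_article(pending, servers):
    # A newly queued article can initially be tried on every server
    for server in servers:
        pending[server.id] += 1

def register_tried(pending, server):
    # The server fetched the article or failed on it: one fewer to check
    pending[server.id] -= 1

def servers_worth_checking(pending, servers):
    # The downloader loop can then skip any server with nothing left to try
    return [server for server in servers if pending[server.id]]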

Re: Optimizing SABnzbd

Post by Puzzled »

I tried adding a test so that a server is only checked once per second if it has had no busy_threads and has found no articles in the last 4 seconds. With 11 servers, 400 GB in the queue, and my sleep fix, this reduced the number of calls to NzbQueue.get_article from about 78,000/min to about 1,150, and the load dropped from 25% (the maximum for one core) to about 4.5%.

Re: Optimizing SABnzbd

Post by sander »

About your findings: does that also work with 2 or 3 servers?

Re: Optimizing SABnzbd

Post by Puzzled »

Using 3 servers with the same priority and a total of 11 connections, downloading 3 different nzbs in the queue (because of retention differences), the load varies between 1.5% and 3.5%, with an average of probably 2.5%. On my setup it's a big improvement no matter what combination of servers and queue size I use. I am less sure how it will perform for those who have much faster connections or very different hardware; unfortunately I can't test that myself.

Here is a diff from py3:

Code:

diff --git a/sabnzbd/downloader.py b/sabnzbd/downloader.py
index 584fddab..459aea9c 100644
--- a/sabnzbd/downloader.py
+++ b/sabnzbd/downloader.py
@@ -420,8 +420,20 @@ class Downloader(Thread):
         # Kick BPS-Meter to check quota
         BPSMeter.do.update()

+        no_articles = 0
+        tested_time = {}
+        lastbusy = {}
         while 1:
+            no_articles += 1
             for server in self.servers:
+                serverid = server.id
+                if server.busy_threads:
+                    lastbusy[serverid] = time.time()
+
+                if lastbusy.get(serverid, 0) + 4 <= time.time() and int(tested_time.get(serverid, 0)) == int(time.time()):
+                    continue  # idle for 4+ seconds and already checked during this second
+
+                tested_time[serverid] = time.time()
                 for nw in server.busy_threads[:]:
                     if (nw.nntp and nw.nntp.error_msg) or (nw.timeout and time.time() > nw.timeout):
                         if nw.nntp and nw.nntp.error_msg:
@@ -470,6 +482,9 @@ class Downloader(Thread):
                     if not article:
                         break

+                    no_articles = 0  # found an article, so reset the idle counter
+                    lastbusy[serverid] = time.time()
+
                     if server.retention and article.nzf.nzo.avg_stamp < time.time() - server.retention:
                         # Let's get rid of all the articles for this server at once
                         logging.info('Job %s too old for %s, moving on', article.nzf.nzo.final_name, server.host)
@@ -493,6 +508,9 @@ class Downloader(Thread):
                             logging.error(T('Failed to initialize %s@%s with reason: %s'), nw.thrdnum, server.host, sys.exc_info()[1])
                             self.__reset_nw(nw, "failed to initialize")

+            if no_articles:
+                time.sleep(0.001)  # nothing found on any server; yield the CPU briefly

The sleeping can be made less aggressive by using, for instance, "if no_articles % 2:". That's the first thing I would try if performance drops.

Re: Optimizing SABnzbd

Post by safihre »

If you could make a pull request at Github that would be great, then we can take a look at it more closely!
https://github.com/sabnzbd/sabnzbd

Re: Optimizing SABnzbd

Post by Puzzled »

Some more ideas...

1. I've been looking at the trylist. Have you considered using a dictionary or an array instead? That way you wouldn't have to check whether a server is already in the list before adding it, and lookups would probably be faster; hopefully it would also require less locking. Instead of a list, you would use the server as the key and 1 (or True) as the value once it has been tried. If you used numbers as server ids by default instead of strings, I assume it would use less memory; is there any particular reason you don't? From what I understand the trylist is not saved, so it doesn't matter if the ids change the next time SABnzbd is started. (A sketch of the idea follows after point 2.)

2. Why do you need to decode all the articles in the "Let's get rid of all the articles for this server at once" part (downloader.py:495)? I think it could be faster if it were possible to loop through all the articles of the nzo more or less unconditionally and mark them all as tried for a particular server using an array or dictionary.
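As a sketch, the dictionary version of idea 1 could look like this (hypothetical; the real TryList lives in nzbstuff.py and its method names may differ):

Code:

class TryList:
    """Sketch of a dictionary-backed try list keyed on numeric server ids."""

    def __init__(self):
        self.tried = {}

    def server_in_try_list(self, server):
        return server.id in self.tried  # O(1) lookup, no list scan

    def add_to_try_list(self, server):
        self.tried[server.id] = True  # no need to check for duplicates first

    def reset_try_list(self):
        self.tried.clear()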

This is a much larger change than the sleep fix, so if you can think of issues or reasons it won't work, I would like to hear them.

Re: Optimizing SABnzbd

Post by safihre »

1. On Python 2, dictionaries are slow. The TryList is just a wrapper around a list, and lookups like "if server in trylist" are very fast operations.

2. That part is only used when the whole post falls outside the retention the user specified as the age limit for that specific server. It's an "Advanced setting" and not really used anymore nowadays, since large portions of the newsservers now have 10+ years of retention. So, all in all, this part of the code is rarely triggered.

I'm on vacation this week, so it will be a while before I can think these changes over. We can definitely do something to optimize the Downloader loop; maybe it needs to be split up or handled very differently.

Re: Optimizing SABnzbd

Post by Puzzled »

I'm pretty much finished with modifications that read several articles at once and use a queue for each server. I will test it some more, clean up the code a bit, and then upload it to GitHub.

Regarding the age limit, I think it's quite useful. Only the Omicron/Highwinds-related servers have 3800 days of retention, and they have started kicking out their resellers; the rest generally have 100-1200 days, and the free IPv6 server has less than 25 days. Anyway, I think I've fixed that performance problem too, although it would be better to skip the decode part if we can.

Re: Optimizing SABnzbd

Post by safihre »

We use the decode part because that is where the code responsible for selecting another server to try after a failure lives, together with the code that registers articles :)