PreProcessing script - parses nzb names with regex to assign categories

Post by **wadsworth** » June 28th, 2023, 12:17 am

Figured I'd post a Python script which has helped resolve a nagging issue for me. Hopefully it will be of use to some others out there.

My NZB search provider (just like binsearch) often puts out .nzb files which lack the proper XML (category) tagging needed for fully automated category sorting in SAB. When they are tagged properly, my SAB categories kick in and everything is great. When they aren't, I'm left with a folder full of mixed files: movies, TV shows, ebooks, audio files, etc.

This script accomplishes the following:
* Checks if a SAB category is already assigned before continuing. If already assigned, the download continues without any changes.
* Uses (case-insensitive) regular expressions in a specific order (specific search terms > TV > Movies > Audio) to determine category sorting.
* Individual words within the nzb name are evaluated for pattern matches (e.g. the "epub" search won't match "republic").
* TV shows are evaluated for the (s01e08, S2E14) or (season3, Season02) patterns.
* Movies are evaluated next based on resolution (720p, 1080P, 2160p). If not matching a TV show, it's assumed to be a Movie at this point.
* If all of the pattern matching fails, the download continues without any changes.
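
The rule order and whole-word matching described in the list above can be sketched in a compact, table-driven form. This is only an illustration, not the script itself: the `RULES` table and the `guess_category` helper are names made up for this sketch, and the category names are examples that would have to match categories defined in your own SAB setup.

```python
import re

# Ordered (pattern, category) rules; the first match wins.
# \b marks a word boundary, so "epub" does not match inside "republic".
RULES = [
    (r"\bmotogp\b", "motogp"),                          # specific search terms
    (r"\bepub\b", "ebook"),
    (r"[.\s]s\d{1,2}e\d{1,3}|\bseason\d{1,3}", "tv"),   # s01e08, S2E14, season3
    (r"\b\d{3,4}p\b", "movie"),                         # 720p, 1080P, 2160p
    (r"\bflac\b", "audio"),                             # last, since videos may mention FLAC
]


def guess_category(name):
    """Return the first matching category, or '' if nothing matches."""
    for pattern, category in RULES:
        if re.search(pattern, name, re.IGNORECASE):
            return category
    return ""
```

Calling `guess_category("The.Republic.2010.1080p")` falls through the "epub" rule because of the word boundaries and ends up in the movie rule, while a name with no recognizable token returns an empty string, which corresponds to leaving the download unchanged.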

This script might not be needed if your categories already work flawlessly and/or you only use Radarr or Sonarr to add content. If you ever search manually, or have to contend with unsorted downloads, this will most likely help with your tv/movie sorting.

The example script below is tailored to my particular needs, but the pattern matching is not overly "fuzzy" and the tv/movie sorting portion should be fairly universal. Chances are good that you can run this script as-is and see some benefit.

Enjoy!

Code:

# SABnzbd PreProcessing script - evaluates un-categorized downloads and assigns categories based on nzb name

import sys
import re  # add regex module

try:
    # Parse the 18 input variables for SABnzbd version >= 4.0.0
    (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename, is_proper, resolution, decade, year, month, day, job_type) = sys.argv
except ValueError:
    try:
        # ...or 11 variables for earlier versions
        (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename) = sys.argv
    except ValueError:
        # Neither argument count fits; an except clause on the outer try would not catch this
        sys.exit(1)  # a non-zero exit status causes SABnzbd to ignore the output of this script

# Assign nzb name to string
string = nzbname

# The example rules below follow this basic format:
# 1 - Specific searches - Specific search terms which reliably indicate a category (ie - motogp, or epub)
# 2 - TV search - TV shows have a reliable naming convention (ie - S01E04, or Season2), so searching these first
# 3 - Movie search - likely a movie if the above rules don't apply and it's a video file (ie - 720p, 1080p, 2160p)
# 4 - Audio search - If the above rules don't apply and it has "FLAC" in its name, it's likely an audio download
#     ^^^ Listed last since movie and tv releases sometimes include the word "FLAC" in their release names
# 5 - If none of the above rules apply, allow the download to continue without any changes
#
# Summary of return parameter options:
#    print('') # 0 - Refuse, 1 - Accept, 3 - Accept but Fail
#    print('') # Cleaned version of job name (no path or .nzb)
#    print('') # 0 = Download, 1 = +Repair, 2 = +Unpack, 3 = +Delete
#    print('') # Category
#    print('') # Script (no path)
#    print('') # Priority -- -100 = Default, -3 = Duplicate, -2 = Paused, -1 = Low, 0 = Normal, 1 = High, 2 = Force
#    print('') # Group

if category == '':  # Verify a SAB category is not assigned before continuing

    # Searches for "motogp"
    if re.search(r"\bmotogp\b", string, re.IGNORECASE):
        category = 'motogp'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Searches for "epub"
    elif re.search(r"\bepub\b", string, re.IGNORECASE):
        category = 'ebook'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # TV search - matches s01e04, S2E11, Season2, season03, etc
    elif re.compile(r"(.*?)[.\s][sS](\d{1,2})[eE](\d{1,3}).*|(.*?)\b(SEASON|season)(\d{1,3}).*", re.IGNORECASE).match(string):
        category = 'tv'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Movie search - matches 720p, 1080p, 2160P, etc.  Assumed to be a movie at this point, if not matching the previous TV pattern.
    elif re.search(r"\b\d{3,4}[pP]\b", string, re.IGNORECASE):
        category = 'movie'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Post TV/movie search, for "flac"
    elif re.search(r"\bFLAC\b", string, re.IGNORECASE):
        category = 'audio'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # No pattern matches were found, allow the download to continue without assigning a category
    else:
        print('1')
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')

# SAB category is already assigned, allow download to continue without any changes
else:
    print('1')
    print('')
    print('')
    print('')
    print('')
    print('')
    print('')

sys.exit(0)
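
The script above repeats the same seven `print` calls in every branch. A small helper can build that response once. The following is a minimal sketch under the assumption, which the script itself relies on, that an empty line leaves the corresponding value unchanged; `parse_args`, `build_response` and `respond` are hypothetical helper names, not part of SABnzbd.

```python
import sys

# Positional names in the order used by the script above (script name first).
ARG_NAMES = [
    "scriptname", "nzbname", "postprocflags", "category", "script", "prio",
    "downloadsize", "grouplist", "showname", "season", "episodenumber",
    "episodename", "is_proper", "resolution", "decade", "year", "month",
    "day", "job_type",
]


def parse_args(argv):
    """Map the positional arguments to names, whichever count was passed."""
    # zip() stops at the shorter sequence, so older versions that pass
    # fewer arguments simply produce a smaller dict.
    return dict(zip(ARG_NAMES, argv))


def build_response(accept="", category=""):
    """Return the seven output lines in the order listed in the summary."""
    # Order: accept, job name, pp flags, category, script, priority, group
    return [accept, "", "", category, "", "", ""]


def respond(accept="", category=""):
    """Print the response, one parameter per line."""
    print("\n".join(build_response(accept, category)))
```

With these in place, each branch shrinks to a single call such as `respond(category="tv")`, and the two "no change" branches become `respond(accept="1")`.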

UPDATE - Here's a simplified version which only detects/sorts the TV and Movie categories.
The regex for TV shows has also been updated to include the somewhat rare 3x01 format.
Refer to the original script if you need to sort additional categories besides just TV and movies.

Code:

# SABnzbd PreProcessing script - evaluates un-categorized downloads and assigns to TV or Movie categories based on nzb name

import sys
import re  # add regex module

try:
    # Parse the 18 input variables for SABnzbd version >= 4.0.0
    (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename, is_proper, resolution, decade, year, month, day, job_type) = sys.argv
except ValueError:
    try:
        # ...or 11 variables for earlier versions
        (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename) = sys.argv
    except ValueError:
        # Neither argument count fits; an except clause on the outer try would not catch this
        sys.exit(1)  # a non-zero exit status causes SABnzbd to ignore the output of this script

# Assign nzb name to string
string = nzbname

# Summary of return parameter options:
#    print('') # 0 - Refuse, 1 - Accept, 3 - Accept but Fail
#    print('') # Cleaned version of job name (no path or .nzb)
#    print('') # 0 = Download, 1 = +Repair, 2 = +Unpack, 3 = +Delete
#    print('') # Category
#    print('') # Script (no path)
#    print('') # Priority -- -100 = Default, -3 = Duplicate, -2 = Paused, -1 = Low, 0 = Normal, 1 = High, 2 = Force
#    print('') # Group

if category == '':  # Verify a SAB category is not assigned before continuing

    # TV search - matches s01e04, S2E11, Season2, season03, 2x1, 3x03, etc
    if re.compile(r"(.*?)[.\s][sS](\d{1,2})[eE](\d{1,3}).*|(.*?)\b(SEASON|season)(\d{1,3}).*|(.*?)(\d{1,2})[xX](\d{1,3}).*", re.IGNORECASE).match(string):
        category = 'tv'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Movie search - matches 720p, 1080p, 2160P, etc
    elif re.search(r"\b\d{3,4}[pP]\b", string, re.IGNORECASE):
        category = 'movie'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # No pattern matches were found, allow the download to continue without assigning a category
    else:
        print('1')
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')

# SAB category is already assigned, allow download to continue without any changes
else:
    print('1')
    print('')
    print('')
    print('')
    print('')
    print('')
    print('')

sys.exit(0)
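
One thing to watch with the 3x01 alternative in the updated regex: it has no boundaries around the digits, so a name containing a frame size such as 1920x1080 also satisfies it and would be sorted as TV. The sketch below compares that loose form with a stricter variant. `STRICT` is a hypothetical pattern suggested here for illustration, not something from the script, and it may still need tuning for your own release names.

```python
import re

# The NxNN alternative as used in the updated script, isolated for testing.
LOOSE = re.compile(r"\d{1,2}x\d{1,3}", re.IGNORECASE)

# Hypothetical stricter variant: the token may not touch other letters or
# digits, so it has to stand alone between separators such as dots or spaces.
STRICT = re.compile(r"(?<![a-z0-9])\d{1,2}x\d{1,3}(?![a-z0-9])", re.IGNORECASE)


def matches(pattern, name):
    """True if the pattern is found anywhere in the name."""
    return pattern.search(name) is not None
```

Both patterns accept `Some.Show.3x01.720p`, but only `LOOSE` accepts `Some.Movie.1920x1080.BluRay`, because the lookbehind in `STRICT` rejects a match that starts in the middle of a longer number.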

Post by **safihre** » June 28th, 2023, 8:27 am

Nice and simple, and can be useful for others

Post by **buzzword** » November 14th, 2023, 7:07 pm

Hi, maybe you can help me with something… I would like a preprocessing script that can recognize when I’ve already downloaded a file at the same or higher resolution, ignoring any characters in the file name that appear after the resolution, so I can flag the new one as ‘don’t accept’.

So if I try to download any of the following 3 files, for example:
blah.blah.blah.blah.2160p.XYZ.nzb or
blah.blah.blah.blah.1080p.ABC.nzb or
blah.blah.blah.blah.720p.GETIT.nzb

and my nzb history folder (containing gzip versions of all previously imported nzb files) contains a file named
blah.blah.blah.blah.2160p.ABC.gz
then I want to reject the nzb file I’m trying to download.

I have some very light C# and VB experience (I wrote a few small filename cleanup utilities for home use a few years ago) but no Python experience. I get how to set the return codes etc. from examples, but I don’t know how, or whether it’s even possible, to do the condition check I want in Python.

Can you point me in the right direction, is what I want easily doable?

Thanks!

Post by **safihre** » November 15th, 2023, 1:16 am

I think you are much better off using Sonarr and Radarr; they offer this exact functionality and let you fine-tune anything you want in detail.
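
For anyone who would still rather script it, the check asked about above could look roughly like the sketch below. It is only a starting point under stated assumptions: the history folder is assumed to hold one `.gz` file per imported NZB, named as in the example, and `split_name`, `already_have` and `list_history` are hypothetical helper names. Everything after the resolution token is ignored when comparing.

```python
import re
from pathlib import Path

# Title = everything before the first separator + resolution token (e.g. ".1080p").
RES = re.compile(r"^(?P<title>.+?)[.\s_-](?P<res>\d{3,4})p\b", re.IGNORECASE)


def split_name(name):
    """Return (lower-cased title, resolution as int), or None without a resolution."""
    m = RES.match(name)
    if not m:
        return None
    return m.group("title").lower(), int(m.group("res"))


def already_have(nzbname, history):
    """True if history holds the same title at the same or a higher resolution."""
    wanted = split_name(nzbname)
    if wanted is None:
        return False
    for old in history:
        have = split_name(old)
        if have and have[0] == wanted[0] and have[1] >= wanted[1]:
            return True
    return False


def list_history(folder):
    """Assumed layout: one '<name>.gz' file per previously imported NZB."""
    return [p.name for p in Path(folder).glob("*.gz")]
```

In a pre-queue script, a `True` result from `already_have(nzbname, list_history(folder))` would translate to printing `0` (Refuse) as the first output line, following the return parameter summary in the script earlier in this thread.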

Post by **sander** » December 1st, 2023, 5:25 am

Julian758 wrote: ↑December 1st, 2023, 2:48 am Hi Guys,
Hey there! I'm looking to create a preprocessing script that can detect if I've previously downloaded a file of the same or better resolution. This script should also disregard any characters in the filename that appear after the resolution. The aim is to mark such files as 'do not accept'. Can you help with this?

Maybe show what you've created so far? And the functionality that's already working?

And: aren't Sonarr/Radarr/*arr already doing this? If so, that could save you work.
