PreProcessing script - parses nzb names with regex to assign categories

wadsworth
Newbie
Posts: 3
Joined: June 27th, 2023, 10:21 pm

Post by wadsworth »

Figured I'd post a Python script that has resolved a nagging issue for me. Hopefully it will be of use to others out there.

My NZB search provider (like binsearch) often puts out .nzb files that lack the proper XML (category) tagging needed for fully automated category sorting in SAB. When they are tagged properly, my SAB categories kick in and everything is great. When they aren't, I'm left with a folder full of mixed files: movies, TV shows, ebooks, audio files, etc.

This script accomplishes the following:
* Checks if a SAB category is already assigned before continuing. If already assigned, the download continues without any changes.
* Uses case-insensitive regular expressions in a fixed order (specific search terms > TV > Movies > Audio) to determine the category.
* Matches whole words within the nzb name (e.g. the "epub" search won't match "republic").
* TV shows are detected by the (s01e08, S2E14) or (season3, Season02) patterns.
* Movies are evaluated next, based on resolution (720p, 1080P, 2160p). Anything with a resolution that did not match a TV pattern is assumed to be a movie.
* If all of the pattern matching fails, the download continues without any changes.
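
The whole-word behaviour in the list above comes from the regex word-boundary anchor `\b`. A small sketch of the difference, using a made-up helper name (`has_word`) and made-up sample names; note that `_` counts as a word character, so a name like "Some_Book_epub" would not match:

```python
import re

def has_word(word, name):
    """True if `word` appears in `name` as a whole word, ignoring case."""
    # re.escape keeps any special characters in `word` literal
    return re.search(r"\b" + re.escape(word) + r"\b", name, re.IGNORECASE) is not None

def has_substring(word, name):
    """True if `word` appears anywhere in `name`, ignoring case."""
    return re.search(re.escape(word), name, re.IGNORECASE) is not None

# Hypothetical sample names, for illustration only
for name in ("Some.Book.Title.EPUB", "The.Republic.2019.1080p.WEB"):
    print(name, has_word("epub", name), has_substring("epub", name))
```

Dots, spaces and dashes all count as boundaries, which is why this works well on typical release names.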

This script might not be needed if your categories already work flawlessly and/or you only use Radarr or Sonarr to add content. If you ever search manually, or have to contend with unsorted downloads, it will most likely help with your TV/movie sorting.

The example script below is tailored to my particular needs, but the pattern matching is not overly "fuzzy" and the TV/movie sorting portion should be fairly universal. Chances are good that you can run this script as-is and see some benefit.

Enjoy!

Code:

# SABnzbd PreProcessing script - evaluates un-categorized downloads and assigns categories based on nzb name

import sys
import re  # add regex module

try:
    # Parse the 18 input variables for SABnzbd version >= 4.0.0
    (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename, is_proper, resolution, decade, year, month, day, job_type) = sys.argv
except ValueError:
    # ...or 11 variables for earlier versions
    (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename) = sys.argv
except Exception:
    sys.exit(1)  # a non-zero exit status causes SABnzbd to ignore the output of this script

# Assign nzb name to string
string = (nzbname)

# The example rules below follow this basic format:
# 1 - Specific searches - Specific search terms which reliably indicate a category (e.g. motogp, or epub)
# 2 - TV search - TV shows have a reliable naming convention (e.g. S01E04, or Season2), so these are searched first
# 3 - Movie search - likely a movie if the above rules don't apply and it's a video file (e.g. 720p, 1080p, 2160p)
# 4 - Audio search - If the above rules don't apply and it has "FLAC" in its name, it's likely an audio download
#     ^^^ Listed last since movie and tv releases sometimes include the word "FLAC" in their release names
# 5 - If none of the above rules apply, allow the download to continue without any changes
#
# Summary of return parameter options:
#    print('') # 0 - Refuse, 1 - Accept, 3 - Accept but Fail
#    print('') # Cleaned version of job name (no path or .nzb)
#    print('') # 0 = Download, 1 = +Repair, 2 = +Unpack, 3 = +Delete
#    print('') # Category
#    print('') # Script (no path)
#    print('') # Priority -- -100 = Default, -3 = Duplicate, -2 = Paused, -1 = Low, 0 = Normal, 1 = High, 2 = Force
#    print('') # Group

if category == '':  # Verify a SAB category is not assigned before continuing ("==" compares values; "is" compares identity)

    # Searches for "motogp"
    if re.search(r"\bmotogp\b", string, re.IGNORECASE):  # "\m" is not a valid regex escape, so no backslash before the word
        category = 'motogp'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Searches for "epub"
    elif re.search(r"\bepub\b", string, re.IGNORECASE):
        category = 'ebook'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # TV search - matches s01e04, S2E11, Season2, season03, etc
    elif re.compile(r"(.*?)[.\s][sS](\d{1,2})[eE](\d{1,3}).*|(.*?)\b(SEASON|season)(\d{1,3}).*", re.IGNORECASE).match(string):  # IGNORECASE so "Season2" also matches
        category = 'tv'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Movie search - matches 720p, 1080p, 2160P, etc.  Assumed to be a movie at this point, if not matching the previous TV pattern.
    elif re.search(r"\b\d{3,4}[pP]\b", string, re.IGNORECASE):
        category = 'movie'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Post TV/movie search, for "flac"
    elif re.search(r"\bFLAC\b", string, re.IGNORECASE):
        category = 'audio'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # No pattern matches were found, allow the download to continue without assigning a category
    else:
        print('1')
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')

# SAB category is already assigned, allow download to continue without any changes
else:
    print('1')
    print('')
    print('')
    print('')
    print('')
    print('')
    print('')

sys.exit(0)
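
The repeated print blocks above can also be folded into an ordered rule table. This is only a sketch and not the original author's code: `RULES`, `pick_category` and `response_lines` are made-up names, the TV pattern is a condensed, case-insensitive variant of the one above, and the Accept line is always sent as 1 (the script above leaves it blank when a category is matched).

```python
import re

# Ordered (pattern, category) pairs; the first match wins.
# Category names mirror the script above and must exist in SABnzbd.
RULES = [
    (re.compile(r"\bmotogp\b", re.IGNORECASE), "motogp"),
    (re.compile(r"\bepub\b", re.IGNORECASE), "ebook"),
    (re.compile(r"[.\s]s\d{1,2}e\d{1,3}|\bseason\d{1,3}", re.IGNORECASE), "tv"),
    (re.compile(r"\b\d{3,4}p\b", re.IGNORECASE), "movie"),
    (re.compile(r"\bflac\b", re.IGNORECASE), "audio"),
]

def pick_category(nzbname):
    """Return the first matching category, or '' when nothing matches."""
    for pattern, category in RULES:
        if pattern.search(nzbname):
            return category
    return ""

def response_lines(category=""):
    """Build the seven output lines; only Accept and Category are filled in."""
    return ["1", "", "", category, "", "", ""]

if __name__ == "__main__":
    # Hypothetical job name, for illustration only
    print("\n".join(response_lines(pick_category("Show.Name.S01E08.720p.WEB"))))
```

Adding a category then means adding one line to `RULES`, and the order of the list makes the priority explicit.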

UPDATE - Here's a simplified version which only detects/sorts the TV and Movie categories.
The regex for TV shows has also been updated to include the somewhat rare 3x01 format.
Refer to the original script if you need to sort additional categories besides just TV and movies.

Code:

# SABnzbd PreProcessing script - evaluates un-categorized downloads and assigns to TV or Movie categories based on nzb name

import sys
import re  # add regex module

try:
    # Parse the 18 input variables for SABnzbd version >= 4.0.0
    (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename, is_proper, resolution, decade, year, month, day, job_type) = sys.argv
except ValueError:
    # ...or 11 variables for earlier versions
    (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename) = sys.argv
except Exception:
    sys.exit(1)  # a non-zero exit status causes SABnzbd to ignore the output of this script

# Assign nzb name to string
string = (nzbname)

# Summary of return parameter options:
#    print('') # 0 - Refuse, 1 - Accept, 3 - Accept but Fail
#    print('') # Cleaned version of job name (no path or .nzb)
#    print('') # 0 = Download, 1 = +Repair, 2 = +Unpack, 3 = +Delete
#    print('') # Category
#    print('') # Script (no path)
#    print('') # Priority -- -100 = Default, -3 = Duplicate, -2 = Paused, -1 = Low, 0 = Normal, 1 = High, 2 = Force
#    print('') # Group

if category == '':  # Verify a SAB category is not assigned before continuing ("==" compares values; "is" compares identity)

    # TV search - matches s01e04, S2E11, Season2, season03, 2x1, 3x03, etc
    if re.compile(r"(.*?)[.\s][sS](\d{1,2})[eE](\d{1,3}).*|(.*?)\b(SEASON|season)(\d{1,3}).*|(.*?)(\d{1,2})[xX](\d{1,3}).*", re.IGNORECASE).match(string):  # IGNORECASE so "Season2" also matches
        category = 'tv'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Movie search - matches 720p, 1080p, 2160P, etc
    elif re.search(r"\b\d{3,4}[pP]\b", string, re.IGNORECASE):
        category = 'movie'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # No pattern matches were found, allow the download to continue without assigning a category
    else:
        print('1')
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')

# SAB category is already assigned, allow download to continue without any changes
else:
    print('1')
    print('')
    print('')
    print('')
    print('')
    print('')
    print('')

sys.exit(0)
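
One thing to watch with the 3x01 alternative in the TV pattern above: taken on its own it also matches pixel dimensions such as 1920x1080, which would sort that download as TV. The sketch below compares that alternative with a stricter variant that adds word boundaries. `STRICT_NXNN` and the helper names are hypothetical, and the sample names are made up.

```python
import re

# The NxNN alternative from the TV pattern above, in isolation
LOOSE_NXNN = re.compile(r"(.*?)(\d{1,2})[xX](\d{1,3}).*")

# Hypothetical stricter variant: the whole token must be NxNN
STRICT_NXNN = re.compile(r"\b\d{1,2}x\d{1,3}\b", re.IGNORECASE)

def is_tv_loose(name):
    return LOOSE_NXNN.match(name) is not None

def is_tv_strict(name):
    return STRICT_NXNN.search(name) is not None

# Hypothetical sample names, for illustration only
for name in ("Show.Name.3x01.Title", "Some.Movie.2020.1920x1080.WEB"):
    print(name, is_tv_loose(name), is_tv_strict(name))
```

Whether the stricter form is worth it depends on how often your provider puts dimensions like 1920x1080 in release names.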
Last edited by wadsworth on June 28th, 2023, 11:04 am, edited 2 times in total.
safihre
Administrator
Posts: 5513
Joined: April 30th, 2015, 7:35 am

Re: PreProcessing script - parses nzb names with regex to assign categories

Post by safihre »

Nice and simple, and can be useful for others 👍
If you like our support, check our special newsserver deal or donate at: https://sabnzbd.org/donate
buzzword
Newbie
Posts: 27
Joined: March 21st, 2022, 11:16 am

Re: PreProcessing script - parses nzb names with regex to assign categories

Post by buzzword »

Hi, maybe you can help me with something… I would like a preprocessing script that can recognize when I’ve already downloaded a file at the same or higher resolution, so I can flag the new one as ‘don’t accept’. It should ignore any characters in the file name that appear after the resolution.

So if I try to download any of the following 3 files for example
blah.blah.blah.blah.2160p.XYZ.nzb or
blah.blah.blah.blah.1080p.ABC.nzb or
blah.blah.blah.blah.720p.GETIT.nzb

and my nzb history folder (containing gzip versions of all previously imported nzb files) contains a file named
blah.blah.blah.blah.2160p.ABC.gz
then I want to reject the nzb file I’m trying to download.

I have some very light C# and VB experience (I wrote a few small filename cleanup utilities for home use a few years ago) but no Python experience. I get how to set the return codes etc. from the examples, but I don’t know how, or whether it’s even possible, to do the condition check I want in Python.

Can you point me in the right direction, is what I want easily doable?

Thanks!
safihre
Administrator
Posts: 5513
Joined: April 30th, 2015, 7:35 am

Re: PreProcessing script - parses nzb names with regex to assign categories

Post by safihre »

I think you are much better off using Sonarr and Radarr; they offer this exact functionality and let you fine-tune anything you want.
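
For anyone who still wants to script the check, the comparison itself is short in Python. The following is only a rough sketch under assumptions: the function names are made up, the title is taken to be everything before the first resolution token, and the history folder is assumed to hold `.gz` files named like the examples in the question. It has not been tested against a real SABnzbd history folder.

```python
import re
from pathlib import Path

# Title = everything before the first "<digits>p" token; the rest is ignored
NAME_RE = re.compile(r"^(?P<title>.+?)[.\s](?P<res>\d{3,4})p\b", re.IGNORECASE)

def split_name(name):
    """Return (lowercased title, resolution as int), or None if no resolution is found."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return m.group("title").lower(), int(m.group("res"))

def already_have(nzbname, history_names):
    """True if history holds the same title at the same or a higher resolution."""
    wanted = split_name(nzbname)
    if wanted is None:
        return False
    for old in history_names:
        have = split_name(old)
        if have is not None and have[0] == wanted[0] and have[1] >= wanted[1]:
            return True
    return False

def list_history(folder):
    """File names of the gzip'd nzb files in `folder` (the path is an assumption)."""
    return [p.name for p in Path(folder).glob("*.gz")]
```

In a pre-queue script this would be called with the job name and the result of `list_history(...)`, printing 0 on the first output line (Refuse, per the parameter summary in the script earlier in this thread) when `already_have` returns True.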
sander
Release Testers
Posts: 9038
Joined: January 22nd, 2008, 2:22 pm

Re: PreProcessing script - parses nzb names with regex to assign categories

Post by sander »

Julian758 wrote: December 1st, 2023, 2:48 am
Hi Guys,
Hey there! I'm looking to create a preprocessing script that can detect if I've previously downloaded a file of the same or better resolution. This script should also disregard any characters in the filename that appear after the resolution. The aim is to mark such files as 'do not accept'. Can you help with this?

Maybe show what you've created so far? And the functionality that's already working?

And: aren't Sonarr/Radarr/*arr already doing this? If so, that could save you work.