Help with mkv subtitle extraction script please

haider254
Newbie
Posts: 2
Joined: January 10th, 2014, 2:06 pm

Post by haider254 »

Hi Folks!

I have a basic sab/sickbeard/couchpotato setup running on Ubuntu Server 12.04.

I use my Samsung smart TV with Plex to watch my downloads. The Samsung Plex app can't display MKV subs unless you transcode them, and I don't like doing that because I'm running my system on an aged computer. I ran across this script, which I think is the answer to my sorrows:

Code:

#!/bin/bash
# Extract subtitles from each MKV file in the given directory

# If no directory is given, work in local dir
if [ "$1" = "" ]; then
  DIR="."
else
  DIR="$1"
fi

# Get all the MKV files in this dir and its subdirs
find "$DIR" -type f -name '*.mkv' | while read filename
do
  # Find out which tracks contain the subtitles
  mkvmerge -i "$filename" | grep 'subtitles' | while read subline
  do
    # Grep the number of the subtitle track
    tracknumber=`echo $subline | egrep -o "[0-9]{1,2}" | head -1`

    # Get base name for subtitle
    subtitlename=${filename%.*}

    # Extract the track to a .tmp file
    `mkvextract tracks "$filename" $tracknumber:"$subtitlename.srt.tmp" > /dev/null 2>&1`
    `chmod g+rw "$subtitlename.srt.tmp"`

    # Do a super-primitive language guess: DUTCH
    langtest=`egrep -ic ' ik | je | een ' "$subtitlename".srt.tmp`
    trimregex="vertaling &\|vertaling:\|vertaald door\|bierdopje"

    # Do a super-primitive language guess: ENGLISH
    #langtest=`egrep -ic ' you | to | the ' "$subtitlename".srt.tmp`
    #trimregex=""

    # Do a super-primitive language guess: GERMAN
    #langtest=`egrep -ic ' ich | ist | sie ' "$subtitlename".srt.tmp`
    #trimregex=""

    # Do a super-primitive language guess: SPANISH
    #langtest=`egrep -ic ' el | es | por ' "$subtitlename".srt.tmp`
    #trimregex=""

    # Check if subtitle passes our language filter (10 or more matches)
    if [ $langtest -ge 10 ]; then
      # Regex to remove credits at the end of subtitles (read my reason why!)
      `sed 's/\r//g' < "$subtitlename.srt.tmp" \
        | sed 's/%/%%/g' \
        | awk '{if (a){printf("\t")};printf $0; a=1; } /^$/{print ""; a=0;}' \
        | grep -iv "$trimregex" \
        | sed 's/\t/\r\n/g' > "$subtitlename.srt"`
      `rm "$subtitlename.srt.tmp"`
      `chmod g+rw "$subtitlename.srt"`
    else
      # Not our desired language: add a number to the filename and keep anyway, just in case
      `mv "$subtitlename.srt.tmp" "$subtitlename.$tracknumber.srt" > /dev/null 2>&1`
    fi
  done
done
What I need now is some help on how to integrate this into sab! The coder of the script says he uses it through sab but didn't elaborate on how he achieved this! If anyone can help I would be very much obliged!

I have installed mkvtoolnix and have put the script in a folder and have set the Post-Processing script folder in sab to the correct directory. That's where my mad skillz end.

Thanks in advance!

And just want to give props to Computer Nerd From Hell for providing this script to the online world!

EDIT: I also have another request! I would like the file name to be *.en.srt, or *.fi.srt for Finnish subtitles if found (not strictly necessary for Finnish subs, but if it's easy to implement I would be very much obliged).

I assume correcting the few lines of code as follows would take care of the *.en.srt issue.

| sed 's/\t/\r\n/g' > "$subtitlename.en.srt"`
`rm "$subtitlename.srt.tmp"`
`chmod g+rw "$subtitlename.en.srt"`

EDIT2: Upon a little playing around I found that under categories I have to activate the script for said categories. Will run a test download to see if this works now.

I ran the script manually and found that it seems to prefer Dutch-language subtitles. If anyone can help with this I would be grateful.

EDIT 3: I modified the script slightly for the test to see how it performs as follows:

Code:

#!/bin/bash
# Extract subtitles from each MKV file in the given directory

# If no directory is given, work in local dir
if [ "$1" = "" ]; then
  DIR="."
else
  DIR="$1"
fi

# Get all the MKV files in this dir and its subdirs
find "$DIR" -type f -name '*.mkv' | while read filename
do
  # Find out which tracks contain the subtitles
  mkvmerge -i "$filename" | grep 'subtitles' | while read subline
  do
        # Grep the number of the subtitle track
        tracknumber=`echo $subline | egrep -o "[0-9]{1,2}" | head -1`

        # Get base name for subtitle
        subtitlename=${filename%.*}

        # Extract the track to a .tmp file
        `mkvextract tracks "$filename" $tracknumber:"$subtitlename.srt.tmp" > /$
        `chmod g+rw "$subtitlename.srt.tmp"`

        # Do a super-primitive language guess: DUTCH
        #langtest=`egrep -ic ' ik | je | een ' "$subtitlename".srt.tmp`
        #trimregex="vertaling &\|vertaling:\|vertaald door\|bierdopje"

        # Do a super-primitive language guess: ENGLISH
        langtest=`egrep -ic ' you | to | the ' "$subtitlename".en.srt.tmp`
        trimregex=""

        # Do a super-primitive language guess: FINNISH
        #langtest=`egrep -ic ' tämä | hän | kyllä ' "$subtitlename".fi.srt.tmp`
        #trimregex=""

        # Do a super-primitive language guess: GERMAN
        #langtest=`egrep -ic ' ich | ist | sie ' "$subtitlename".srt.tmp`
        #trimregex=""

        # Do a super-primitive language guess: SPANISH
        #langtest=`egrep -ic ' el | es | por ' "$subtitlename".srt.tmp`
        #trimregex=""

        # Check if subtitle passes our language filter (10 or more matches)
        if [ $langtest -ge 10 ]; then
          # Regex to remove credits at the end of subtitles (read my reason why$
          `sed 's/\r//g' < "$subtitlename.srt.tmp" \
                | sed 's/%/%%/g' \
                | awk '{if (a){printf("\t")};printf $0; a=1; } /^$/{print ""; a$
                | grep -iv "$trimregex" \
                | sed 's/\t/\r\n/g' > "$subtitlename.en.srt"`
          `rm "$subtitlename.srt.tmp"`
          `chmod g+rw "$subtitlename.en.srt"`
        else
          # Not our desired language: add a number to the filename and keep any$
          `mv "$subtitlename.srt.tmp" "$subtitlename.$tracknumber.srt" > /dev/n$
        fi
  done
done
I hope this works. Albeit rarely, Finnish subs do sometimes appear, and I would love to get that to work. I looked up the most common Finnish words and added them to the test. I think that line will function, but how do I get it to rename the file with .fi. in the name?

I can now get it running in sab and it seems to work partly. It named the srt file:
RANDOMMEDIATHINGY.3.srt and gave the following errors:

Code:

  chmod: changing permissions of `/media/plex/1TB/Downloads/Movies/Lorem ipsum dolor sit amet.cp(tt0903624)/orem ipsum dolor sit amet.srt.tmp': Operation not permitted
egrep: /media/plex/1TB/Downloads/Movies/orem ipsum dolor sit amet.cp(tt0903624)/orem ipsum dolor sit amet.en.srt.tmp: No such file or directory
/media/plex/1TB/Downloads/scripts/ripsubtitles.sh: line 48: [: -ge: unary operator expected
chmod: changing permissions of `/media/plex/1TB/Downloads/Movies/orem ipsum dolor sit amet.cp(tt0903624)/orem ipsum dolor sit amet.srt.tmp': Operation not permitted
egrep: /media/plex/1TB/Downloads/Movies/orem ipsum dolor sit amet.cp(tt0903624)/orem ipsum dolor sit amet.en.srt.tmp: No such file or directory
/media/plex/1TB/Downloads/scripts/ripsubtitles.sh: line 48: [: -ge: unary operator expected
My post is getting beyond confusing, just to clarify:
-If someone can help with the error it is spewing above
-If anyone can help with the filter to just filter out english and finnish subs and name them *.en.srt / *.fi.srt

I'm puzzled.. will get back to it later I think...
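
One possible way to get the .en. / .fi. naming asked about above is to let a single helper decide the language code and build the file name from it. This is only a rough sketch: `guess_lang` is a hypothetical helper name, and the word lists and the threshold of 10 matching lines are simply carried over from the script above.

```shell
#!/bin/bash
# guess_lang FILE: hypothetical helper, not part of the original script.
# Prints "en" or "fi" when FILE has at least 10 lines matching that
# language's common words, and prints nothing when neither list reaches 10.
guess_lang() {
  local file=$1 en_hits fi_hits
  [ -f "$file" ] || return 1
  # grep -c exits non-zero on zero matches, so "|| true" keeps the count usable
  en_hits=$(grep -Eic ' you | to | the ' "$file" || true)
  fi_hits=$(grep -Eic ' tämä | hän | kyllä ' "$file" || true)
  if [ "$en_hits" -ge 10 ] && [ "$en_hits" -ge "$fi_hits" ]; then
    echo en
  elif [ "$fi_hits" -ge 10 ]; then
    echo fi
  fi
}
```

Inside the loop this could be used as `lang=$(guess_lang "$subtitlename.srt.tmp")`, followed by `mv "$subtitlename.srt.tmp" "$subtitlename.$lang.srt"` whenever `$lang` is non-empty.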
jcfp
Release Testers
Posts: 993
Joined: February 7th, 2008, 12:45 pm

Re: Help with mkv subtitle extraction script please

Post by jcfp »

haider254 wrote:-If someone can help with the error it is spewing above
-If anyone can help with the filter to just filter out english and finnish subs and name them *.en.srt / *.fi.srt
The error is the simple part: that .srt.tmp file is missing. Your script assumes it is created, and when it's not, failures stack up (egrep fails, the variable langtest isn't set, and thus the "-ge 10" test runs into a syntax error).

Meanwhile, the actual problem remains invisible because output (including that of mkvextract) is sent to /dev/null. The code of the last script in your post appears cut off at the line endings, and has lots of commands in between backticks for no apparent reason.
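
The failure chain described above (missing file, empty langtest, broken numeric test) can be avoided with a guard. A minimal sketch, assuming a hypothetical helper named `count_lang_matches`: it always prints a number, so a quoted `-ge` test cannot run into "unary operator expected". Note also that in the posted log chmod reports a `.srt.tmp` name while egrep looks for `.en.srt.tmp`, so the two names in the modified script may simply not match.

```shell
#!/bin/bash
# count_lang_matches FILE PATTERN: hypothetical helper, not part of the original script.
# Prints the number of lines in FILE matching the extended regex PATTERN
# (case-insensitive). Prints 0 when FILE is missing, so the numeric test
# that follows never receives an empty value.
count_lang_matches() {
  local file=$1 pattern=$2
  if [ ! -f "$file" ]; then
    echo "warning: $file not found (did mkvextract fail?)" >&2
    echo 0
    return 0
  fi
  # grep -c prints 0 but exits non-zero when nothing matches
  grep -Eic -- "$pattern" "$file" || true
}

# Intended use inside the loop (variable names as in the posted script):
#   langtest=$(count_lang_matches "$subtitlename.srt.tmp" ' you | to | the ')
#   if [ "$langtest" -ge 10 ]; then ... fi
```

While debugging, dropping `> /dev/null 2>&1` from the mkvextract line would let its own error messages show up in the script's output.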
haider254
Newbie
Posts: 2
Joined: January 10th, 2014, 2:06 pm

Re: Help with mkv subtitle extraction script please

Post by haider254 »

Thanks for the advice. I kind of gave up on trying to get it to run and extract both Finnish and English files. I don't really need the Finnish ones if the English ones are always in sync, which they should be if they're ripped from an MKV track. I kind of only use them as a backup. I decided to just run the original script, commenting out the Dutch test and activating the English one. However, I get the following error.

Code:

chmod: changing permissions of `/media/plex/1TB/Films/Once Upon a Time in the West (1968) 720p/Once Upon a Time in the West1968.srt.tmp': Operation not permitted
chmod: changing permissions of `/media/plex/1TB/Films/Once Upon a Time in the West (1968) 720p/Once Upon a Time in the West1968.srt': Operation not permitted
Any idea how to get around this?

Also, the reason for cutting out the last few minutes of subtitles is that Dutch subtitlers have a bad habit of giving shoutouts to their comrades in their work... I am such a noob that I have no idea how to get rid of that; if you can help out with that I'd be very much obliged!

EDIT:
I ran the script with sudo and it didn't give the error anymore. Is there any way to run it as sudo under sab?

One other, more problematic thing: when I did run the script, I also didn't get any content in my srt file... hmm... I would really love to get this working, but I don't think I have the means to make it work.
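
On the chmod errors: changing a file's mode is normally only allowed for its owner or for root, which would fit with sudo making the message go away. Rather than running the whole script as root, one option is to skip chmod for files the current user does not own. A minimal sketch, with `relax_perms` as a hypothetical helper name:

```shell
#!/bin/bash
# relax_perms FILE: hypothetical helper, not part of the original script.
# Adds group read/write only when the current user owns FILE; otherwise it
# prints a note to stderr instead of failing with "Operation not permitted".
relax_perms() {
  local file=$1
  if [ -O "$file" ]; then
    chmod g+rw "$file"
  else
    echo "skipping chmod (missing or not owned by $(id -un)): $file" >&2
  fi
}
```

In the script, each `chmod g+rw "..."` line would then become `relax_perms "..."`.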
boredazfcuk
Newbie
Posts: 12
Joined: June 18th, 2014, 4:29 am

Re: Help with mkv subtitle extraction script please

Post by boredazfcuk »

Hi,

I have been looking at the same thing as you have (I know this is a little late but it could help others so I'm posting).

I took the same source script as you did, but I made a number of modifications so that it looks up the language of each embedded subtitle rather than trying to guess it. I also modified it so that it detects the format of each subtitle and extracts it accordingly: .srt to .srt files, .sup to .sup files, .sub and .idx to .sub and .idx, etc. I also made it check whether a subtitle already exists during the looping process, because I found some movies have 2 full English .srt subtitles. This gives them different names. If it hits 3 it will overwrite one, unfortunately. I can't be bothered to code it to check for more, but maybe someone else will.

Here's my script, it lives in /usr/bin/ripsubtitles.sh

Code:

#!/bin/bash
# Extract subtitles from each MKV file in the given directory

# Set language of subtitles to find
lang=eng
shortlang=en

# If no directory is given, work in local dir
if [ "$1" = "" ]; then
	DIR="."
else
	DIR="$1"
fi

# Get all the MKV files in this dir and its subdirs
find "$DIR" -type f -name '*.mkv' | sort -n | while read filename

do
	subtitlename=${filename%.*}
	# Test if an english subtitle exist
	if [ -e "$subtitlename"."$lang".srt ] || [ -e "$subtitlename"."$lang".ass ] || [ -e "$subtitlename"."$lang".ssa ] || [ -e "$subtitlename"."$lang".usf ] || [ -e "$subtitlename"."$lang".sup ] || [ -e "$subtitlename"."$lang".sub ] ;then
		echo "$filename already has a full or forced subtitle"
	else
		echo "No subtitles found for $filename"
		# Number of matching embedded subtitle tracks
		subnum=`mkvmerge --identify-verbose "$filename" | grep subtitles | grep language:"$lang" | grep -v commentary | grep -v Commentary | wc -l`
		echo "Number of matching subtitles is $subnum"
		# Find out which tracks contain the subtitles
		echo "Checking $filename for $lang subtitle tracks"
		mkvmerge --identify-verbose "$filename" | grep subtitles | grep language:"$lang" | grep -v commentary | grep -v Commentary | while read subline
		do
			# Grep the number of the subtitle track
			tracknumber=`echo $subline | egrep -o "[0-9]{1,2}" | head -1`

			# Grep the subtitle format
			format=`echo $subline | grep -oP '(?<=codec_id:).+?(?= )'`

			case $format in
				S_TEXT/UTF8)
					format=".srt"
					;;
				S_TEXT/SSA)
					format=".ssa"
					;;
				S_TEXT/ASS)
					format=".ass"
					;;
				S_TEXT/USF)
					format=".usf"
					;;
				S_VOBSUB)
					format=""
					;;
				S_HDMV/PGS)
					format=".sup"
					;;
				*)
			esac

			# Get base name for subtitle
			subtitlename=${filename%.*}

			# Extract the track to a .tmp file
			echo "Extracting subtitle track to temporary file for $filename"
			`mkvextract tracks "$filename" $tracknumber:"$subtitlename$format.tmp" > /dev/null 2>&1`
			if [ -f "$subtitlename$format.tmp" ]; then
				chmod g+rw "$subtitlename$format.tmp"
			fi
			if [ -f "$subtitlename$format.tmp" ]; then
				# Check the file size to tell whether it is a forced or a full subtitle
				size_log=$(du "$subtitlename$format.tmp" | cut -f1)
				echo "Subtitle Size:" $size_log
				# Text-based formats get the lower size threshold; compare the variable, not a literal string
				if [ "$format" = ".srt" ] || [ "$format" = ".ssa" ] || [ "$format" = ".ass" ] || [ "$format" = ".usf" ]; then
					if [ $size_log -lt 10 ]; then
						# rename forced subtitle for plex
						echo "Subtitle track appears to be forced, renaming as forced$format"
						`mv "$subtitlename$format.tmp" "$subtitlename.forced$format" > /dev/null 2>&1`
					else
						# Rename in .$lang.$format for plex
						echo "Subtitle track appears to be full, renaming as $lang$format"
						`mv "$subtitlename$format.tmp" "$subtitlename.$lang$format" > /dev/null 2>&1`
					fi
				else
					if [ $size_log -lt 1000 ] ; then
						# rename forced subtitle for plex
						echo "Subtitle track appears to be forced, renaming as forced$format"
						`mv "$subtitlename$format.tmp" "$subtitlename.forced$format" > /dev/null 2>&1`
					else
						if [ -f "$subtitlename.$lang$format" ]; then
							# A .$lang file already exists: use .$shortlang so it is not overwritten
							echo "Subtitle track appears to be full, renaming as $shortlang$format"
							`mv "$subtitlename$format.tmp" "$subtitlename.$shortlang$format" > /dev/null 2>&1`
						else
							# Rename in .$lang.$format for plex
							echo "Subtitle track appears to be full, renaming as $lang$format"
							`mv "$subtitlename$format.tmp" "$subtitlename.$lang$format" > /dev/null 2>&1`
						fi
					fi
				fi
			fi
			if [ -f "$subtitlename.sub" ]; then
				# Check the file size to tell whether it is a forced or a full subtitle
				size_log=$(du "$subtitlename.sub" | cut -f1)
				echo "Subtitle Size:" $size_log
				if [ $size_log -lt 1000 ] ; then
					# rename forced subtitle for plex
					echo "Subtitle track appears to be forced, renaming as forced$format"
					`mv "$subtitlename.sub" "$subtitlename.forced.sub" > /dev/null 2>&1`
					`mv "$subtitlename.idx" "$subtitlename.forced.idx" > /dev/null 2>&1`
				else
					if [ -f "$subtitlename.$lang.sub" ]; then
						# A .$lang.sub already exists: use .$shortlang so it is not overwritten
						echo "Subtitle track appears to be full, renaming as $shortlang.sub"
						mv "$subtitlename.sub" "$subtitlename.$shortlang.sub" > /dev/null 2>&1
						mv "$subtitlename.idx" "$subtitlename.$shortlang.idx" > /dev/null 2>&1
					else
						# Rename in .$lang$format for plex
						echo "Subtitle track appears to be full, renaming as $lang$format"
						`mv "$subtitlename.sub" "$subtitlename.$lang.sub" > /dev/null 2>&1`
						`mv "$subtitlename.idx" "$subtitlename.$lang.idx" > /dev/null 2>&1`
					fi
				fi
			fi

		done
	fi
done
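
The limitation mentioned above (a third matching subtitle overwrites an earlier one) could be lifted by picking the first unused name instead of switching between two fixed ones. A minimal sketch, assuming a hypothetical helper called `next_free_name`; whether the media player picks up names such as `movie.eng.2.srt` is an assumption that would need checking.

```shell
#!/bin/bash
# next_free_name BASE EXT: hypothetical helper, not part of the script above.
# Prints BASE+EXT if that name is free, otherwise BASE.2+EXT, BASE.3+EXT, ...
next_free_name() {
  local base=$1 ext=$2 candidate n=2
  candidate="$base$ext"
  while [ -e "$candidate" ]; do
    candidate="$base.$n$ext"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}
```

A rename would then look like `mv "$subtitlename$format.tmp" "$(next_free_name "$subtitlename.$lang" "$format")"`.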
Then I have a ripdownloadssubs.sh in my scripts folder which contains:

Code:

#!/bin/bash

"/usr/bin/ripsubtitles.sh" "/mnt/HD0/Downloads/Complete/Movies"
This lets sab extract the subs for the download after it completes.
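
As far as I know, SABnzbd passes the final folder of the finished job to a post-processing script as its first argument, so the wrapper would not have to scan the whole Movies folder every time. A sketch under that assumption; `run_for_job` and the `RIPPER` variable are names made up for this example.

```shell
#!/bin/bash
# Hypothetical variant of the wrapper: only process the folder of the job
# that just finished. RIPPER defaults to the script location given above.
RIPPER=${RIPPER:-/usr/bin/ripsubtitles.sh}

# run_for_job DIR: run the extraction script on DIR, refusing anything that
# is not a directory.
run_for_job() {
  local job_dir=$1
  if [ ! -d "$job_dir" ]; then
    echo "Not a directory: $job_dir" >&2
    return 1
  fi
  "$RIPPER" "$job_dir"
}

# Only run when a folder was actually passed in (by SABnzbd or by hand).
if [ "$#" -ge 1 ]; then
  run_for_job "$1"
fi
```

Run by hand, this would be called as `ripdownloadssubs.sh "/path/to/finished/job"`.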