
Simple Python IMDB Check Help

Posted: February 5th, 2010, 8:55 pm
by pilGrim
Hi All,

Well, this should be simple, but I am a complete noob in Python.  I am stuck in the following situation:

I set a variable:  IMDB_Votes = 1,000

I pull out the current votes from the IMDB HTML Page: imdbwebvotes = re.search(">([0-9]*,*[0-9][0-9][0-9]) votes= imdb_vote):


I think my definition of the imdbwebvotes variable is wrong, because when the number is high (xxx,xxx) I get this error: invalid literal for int() with base 10: '177,362'

The code does work when the variable is low x,xxx 

Can someone point me in the right direction on how to make this work?  Is it the code that turns the variable into an integer, or is it the actual creation of the imdbwebvotes variable?

Many thanks in advance!!

pilgrim

Re: Simple Python IMDB Check Help

Posted: February 6th, 2010, 5:10 am
by Camelot
I would think that the regex should be:
">([0-9]+,?[0-9]{0,3}) votes<"
Instead :)
I changed the first digit to being "1 or more" instead of "0 or more", I made the comma "zero or one occurrences", and instead of repeating the last digit pattern 3 times, we say we want 0-3 of that type. I am sure this regex could be improved upon, as it won't deal with cases where the number is: 1,000,000
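
As a rough sketch of one way to cover numbers such as 1,000,000, the pattern below allows one to three leading digits followed by any number of comma-plus-three-digit groups. The names VOTES_RE and find_votes_text and the sample HTML strings are made up for illustration, and the real IMDB markup may well differ. Note that re.search returns a match object (or None), so the text has to be pulled out with .group(1).

```python
import re

# Hypothetical pattern: 1-3 leading digits, then zero or more ",ddd" groups.
VOTES_RE = re.compile(r">([0-9]{1,3}(?:,[0-9]{3})*) votes<")


def find_votes_text(html):
    """Return the matched vote count as text (commas included), or None."""
    match = VOTES_RE.search(html)
    return match.group(1) if match else None


# Made-up HTML snippets, only to exercise the pattern.
print(find_votes_text("<a>563 votes</a>"))
print(find_votes_text("<a>177,362 votes</a>"))
print(find_votes_text("<a>1,000,000 votes</a>"))
print(find_votes_text("<a>no count here</a>"))
```

The result is still a string with commas in it, so it cannot be passed straight to int().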

My thoughts on your error: you say you set:

Code:

IMDB_Votes = 1,000
If you try doing this in the Python interpreter, you end up with a tuple containing 1 and 0:
(1, 0)
This is obviously not what you wanted, instead you want

Code:

IMDB_Votes = 1000
Which will be the number one thousand.
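
A quick way to see the difference for yourself, as a small illustration (the variable names are only for this example):

```python
# A comma between two literals builds a tuple; it is not a thousands separator.
votes_with_comma = 1,000   # parsed as the tuple (1, 0)
votes_plain = 1000         # the integer one thousand

print(type(votes_with_comma).__name__)
print(type(votes_plain).__name__)
```

Python 3.6 and later also accept 1_000 if you want a visual separator inside the literal.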

I think something similar is going wrong with the second number (the one retrieved from IMDB): Python is complaining because you are explicitly converting it to an int, and int() cannot parse a string that contains a comma, such as '177,362'. You need to strip out any non-digit characters, and then create the int.
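
A minimal sketch of that idea, assuming the value arrives as a string. The helper name to_int is hypothetical, and re.sub is just one way to drop the unwanted characters:

```python
import re


def to_int(text):
    """Drop every character that is not 0-9, then convert what is left."""
    digits = re.sub(r"[^0-9]", "", text)
    return int(digits)


# Converting the raw string fails because of the comma.
try:
    int("177,362")
except ValueError as err:
    print("int() refused:", err)

# After stripping the non-digits the conversion succeeds.
print(to_int("177,362"))
```

This is deliberately blunt: it would also throw away a minus sign or a decimal point, which should be fine for vote counts but not for general numbers.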

Hope that at least points you in the right direction!

Re: Simple Python IMDB Check Help

Posted: February 6th, 2010, 8:09 am
by pilGrim
Many thanks Camelot,

That helps a lot.  Any simple ideas on how to strip the comma out of the string that represents the number of votes?

I did set up the code to just skip anything with fewer than 1000 votes, but wanted to be able to adjust beyond that with a variable.

Cheers,

pilgrim

Re: Simple Python IMDB Check Help

Posted: February 8th, 2010, 9:15 am
by Camelot
hey,

This should remove the commas

Code:

imdbwebvotes = '6,563'
# Split on the commas, then join the pieces back together without them
numbersSplit = imdbwebvotes.split(',')
numberStringWithoutCommas = "".join(numbersSplit)
real_number = int(numberStringWithoutCommas)
print("%d" % real_number)  # prints 6563
This assumes that whatever is returned from imdb is a string (which it probably will be)
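
A shorter variant, sketched under the same assumption that the value is a string: str.replace can drop the commas in one call, and the result can then be compared against an adjustable minimum. IMDB_VOTES and has_enough_votes are hypothetical names chosen for this example.

```python
IMDB_VOTES = 1000  # hypothetical minimum vote threshold; adjust as needed


def has_enough_votes(votes_text, minimum=IMDB_VOTES):
    """Strip the commas, convert to int, and compare against the minimum."""
    return int(votes_text.replace(",", "")) >= minimum


print(has_enough_votes("6,563"))
print(has_enough_votes("177,362"))
print(has_enough_votes("950"))
```

Keeping the threshold as a plain integer (1000, not 1,000) is what makes the comparison work.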

Hope that helps

Re: Simple Python IMDB Check Help

Posted: February 8th, 2010, 7:58 pm
by pilGrim
Hi Camelot, many many thanks.  Will try this out later today.

cheers,

pilgrim

Re: Simple Python IMDB Check Help

Posted: February 8th, 2010, 11:42 pm
by pilGrim
Hi Camelot,

Worked like a charm.  Big thanks for the help.

Cheers!

Re: Simple Python IMDB Check Help

Posted: February 9th, 2010, 6:38 pm
by Camelot
no worries :)