I'm trying to fetch some numbers from a webpage using requests. The numbers there are embedded in images. The script I've written so far can display the numbers, since I've used the PIL library, but it can't print them.
website address
Numbers visible in there just above the submit button are like:
I've tried so far:
import io
import requests
from PIL import Image
from bs4 import BeautifulSoup
from urllib.parse import urljoin

base = 'http://horoscope.horoscopezen.com/'
url = 'http://horoscope.horoscopezen.com/archive2.asp?day=2&month=1&year=2022&sign=1#.Xy07M4oza1v'

def get_numbers(link):
    r = requests.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    image_links = [urljoin(base,td['src']) for td in soup.select("td > img[src^='secimage.asp?']")]
    for image_link in image_links:
        r = requests.get(image_link)
        img = Image.open(io.BytesIO(r.content))
        img.show()
        break

if __name__ == '__main__':
    get_numbers(url)
How can I fetch the numbers from that site?
You don't need to use OCR here. The image itself is composed of separate images for each number, and by parsing the image link you can get the entire number.
The image link is of the form http://horoscope.horoscopezen.com/secimage.asp?I=1&N=595A5C585A5C
It seems like the I= parameter is the index of the digit, and the N= parameter is the entire number. The translation seems to be as follows:
56 -> 9
57 -> 8
58 -> 7
59 -> 6
5A -> 5
5B -> 4
5C -> 3
5D -> 2
5E -> 1
5F -> 0
Note these numbers are in hex encoding (all characters are 0-9,A-F). Since 0x56 corresponds to 9 and 0x5F to 0 (and 0x56 + 9 == 0x5F), to get the digit we could use the formula 9 - hex_num + 0x56. For example, 56 would be converted to 9 - 0x56 + 0x56 = 9 and 5E would be translated to 9 - 0x5E + 0x56 = 9 - 8 = 1.
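As a quick sanity check, the formula can be run against every pair in the table above. This is only a sketch: the names observed and decode_pair are made up for illustration, and the table itself is an observation about this one site, not a documented encoding.

```python
# Hex codes and the digits they appear to stand for, copied from the table above.
observed = {
    "56": 9, "57": 8, "58": 7, "59": 6, "5A": 5,
    "5B": 4, "5C": 3, "5D": 2, "5E": 1, "5F": 0,
}

def decode_pair(pair):
    """Turn one two-character hex code into its digit (hypothetical helper)."""
    return 9 - int(pair, 16) + 0x56

# True only if the formula reproduces every row of the table.
formula_matches_table = all(decode_pair(code) == digit for code, digit in observed.items())
print(formula_matches_table)
```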
So you could change your code to print the entire number using something like:
def url_to_number(url):
    all_digits = []
    # We want the encoded number, find '&N=' and get the characters after it
    N = url[url.find('&N=') + 3:]
    # loop the characters in pairs
    for i in range(0, len(N), 2):
        digit = 9 - int(N[i:i+2], 16) + 0x56
        all_digits.append(digit)
    return all_digits
The line digit = 9 - int(N[i:i+2], 16) + 0x56 does the conversion I mentioned earlier. int(N[i:i+2], 16) converts the number from string to int, given it is in base 16 (hexadecimal).
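A slightly more defensive variant might read the N value with urllib.parse instead of searching for '&N=', so it would still work if the parameters came in a different order. This is a sketch under the same assumed encoding; image_url_to_number is a hypothetical name, and the sample URL is the one quoted earlier in this answer.

```python
from urllib.parse import urlparse, parse_qs

def image_url_to_number(image_url):
    """Decode the N= parameter of a secimage.asp link into a digit string."""
    query = parse_qs(urlparse(image_url).query)
    encoded = query["N"][0]  # raises KeyError if the link has no N parameter
    digits = []
    # Walk the hex string two characters at a time, one pair per digit.
    for i in range(0, len(encoded), 2):
        digits.append(9 - int(encoded[i:i + 2], 16) + 0x56)
    return "".join(str(d) for d in digits)

sample = "http://horoscope.horoscopezen.com/secimage.asp?I=1&N=595A5C585A5C"
print(image_url_to_number(sample))  # 653753
```

In the question's get_numbers, something like print(image_url_to_number(image_links[0])) could then replace the loop that downloads and shows the image, since every image link appears to carry the whole number in N.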