I am encountering some weird characters while web scraping some elements of a page. The characters that seem to cause the error are:
? ????Á¢¢Á? /?? />? /??? ?/¢¥Á ??%% ?Á ?????Á? ?> /???¥??> ¥? ¥©Á ?>¢¥/%%/¥??> ? >Á? Â?Á ©???¢ ñ%Á?¥???/% Á%Á?¥??>?? />? Â??Á? ??¥?? ??¢¥????¥??> ¢`¢¥Á¢ ??%% ?Á ??À?/?Á? ¥? _ÁÁ¥ ?>??Á/¢?>À Á????Á>¥ ????¥Á? />? ??__?>??/¥??>¢ ?Á
The relevant code is below:
url = "http://www.nsf.gov#######@#@#@##"
# webbrowser.open(url, new=new)
flagcnt += 1
if flagcnt % 20 == 0:  # auto-sleep to avoid being shut out
    print "flagcount: "
    print flagcnt
    time.sleep(5)
# Program code extraction
r = requests.get(url)
sp = BeautifulSoup(r.content)
Page : http://www.nsf.gov/awardsearch
I've read all the pages on this error, some of which suggest decoding and encoding, but they don't seem to help. I don't know which encoding is being used here. I tried downgrading the BS version, but that didn't help. Any help is appreciated. Python 2.7, BS 4.
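One way to narrow down which encoding is in play is to try a few candidates against the raw bytes and see which one decodes cleanly. The sketch below is only an illustration: the helper name `decode_with_fallback` and the candidate list are assumptions, not something taken from the page in question. With requests, the raw bytes would be `r.content`.

```python
def decode_with_fallback(raw, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Try each candidate encoding in order; return (text, encoding_used).

    The candidate list is an assumption for illustration only.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: drop anything that is not plain ASCII (lossy).
    return raw.decode('ascii', 'ignore'), 'ascii (lossy)'


# Hypothetical sample bytes standing in for r.content
utf8_bytes = u'caf\xe9'.encode('utf-8')
text, used = decode_with_fallback(utf8_bytes)

# A lone 0xE9 byte is not valid UTF-8, so the next candidate is tried
legacy_text, legacy_used = decode_with_fallback(b'caf\xe9')
```

Note that `latin-1` accepts every byte value, so a clean decode there does not prove the guess is right; it only guarantees no exception.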
This works for me:
page_text = r.text.encode('utf-8').decode('ascii', 'ignore')
page_soupy = BeautifulSoup.BeautifulSoup(page_text)
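Keep in mind that this one-liner works by discarding data: every non-ASCII character is dropped rather than converted. A small sketch with a made-up sample string (the helper name `strip_non_ascii` is hypothetical) shows the effect:

```python
def strip_non_ascii(text):
    """Return text with every non-ASCII character removed (lossy)."""
    return text.encode('utf-8').decode('ascii', 'ignore')


# Made-up sample: e-acute, an en dash, and i-diaeresis
sample = u'caf\xe9 \u2013 na\xefve'
cleaned = strip_non_ascii(sample)
# The accented letters and the dash are gone, not transliterated
```

That is fine if only the ASCII portion of the page matters, but names or text with accents will come out truncated.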