Python Beautiful Soup 'ascii' codec can't encode character u'\xa5'

Question

I am encountering some weird characters while scraping some elements of a page. The characters that seem to cause the error are:

? ????Á¢¢Á? /?? />? /??? ?/¢¥Á ??%% ?Á ?????Á? ?> /???¥??> ¥? ¥©Á ?>¢¥/%%/¥??> ?Â >Á? Â?Á ©???¢ ñ%Á?¥???/% Á%Á?¥??>?? />? Â??Á? ??¥?? ??¢¥????¥??> ¢`¢¥Á¢ ??%% ?Á ??À?/?Á? ¥? _ÁÁ¥ ?>??Á/¢?>À Á????Á>¥ ????¥Á? />? ??__?>??/¥??>¢ ?Á

The relevant code is below:

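import time

import requests
from bs4 import BeautifulSoup

flagcnt = 0  # number of requests made so far (initialised before the scraping loop)

url = "http://www.nsf.gov#######@#@#@##"
# webbrowser.open(url, new=new)
flagcnt += 1
if flagcnt % 20 == 0:  # sleep every 20 requests to avoid being shut out
    print("flagcount: %d" % flagcnt)
    time.sleep(5)
# Program code extraction
r = requests.get(url)
sp = BeautifulSoup(r.content, "html.parser")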
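The excerpt above only fetches and parses the page, so the error in the title most likely comes from a later step that is not shown, such as printing a unicode string while output is redirected, writing it to a file, or calling str() on it. In Python 2 those steps typically encode the text implicitly with the ASCII codec, which fails on characters like the yen sign u'\xa5'. The minimal sketch below is written for Python 3; the sample string and the helper name encode_like_python2_default are made up for illustration. It performs that ASCII step explicitly to reproduce the error, then shows that encoding to UTF-8 instead succeeds.

```python
# Python 3 sketch; the sample text and helper name are hypothetical.
sample = u"Price: \xa5500"  # u'\xa5' is the yen sign


def encode_like_python2_default(text):
    """Encode with the ASCII codec, as Python 2 does implicitly for str(u'...')."""
    return text.encode("ascii")


try:
    encode_like_python2_default(sample)
    error_message = None
except UnicodeEncodeError as exc:
    # e.g. "'ascii' codec can't encode character '\xa5' in position 7: ..."
    error_message = str(exc)

# Encoding explicitly to UTF-8 keeps every character and does not raise.
utf8_bytes = sample.encode("utf-8")

print(error_message)
print(utf8_bytes)
```

The general pattern is to keep text as unicode while processing it and to choose an encoding (usually UTF-8) explicitly at the point where it leaves the program.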

Page: http://www.nsf.gov/awardsearch

I've read every page I could find on this error. Some suggest decoding and encoding, but that doesn't seem to help, and I don't know which encoding is being used here. I also tried downgrading the BS version, but that didn't help either. Any help is appreciated. Python 2.7, BS 4.
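One way to find out which encoding a page declares is to read the charset parameter of its Content-Type header. requests exposes the encoding it will use as r.encoding (plus a content-based guess as r.apparent_encoding), and BeautifulSoup 4 reports the encoding it detected as soup.original_encoding. The stdlib-only sketch below (Python 3) shows the header-parsing idea without any network access; the helper names charset_from_content_type and decode_body are hypothetical, and the ISO-8859-1 fallback is an assumption that mirrors what requests does for text/* responses that declare no charset.

```python
from email.message import Message


def charset_from_content_type(content_type, default="iso-8859-1"):
    """Return the lowercased charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(body, content_type):
    """Decode raw response bytes with the charset the server declared.

    errors="replace" turns undecodable bytes into U+FFFD instead of raising;
    a charset name Python does not know still raises LookupError.
    """
    return body.decode(charset_from_content_type(content_type), errors="replace")


# Usage with a made-up header and body:
declared = charset_from_content_type("text/html; charset=UTF-8")
text = decode_body(u"\xa5500".encode("utf-8"), "text/html; charset=UTF-8")
print(declared)
```

If the header and the page's own meta tag disagree, comparing r.encoding, r.apparent_encoding and soup.original_encoding is a quick way to see which guess is wrong.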

nivix zixer · Accepted Answer

This works for me:

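from bs4 import BeautifulSoup
# Encode to UTF-8, then decode as ASCII, silently dropping every non-ASCII byte
page_text = r.text.encode('utf-8').decode('ascii', 'ignore')
page_soupy = BeautifulSoup(page_text, 'html.parser')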
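The one-liner above works by discarding data: encoding to UTF-8 and then decoding as ASCII with 'ignore' drops every byte outside the ASCII range, so characters such as ¥ or é vanish from the parsed text. The sketch below (Python 3, with a made-up sample string) compares that approach with two other standard codec error handlers, 'replace' and 'xmlcharrefreplace', and with the lossless option of keeping the text as unicode and encoding it to UTF-8 only when writing it out.

```python
sample = u"Price: \xa5500 caf\xe9"  # contains the yen sign and an e-acute

# The accepted answer's approach: non-ASCII bytes are silently dropped.
ignored = sample.encode("utf-8").decode("ascii", "ignore")

# Each non-ASCII character becomes a literal '?'.
replaced = sample.encode("ascii", "replace").decode("ascii")

# Each non-ASCII character becomes an HTML/XML numeric character reference.
charrefs = sample.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Lossless alternative: keep unicode while parsing, encode to UTF-8 on output.
utf8_output = sample.encode("utf-8")

print(ignored)   # Price: 500 caf
print(replaced)  # Price: ?500 caf?
print(charrefs)  # Price: &#165;500 caf&#233;
```

If the dropped characters carry meaning (names, currency symbols), the UTF-8 route avoids losing them; 'ignore' is only safe when the non-ASCII text genuinely does not matter.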

Tags:

python

html

beautifulsoup

web-scraping

bs4

Pulkit Bhardwaj

1 Answers

nivix zixer
