
Convert io.BytesIO to io.StringIO to parse HTML page

I'm trying to parse an HTML page that I retrieved through pyCurl, but the pyCurl WRITEFUNCTION returns the page as bytes rather than as a string, so I'm unable to parse it with BeautifulSoup.

Is there any way to convert io.BytesIO to io.StringIO?

Or is there another way to parse the HTML page?

I'm using Python 3.3.2.

Shipra asked Jul 04 '14 04:07



1 Answer

The code in the accepted answer reads the entire stream before decoding it. A better approach is to wrap the byte stream in an io.TextIOWrapper, which converts one stream into the other and decodes the data as it is read, so the text can be consumed chunk by chunk.

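    import io

    # Initialize a read buffer (named "buffer" to avoid shadowing the built-in input())
    buffer = io.BytesIO(
        b'Initial value for read buffer with unicode characters ' +
        'ÁÇÊ'.encode('utf-8')
    )
    # Wrap the byte stream so that reads return decoded text
    wrapper = io.TextIOWrapper(buffer, encoding='utf-8')

    # Read from the buffer
    print(wrapper.read())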
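To make the "chunk by chunk" part concrete, here is a minimal sketch of reading through the wrapper in small pieces instead of calling read() once. The helper name iter_text_chunks, the chunk size of 16 characters, and the sample HTML are assumptions made for illustration; they do not come from the question, and the io.BytesIO here stands in for whatever buffer the pyCurl callback filled.

```python
import io


def iter_text_chunks(byte_stream, chunk_size=16, encoding="utf-8"):
    """Yield decoded text from a binary stream, at most chunk_size characters at a time.

    Hypothetical helper for illustration only.
    """
    wrapper = io.TextIOWrapper(byte_stream, encoding=encoding)
    try:
        while True:
            chunk = wrapper.read(chunk_size)  # size is counted in characters, not bytes
            if not chunk:
                break
            yield chunk
    finally:
        # Detach so that discarding the wrapper does not close the caller's stream
        wrapper.detach()


# Sample data standing in for the bytes collected by the download callback
html_bytes = "<html><body><p>ÁÇÊ café</p></body></html>".encode("utf-8")
buffer = io.BytesIO(html_bytes)

chunks = list(iter_text_chunks(buffer))
text = "".join(chunks)
print(text)
```

Because the wrapper decodes incrementally, a multi-byte character is never split across two chunks. If the whole page fits comfortably in memory, decoding the bytes in one call is simpler; the wrapper is mainly useful when the data should be processed as it is read.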
kakarukeys answered Sep 24 '22 06:09
