Pandas read_html equivalent for a lxml table

Question

Hi, I have about 10 tables, which I have selected using lxml.

>>> import pandas as pd
>>> import lxml.etree
>>> root = lxml.etree.HTML(htmlcontent)
>>> tables = root.findall('.//*[@id="info-container"]/table')
>>> readabletables = tables[::2]
>>> len(readabletables)
5
>>> readabletables[0]
<Element table at 0x105241e60>

I want pandas to read these 5 tables and parse them into DataFrames, just as pd.read_html does.

How would I go about doing this?

user3374113 · Accepted Answer

I am now able to answer my own question, and maybe this can be of help to others.

I tried modifying the read_html source code in pandas, without much success, because pandas did not recognise the lxml elements as tables. Nonetheless, the answer is much simpler than you might think.

>>> import pandas as pd
>>> import lxml.etree
>>> root = lxml.etree.HTML(htmlcontent)
>>> tables = root.findall('.//*[@id="info-container"]/table')
>>> readabletables = tables[::2]
>>> len(readabletables)
5

^ This is what we have already established.
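To try this setup without the original page, here is a self-contained sketch. The `htmlcontent` string below is a hypothetical stand-in (the real page is not shown in the question), built so that data tables alternate with other tables inside `#info-container`, which is what the `[::2]` slice appears to assume.

```python
import lxml.etree

# Hypothetical stand-in for the question's `htmlcontent`: inside the
# container, data tables alternate with tables we want to skip.
htmlcontent = """
<html><body>
  <div id="info-container">
    <table><tr><th>name</th><th>score</th></tr><tr><td>a</td><td>1</td></tr></table>
    <table><tr><td>layout table (skipped)</td></tr></table>
    <table><tr><th>name</th><th>score</th></tr><tr><td>b</td><td>2</td></tr></table>
    <table><tr><td>layout table (skipped)</td></tr></table>
  </div>
</body></html>
"""

# Parse the markup; lxml.etree.HTML returns the root <html> element.
root = lxml.etree.HTML(htmlcontent)

# ElementPath: every <table> that is a direct child of any element
# whose id attribute is "info-container".
tables = root.findall('.//*[@id="info-container"]/table')

# Keep every other table (indices 0, 2, ...), as in the question.
readabletables = tables[::2]
print(len(tables), len(readabletables))  # → 4 2
```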

For pandas's read_html to recognise an lxml table, the element first has to be serialized back into an HTML string. To do this:

>>> lxml.etree.tostring(readabletables[0], method='html', encoding='unicode')
'<table... table>'

To convert all the tables into pandas DataFrames inside a list:

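>>> from io import StringIO
>>> # read_html returns a list of DataFrames (one per <table>), so take [0]
>>> pd_tables = [pd.read_html(StringIO(lxml.etree.tostring(table, method='html', encoding='unicode')))[0] for table in readabletables]
>>> len(pd_tables)
5
>>> type(pd_tables[0])
<class 'pandas.core.frame.DataFrame'>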
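The same conversion can be wrapped in a reusable function. The sketch below runs it on a hypothetical sample document; the helper name `lxml_tables_to_frames` is made up for illustration. Wrapping the string in `io.StringIO` follows recent pandas versions (2.1 and later), which deprecate passing literal HTML strings to `read_html`.

```python
from io import StringIO

import lxml.etree
import pandas as pd


def lxml_tables_to_frames(tables):
    """Hypothetical helper: convert lxml <table> elements to DataFrames via read_html."""
    frames = []
    for table in tables:
        # Serialize the element back to HTML text, without trailing tail text.
        html = lxml.etree.tostring(table, method='html', encoding='unicode', with_tail=False)
        # read_html returns a list of DataFrames, one per <table> found;
        # this string contains exactly one table, so take element [0].
        frames.append(pd.read_html(StringIO(html))[0])
    return frames


# Hypothetical sample input: keep tables 0 and 2, skip table 1.
root = lxml.etree.HTML(
    '<div id="info-container">'
    '<table><tr><th>name</th><th>score</th></tr><tr><td>a</td><td>1</td></tr></table>'
    '<table><tr><td>skip me</td></tr></table>'
    '<table><tr><th>name</th><th>score</th></tr><tr><td>b</td><td>2</td></tr></table>'
    '</div>'
)
readabletables = root.findall('.//*[@id="info-container"]/table')[::2]
pd_tables = lxml_tables_to_frames(readabletables)
print(len(pd_tables))  # → 2
```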

Mission accomplished.
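If you would rather not round-trip through HTML text, a DataFrame can also be built directly from the lxml element. This is a different, deliberately simplified technique, not what read_html does internally. It takes the first row as the header, keeps every cell as a string, and does not handle colspan/rowspan or nested tables. The function name `table_to_frame` is hypothetical.

```python
import lxml.etree
import pandas as pd


def table_to_frame(table):
    """Hypothetical, simplified alternative to read_html for one lxml <table>."""
    rows = []
    # iter('tr') finds rows whether or not they sit inside <thead>/<tbody>
    # (it would also pick up rows of nested tables, which this sketch ignores).
    for tr in table.iter('tr'):
        # Collect the full text of each header or data cell in the row.
        cells = [''.join(cell.itertext()).strip()
                 for cell in tr if cell.tag in ('th', 'td')]
        if cells:
            rows.append(cells)
    # First row becomes the column labels; raises ValueError on an empty table.
    header, *body = rows
    return pd.DataFrame(body, columns=header)


root = lxml.etree.HTML(
    '<table><tr><th>name</th><th>score</th></tr><tr><td>a</td><td>1</td></tr></table>'
)
df = table_to_frame(root.find('.//table'))
print(df)
```

Unlike read_html, this leaves numeric-looking cells such as `'1'` as strings; `pd.to_numeric` can convert columns afterwards if needed.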

Tags:

python

pandas

lxml
