pandas provides a useful to_html() method to convert a DataFrame into an HTML table. Is there a function to read such a table back into a DataFrame?

How to convert an HTML table into a pandas DataFrame

2 Answers

Use the read_html utility, released in pandas 0.12.
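As a rough sketch of the round trip, something like the following should work. It assumes a reasonably recent pandas and that an HTML parser backend such as lxml is installed; the variable names (html, tables, restored) are only illustrative.

```python
import io

import numpy as np
import pandas as pd

# Build a small test frame and render it as an HTML table.
df = pd.DataFrame(np.random.rand(4, 5), columns=list("abcde"))
html = df.to_html()

# read_html returns a list with one DataFrame per <table> it finds.
# index_col=0 says the first column of the table holds the row index.
tables = pd.read_html(io.StringIO(html), index_col=0)
restored = tables[0]
```

Note that to_html() writes floats with the display precision, so restored matches df only up to the printed number of decimals.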

142

answered Sep 20 '22 20:09

waitingkuo

In the general case it is not possible, but if you approximately know the structure of your table you could do something like this:

# Create a test df:
>>> import numpy as np
>>> from pandas import DataFrame
>>> df = DataFrame(np.random.rand(4, 5), columns=list('abcde'))
>>> df
     a           b           c           d           e
0    0.675006    0.230464    0.386991    0.422778    0.657711
1    0.250519    0.184570    0.470301    0.811388    0.762004
2    0.363777    0.715686    0.272506    0.124069    0.045023
3    0.657702    0.783069    0.473232    0.592722    0.855030

Now parse the HTML and reconstruct the DataFrame:

>>> from pyquery import PyQuery as pq
>>> # Parse the HTML produced by to_html()
>>> d = pq(df.to_html())
>>> # Column names come from the first header row
>>> columns = d('thead tr').eq(0).text().split()
>>> n_rows = len(d('tbody tr'))
>>> # Collect the body cells and reshape them into rows
>>> values = np.array(d('tbody tr td').text().split(), dtype=float).reshape(n_rows, len(columns))
>>> DataFrame(values, columns=columns)
     a           b           c           d           e
0    0.675006    0.230464    0.386991    0.422778    0.657711
1    0.250519    0.184570    0.470301    0.811388    0.762004
2    0.363777    0.715686    0.272506    0.124069    0.045023
3    0.657702    0.783069    0.473232    0.592722    0.855030

You could extend it to handle MultiIndex DataFrames, or add automatic type detection using eval(), if needed.
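To illustrate the type detection idea, here is a minimal sketch that swaps in two things not used above: the standard library's html.parser instead of pyquery, and ast.literal_eval instead of eval() (it only accepts Python literals, so it is safer on untrusted HTML). The names TableCells, detect and parse_table are made up for this example, and it assumes a single simple table with one header row and the index in the first column, as to_html() produces by default.

```python
import ast
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the first header row and all body rows of one simple table."""

    def __init__(self):
        super().__init__()
        self.header = []       # cell texts of the first <thead> row
        self.rows = []         # one list of cell texts per <tbody> row
        self._section = None   # "thead" or "tbody" while inside one
        self._in_cell = False
        self._header_done = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self._section = tag
        elif tag == "tr" and self._section == "tbody":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._in_cell = True
            self._text = []

    def handle_endtag(self, tag):
        if tag in ("thead", "tbody"):
            self._section = None
        elif tag == "tr" and self._section == "thead":
            self._header_done = True
        elif tag in ("th", "td") and self._in_cell:
            self._in_cell = False
            text = "".join(self._text).strip()
            if self._section == "thead" and not self._header_done:
                self.header.append(text)
            elif self._section == "tbody":
                self.rows[-1].append(text)

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)


def detect(text):
    """Convert a cell string to an int, float, etc.; keep it as a string otherwise."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def parse_table(html):
    parser = TableCells()
    parser.feed(html)
    parser.close()
    columns = parser.header[1:]  # assumption: first header cell is the empty index cell
    index = [detect(row[0]) for row in parser.rows]
    values = [[detect(cell) for cell in row[1:]] for row in parser.rows]
    return columns, index, values
```

With these pieces, columns, index, values = parse_table(df.to_html()) followed by DataFrame(values, index=index, columns=columns) should rebuild a frame with per-cell types. MultiIndex headers would need extra handling that this sketch does not attempt.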

answered Sep 18 '22 20:09

elyase
