On Wikipedia you can find some interesting data that you may want to sort, filter, and so on.
Here is a sample of a wikitable:
{| class="wikitable sortable"
|-
! Model !! Mhash/s !! Mhash/J !! Watts !! Clock !! SP !! Comment
|-
| ION || 1.8 || 0.067 || 27 || || 16 || poclbm; power consumption incl. CPU
|-
| 8200 mGPU || 1.2 || || || 1200 || 16 || 128 MB shared memory, "poclbm -w 128 -f 0"
|-
| 8400 GS || 2.3 || || || || || "poclbm -w 128"
|-
|}
I'm looking for a way to import such data into a Python pandas DataFrame.
Here's a solution using py-wikimarkup and PyQuery to extract all tables as pandas DataFrames from a wikimarkup string, ignoring non-table content.
import wikimarkup
import pandas as pd
from pyquery import PyQuery

def get_tables(wiki):
    # Render the wikimarkup to HTML, then query the resulting tables
    html = PyQuery(wikimarkup.parse(wiki))
    frames = []
    for table in html('table'):
        # Each row becomes a list of cell strings; empty cells may have no text
        data = [[(x.text or '').strip() for x in row]
                for row in table.getchildren()]
        # The first row holds the header cells, the remaining rows the data
        df = pd.DataFrame(data[1:], columns=data[0])
        frames.append(df)
    return frames
Given the following input,
wiki = """
=Title=
Description.
{| class="wikitable sortable"
|-
! Model !! Mhash/s !! Mhash/J !! Watts !! Clock !! SP !! Comment
|-
| ION || 1.8 || 0.067 || 27 || || 16 || poclbm; power consumption incl. CPU
|-
| 8200 mGPU || 1.2 || || || 1200 || 16 || 128 MB shared memory, "poclbm -w 128 -f 0"
|-
| 8400 GS || 2.3 || || || || || "poclbm -w 128"
|-
|}
{| class="wikitable sortable"
|-
! A !! B !! C
|-
| 0
| 1
| 2
|-
| 3
| 4
| 5
|}
"""
get_tables(wiki) returns the following DataFrames.
Model Mhash/s Mhash/J Watts Clock SP Comment
0 ION 1.8 0.067 27 16 poclbm; power consumption incl. CPU
1 8200 mGPU 1.2 1200 16 128 MB shared memory, "poclbm -w 128 -f 0"
2 8400 GS 2.3 "poclbm -w 128"
A B C
0 0 1 2
1 3 4 5
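Note that every cell comes back as a string. If you need numeric columns, a minimal follow-up sketch (assuming the first DataFrame returned above) would coerce them with pd.to_numeric:
frames = get_tables(wiki)
df = frames[0]
# Coerce the numeric columns; empty strings become NaN
for col in ["Mhash/s", "Mhash/J", "Watts", "Clock", "SP"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")
print(df.dtypes)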
You can use pandas directly, as long as you point it at the rendered HTML page rather than the raw wikimarkup. Something like this:
pandas.read_html(url, attrs={"class": "wikitable"})
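For example, a minimal sketch (the URL is a placeholder; read_html also relies on an HTML parser such as lxml being installed):
import pandas as pd

# Hypothetical page; substitute the article that actually contains your wikitable
url = "https://en.wikipedia.org/wiki/Example"
tables = pd.read_html(url, attrs={"class": "wikitable"})  # one DataFrame per matching table
df = tables[0]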