I am retrieving some content from a website which has several tables with the same number of columns, with pandas read_html
. When I read a single link that actually has several tables with the same number of columns, pandas effectively read all the tables as one (something like a flat/normalized table). However, I am interested in do the same for a list of links from a website (i.e. a single flat table for several links), so I tried the following:
In:
import multiprocessing
def process(url):
df_url = pd.read_html(url)
df = pd.concat(df_url, ignore_index=False)
return df_url
links = ['link1.com','link2.com','link3.com',...,'linkN.com']
pool = multiprocessing.Pool(processes=6)
df = pool.map(process, links)
df
Nevertheless, I guess I am not specifiying corecctly to read_html()
which are the columns, so I am getting this malformed list of lists:
Out:
[[ Form Disponibility \
0 290090 01780-500-01) Unavailable - no product available for release.
Relation \
Relation drawbacks
0 NaN Removed
1 NaN Removed ],
[ Form \
Relation \
0 American Regent is currently releasing the 0.4...
1 American Regent is currently releasing the 1mg...
drawbacks
0 Demand increase for the drug
1 Removed ,
Form \
0 0.1 mg/mL; 10 mL Luer-Jet Prefilled Syringe (N...
Disponibility Relation \
0 Product available NaN
2 Removed
3 Removed ]]
So my question which parameter should I move in order to get a flat pandas dataframe from the above nested list?. I tried to header=0
, index_col=0
, match='"columns"'
, none of them worked or do I need to do the flatting when I create the pandas dataframe with pd.Dataframe()
?. My main objective is to have a pandas dataframe like with this columns:
form, Disponibility, Relation, drawbacks
1
2
...
n
IIUC you can do it this way:
first you want to return concatenated DF, instead of list of DFs (as read_html
returns a list of DFs):
def process(url):
return pd.concat(pd.read_html(url), ignore_index=False)
and then concatenate them for all URLs:
df = pd.concat(pool.map(process, links), ignore_index=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With