I'm sure this is simple, but I'm quite new to Python. I'm having trouble adding a list to a DataFrame column or row after each iteration of a loop. I want to loop through a list of around a hundred URLs with the outer for loop and extract the data with the inner loop.
With my current code I can create a DataFrame that appends all the lists together into one column or one row. But I want the result of each run of the inner loop separately, in a new column or row of the DataFrame.
    list_rows = []
    for x in link_href_list:
        urllib.request.urlopen(x)
        html = urlopen(x)
        bs = BeautifulSoup(html, "lxml")
        table=bs.find('tbody')
        rows = table.tr.next_siblings
        for row in rows:
            a=row.find('td').get_text().strip()
            list_rows.append(a)
    list_rows.to_frame()
Unfortunately, the lists from the inner loop will have different lengths! Maybe someone has a simple solution or a hint about what I could change? Thanks!
I assume you meant every iteration of the outer loop in a new "row". This would create a two-dimensional list as a result: for each element in link_href_list you would get a new "row". I have no idea what the to_frame() method is, though; I assume it is a printout.
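    import pandas as pd
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    list_columns = []  # one inner list per URL
    for x in link_href_list:
        html = urlopen(x)
        bs = BeautifulSoup(html, "lxml")
        table = bs.find('tbody')
        # Skip the first <tr>; find_next_siblings('tr') ignores whitespace text nodes
        rows = table.tr.find_next_siblings('tr')
        list_rows = []  # start a fresh list for every URL
        for row in rows:
            a = row.find('td').get_text().strip()
            list_rows.append(a)
        list_columns.append(list_rows)
    # Each inner list becomes one row of the DataFrame
    df = pd.DataFrame(list_columns)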
Edit: If to_frame refers to the pandas DataFrame, I am not entirely sure how it will handle lists of different lengths. I will check in a bit, but there is a way around that as well. It seems there is no very simple answer at hand for importing lists of different lengths; the workaround is to find the longest list and, in a new loop, make all lists the same length before importing them into pandas.
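As a rough sketch of that workaround, the snippet below pads every inner list to the length of the longest one. The helper name pad_rows and the sample data are made up for illustration and stand in for the scraped list_columns; None is only an assumed choice of filler.

```python
from itertools import zip_longest


def pad_rows(rows, fill=None):
    """Return copies of the lists in `rows`, padded with `fill` to the longest length."""
    longest = max((len(r) for r in rows), default=0)
    return [list(r) + [fill] * (longest - len(r)) for r in rows]


# Hypothetical scraped data: one inner list per URL, with different lengths.
list_columns = [["a", "b", "c"], ["d"], ["e", "f"]]

# One row per URL, all rows now the same length.
padded = pad_rows(list_columns)

# Alternative layout: transpose so that each URL becomes a column instead of a row.
as_columns = [list(col) for col in zip_longest(*list_columns, fillvalue=None)]
```

Either result should then be usable with pd.DataFrame(padded) or pd.DataFrame(as_columns), depending on whether each URL should end up as a row or as a column.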