
Iterate over loop and adding list to dataframe in new row or new column

I'm sure this is simple, but I'm quite new to Python. I am having trouble adding a list to a dataframe column or row after each iteration of a loop. I want to loop through a list of around a hundred URLs with the outer for-loop and extract the data with the inner loop. Every time

With my current code I can create a dataframe that appends all lists together into one column or one row. But I want every iteration of the inner loop separately in a new column or row of the dataframe.

list_rows = [] 
for x in link_href_list: 
    urllib.request.urlopen(x)
    html = urlopen(x)
    bs = BeautifulSoup(html, "lxml")    
    table=bs.find('tbody')
    rows = table.tr.next_siblings

    for row in rows:
        a=row.find('td').get_text().strip()
        list_rows.append(a)
list_rows.to_frame()

Unfortunately, the lists from the inner loop will have different lengths! Maybe someone has a simple solution or a hint about what I could change? Thanks!

minada asked Oct 16 '22 08:10


1 Answer

I assume you meant every iteration of the outer loop in a new "row". This creates a two-dimensional list as a result: each element in link_href_list produces a new "row". I have no idea what the to_frame() method is, though; I assume it is a printout.

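import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

list_columns = []  # one inner list per URL
for x in link_href_list:
    html = urllib.request.urlopen(x)  # fetch each page once
    bs = BeautifulSoup(html, "lxml")
    table = bs.find('tbody')
    # find_next_siblings('tr') skips the whitespace text nodes that next_siblings yields
    rows = table.tr.find_next_siblings('tr')
    list_rows = []  # reset for every URL

    for row in rows:
        a = row.find('td').get_text().strip()
        list_rows.append(a)
    list_columns.append(list_rows)
df = pd.DataFrame(list_columns)  # one DataFrame row per URL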
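
If each URL's list should end up as a column rather than a row, the nested list can be transposed before it is handed to pandas. The following is only a sketch: the sample data is made up to stand in for the scraped list_columns, and itertools.zip_longest is one possible way to do it, not necessarily what the asker needs.

```python
from itertools import zip_longest

# Made-up sample standing in for the scraped list_columns (ragged on purpose).
list_columns = [["a", "b", "c"], ["d"], ["e", "f"]]

# zip_longest(*rows) groups the i-th entries of every inner list together
# and fills the gaps left by shorter lists with None.
as_columns = [list(col) for col in zip_longest(*list_columns, fillvalue=None)]

print(as_columns)
# [['a', 'd', 'e'], ['b', None, 'f'], ['c', None, None]]
```

Passing as_columns to the DataFrame constructor should then give one column per URL instead of one row per URL.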

Edit: If to_frame refers to the pandas DataFrame, I am not entirely sure how it will handle lists of different lengths. I will check shortly, but there is a way around that as well. There does not seem to be a very simple answer for importing lists of different lengths; one workaround is to find the longest list and, in a new loop, pad the other lists to the same length before the pandas import.
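
The padding workaround could look roughly like this. It is a minimal sketch, not a tested part of the answer: pad_rows is a hypothetical helper name, the sample data is invented, and None is just one possible filler value.

```python
def pad_rows(rows, fill=None):
    """Return copies of the rows, each padded with `fill` to the longest length."""
    # Find the longest inner list; default=0 covers an empty outer list.
    longest = max((len(row) for row in rows), default=0)
    # Extend every shorter list with the filler so all lengths match.
    return [list(row) + [fill] * (longest - len(row)) for row in rows]


# Made-up sample standing in for the scraped list_columns.
list_columns = [["a", "b", "c"], ["d"], ["e", "f"]]
padded = pad_rows(list_columns)

print(padded)
# [['a', 'b', 'c'], ['d', None, None], ['e', 'f', None]]
```

The padded list has rows of equal length, so it can be passed to the DataFrame constructor without worrying about how ragged input is handled.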

Ventil answered Oct 21 '22 01:10