 

Concatenate pandas DataFrames generated with a loop

I am creating a new DataFrame named data_day for each day, containing new features derived from the day timestamp of an existing DataFrame df.

This gives 30 independent data_day DataFrames, which I need to concatenate/append at the end into a single DataFrame (final_data_day).

The for loop over the days is defined as follows:

num_days=len(list_day)

#list_day= random.sample(list_day,num_days_to_simulate)
data_frame = pd.DataFrame()

for i, day in enumerate(list_day):

    print('*** ',day,' ***')

    data_day=df[df.day==day]
    .....................
    final_data_day = pd.concat()

I hope that is clear. My problem is basically how to append/concatenate DataFrames generated in a non-trivial for loop.

Annalix asked Feb 15 '18 15:02


People also ask

How do you concatenate a loop in a DataFrame?

To combine pandas DataFrames generated in a for loop, first create an empty list, append each DataFrame to that list inside the loop, and then, outside the loop, call pd.concat on the list to build a single DataFrame.

How do you concatenate multiple Dataframes in for loop in Python?

(1) Read the data, (2) create a DataFrame, (3) go to the next year, and (4) append that DataFrame to the previous one. The ideal outcome is a single DataFrame with ~500 rows and 13 columns (for two years' worth of data).

How do I concatenate a list of Dataframes in pandas?

Pass the DataFrames to pd.concat() as a list and specify the axis along which to concatenate: axis=0 stacks them along rows, axis=1 places them side by side along columns.
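As a rough illustration of the two axis settings, here is a minimal sketch. The frames `left` and `right` and their values are made up for the example and are not from the question.

```python
import pandas as pd

# Two small hypothetical frames with the same columns and the same index
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# axis=0 stacks rows; ignore_index=True renumbers the result from 0
stacked = pd.concat([left, right], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligning rows on the index;
# add_suffix avoids duplicate column names in the result
side_by_side = pd.concat([left, right.add_suffix("_r")], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```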

Is pd.concat faster than DataFrame.append?

In this benchmark, concatenating multiple DataFrames with pandas.concat is 50 times faster than using DataFrame.append. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.


2 Answers

pd.concat takes a list of DataFrames. If your loop can build a list of DataFrames, you can concatenate that list once the loop has finished:

data_day_list = []
for i, day in enumerate(list_day):
    data_day = df[df.day==day]
    data_day_list.append(data_day)
final_data_day = pd.concat(data_day_list)
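
To try the pattern above end to end, here is a self-contained sketch. The sample `df`, the `list_day` values and the `value_minus_day_mean` column are hypothetical stand-ins, since the question does not show its per-day processing.

```python
import pandas as pd

# Hypothetical stand-in for the question's df: one row per reading, tagged by day
df = pd.DataFrame({
    "day": [1, 1, 2, 2, 3],
    "value": [10.0, 12.0, 7.0, 9.0, 4.0],
})
list_day = [1, 2, 3]

data_day_list = []
for day in list_day:
    # .copy() so that adding columns does not touch (or warn about) the original df
    data_day = df[df["day"] == day].copy()
    # Placeholder feature; the question's real per-day processing is not shown
    data_day["value_minus_day_mean"] = data_day["value"] - data_day["value"].mean()
    data_day_list.append(data_day)

# One concat after the loop; ignore_index=True gives a fresh index starting at 0
final_data_day = pd.concat(data_day_list, ignore_index=True)
print(final_data_day)
```

Collecting the pieces in a list and concatenating once avoids copying the growing result on every iteration, which is what happens when frames are appended one at a time inside the loop.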
David Rinck answered Sep 20 '22 17:09


Exhausting a generator is more elegant (if not more efficient) than appending to a list. For example:

def yielder(df, list_day):
    for i, day in enumerate(list_day):
        yield df[df['day'] == day]

final_data_day = pd.concat(list(yielder(df, list_day)))
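
As a variation on the generator approach, the sketch below passes the generator straight to pd.concat without the list() wrapper. This assumes a pandas version in which pd.concat accepts any iterable of DataFrames. The sample data and the helper name `per_day_frames` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data in place of the question's df and list_day
df = pd.DataFrame({"day": [1, 2, 2, 3], "value": [5, 6, 7, 8]})
list_day = [1, 3]

def per_day_frames(frame, days):
    """Yield one sub-frame per requested day (hypothetical helper name)."""
    for day in days:
        yield frame[frame["day"] == day]

# The generator is passed directly to pd.concat, with no list() wrapper
final_data_day = pd.concat(per_day_frames(df, list_day))
print(final_data_day["value"].tolist())
```

If list_day is empty, the generator yields nothing and pd.concat raises a ValueError, so it may be worth guarding against that case.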
jpp answered Sep 21 '22 17:09