I am creating a new DataFrame named data_day, containing new features, for each day; the days are derived from the day timestamp of an existing DataFrame df.
The result is 30 independent data_day DataFrames that I need to concatenate/append at the end into a single DataFrame (final_data_day).
The for loop over the days is defined as follows:
num_days = len(list_day)
# list_day = random.sample(list_day, num_days_to_simulate)
data_frame = pd.DataFrame()
for i, day in enumerate(list_day):
    print('*** ', day, ' ***')
    data_day = df[df.day == day]
    .....................
final_data_day = pd.concat()
I hope that is clear. My problem is basically how to append/concatenate DataFrames generated in a non-trivial for loop.
To append pandas DataFrames generated in a for loop, first create an empty list; inside the loop, append each newly built DataFrame to that list; finally, outside the loop, call pd.concat on the list to create a single DataFrame.
The steps are: (1) read the data, (2) create a DataFrame, (3) go to the next year, and (4) append that DataFrame to the previous one. The ideal outcome is one DataFrame with ~500 rows and 13 columns (two years' worth of data).
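The four steps above could be sketched roughly as follows. This is only an illustration: load_year is a hypothetical stand-in for whatever actually reads one year of data, and the 250 rows per year, the column names, and the years themselves are assumptions chosen to match the ~500 x 13 target.

```python
import pandas as pd


def load_year(year, n_rows=250, n_cols=13):
    """Hypothetical reader: builds one year's DataFrame from dummy values."""
    # n_cols - 1 dummy data columns plus one "year" column gives n_cols in total.
    data = {f"col_{j}": list(range(n_rows)) for j in range(n_cols - 1)}
    frame = pd.DataFrame(data)
    frame["year"] = year
    return frame


frames = []
for year in (2019, 2020):  # assumed years, two in total
    frames.append(load_year(year))  # collect each year's frame in a list

# Concatenate once, after the loop, and renumber the rows 0..N-1.
all_years = pd.concat(frames, ignore_index=True)
print(all_years.shape)  # (500, 13)
```

Collecting the frames in a list and concatenating once avoids rebuilding the growing DataFrame on every iteration.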
Pass the two DataFrames to pd.concat() as a list and specify the axis along which to concatenate: axis=0 stacks them along rows, axis=1 places them side by side along columns.
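A small sketch of the two axis options, using two made-up frames (left and right are illustrative names, not from the original question):

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# axis=0: stack the rows of right underneath left.
rows = pd.concat([left, right], axis=0, ignore_index=True)

# axis=1: place the columns of right next to left, aligned on the index.
cols = pd.concat([left, right], axis=1)

print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)
```

With axis=1 the result here has duplicate column names (a, b, a, b), so renaming the columns before concatenating may be worthwhile.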
In this benchmark, concatenating multiple DataFrames with the pandas.concat function is 50 times faster than using DataFrame.append.
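The benchmark itself is not reproduced here. As a rough way to check the effect on your own machine, the sketch below times two strategies; because DataFrame.append was removed in pandas 2.0, it substitutes a repeated pd.concat inside the loop for the append-style pattern. The chunk sizes are arbitrary assumptions and the timings will vary.

```python
import time

import pandas as pd

# 200 small made-up frames of 100 rows each.
chunks = [
    pd.DataFrame({"day": [d] * 100, "value": list(range(100))})
    for d in range(200)
]

# Strategy 1: grow the frame inside the loop (copies all data each time).
start = time.perf_counter()
grown = chunks[0]
for chunk in chunks[1:]:
    grown = pd.concat([grown, chunk], ignore_index=True)
loop_seconds = time.perf_counter() - start

# Strategy 2: one concat over the whole list.
start = time.perf_counter()
joined = pd.concat(chunks, ignore_index=True)
single_seconds = time.perf_counter() - start

print(f"concat in loop: {loop_seconds:.4f}s, single concat: {single_seconds:.4f}s")
```

Both strategies produce the same DataFrame; only the amount of copying differs.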
Pandas concat takes a list of dataframes. If you can generate a list of dataframes with your looping function, once you are finished you can concatenate the list together:
data_day_list = []
for i, day in enumerate(list_day):
    data_day = df[df.day == day]
    data_day_list.append(data_day)
final_data_day = pd.concat(data_day_list)
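For a version that runs end to end, here is a minimal sketch with a toy df. The day_mean column is a hypothetical placeholder for the per-day features, which the question leaves unspecified, and the sample values are made up.

```python
import pandas as pd

# Toy input standing in for the original df.
df = pd.DataFrame({
    "day": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02", "2021-01-02"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})
list_day = df["day"].unique()

data_day_list = []
for day in list_day:
    # .copy() so that adding a column does not touch a view of df.
    data_day = df[df["day"] == day].copy()
    # Hypothetical per-day feature: the mean of "value" for that day.
    data_day["day_mean"] = data_day["value"].mean()
    data_day_list.append(data_day)

# Single concatenation after the loop; ignore_index renumbers the rows.
final_data_day = pd.concat(data_day_list, ignore_index=True)
print(final_data_day)
```

Passing ignore_index=True is optional; leave it out if the original row labels of df should be preserved.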
Exhausting a generator is more elegant (if not more efficient) than appending to a list. For example:
def yielder(df, list_day):
    # Yield one filtered DataFrame per day.
    for i, day in enumerate(list_day):
        yield df[df['day'] == day]

final_data_day = pd.concat(list(yielder(df, list_day)))