I am appending rows to a pandas DataFrame within a for loop, but at the end the dataframe is always empty. I don't want to add the rows to an array and then call the DataFrame constructer, because my actual for loop handles lots of data. I also tried pd.concat
without success. Could anyone highlight what I am missing to make the append statement work? Here's a dummy example:
import pandas as pd import numpy as np data = pd.DataFrame([]) for i in np.arange(0, 4): if i % 2 == 0: data.append(pd.DataFrame({'A': i, 'B': i + 1}, index=[0]), ignore_index=True) else: data.append(pd.DataFrame({'A': i}, index=[0]), ignore_index=True) print data.head() Empty DataFrame Columns: [] Index: [] [Finished in 0.676s]
It turns out Pandas does have an effective way to append to a dataframe: df. loc( len(df) ) = [new, row, of, data] will "append" to the end of a dataframe in-place.
DataFrame Looping (iteration) with a for statement. You can loop over a pandas dataframe, for each column row by row.
append() function is used to append rows of other dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframes are added as new columns and the new cells are populated with NaN value. ignore_index : If True, do not use the index labels.
Every time you call append, Pandas returns a copy of the original dataframe plus your new row. This is called quadratic copy, and it is an O(N^2) operation that will quickly become very slow (especially since you have lots of data).
In your case, I would recommend using lists, appending to them, and then calling the dataframe constructor.
a_list = [] b_list = [] for data in my_data: a, b = process_data(data) a_list.append(a) b_list.append(b) df = pd.DataFrame({'A': a_list, 'B': b_list}) del a_list, b_list
Timings
%%timeit data = pd.DataFrame([]) for i in np.arange(0, 10000): if i % 2 == 0: data = data.append(pd.DataFrame({'A': i, 'B': i + 1}, index=[0]), ignore_index=True) else: data = data.append(pd.DataFrame({'A': i}, index=[0]), ignore_index=True) 1 loops, best of 3: 6.8 s per loop %%timeit a_list = [] b_list = [] for i in np.arange(0, 10000): if i % 2 == 0: a_list.append(i) b_list.append(i + 1) else: a_list.append(i) b_list.append(None) data = pd.DataFrame({'A': a_list, 'B': b_list}) 100 loops, best of 3: 8.54 ms per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With