Using pandas .append within for loop

Tags: python, pandas, append, concat

I am appending rows to a pandas DataFrame within a for loop, but at the end the DataFrame is always empty. I don't want to collect the rows in an array and then call the DataFrame constructor, because my actual for loop handles lots of data. I also tried pd.concat without success. Could anyone point out what I am missing to make the append statement work? Here's a dummy example:

```python
import pandas as pd
import numpy as np

data = pd.DataFrame([])

for i in np.arange(0, 4):
    if i % 2 == 0:
        data.append(pd.DataFrame({'A': i, 'B': i + 1}, index=[0]), ignore_index=True)
    else:
        data.append(pd.DataFrame({'A': i}, index=[0]), ignore_index=True)

print(data.head())
```

```
Empty DataFrame
Columns: []
Index: []
[Finished in 0.676s]
```

884

asked May 03 '16 16:05

calpyte

1 Answer

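Every time you call append, pandas returns a copy of the original dataframe plus your new row; it does not modify the dataframe in place. That is why data stays empty in your example: the result of each call is discarded, so you would need to assign it back (data = data.append(...)). Even then, each call copies all the existing rows, so the loop as a whole performs a quadratic copy: an O(N^2) operation that quickly becomes very slow, especially since you have lots of data.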
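The same return-a-new-object behaviour applies to pd.concat, which the question also mentions. Here is a minimal sketch of it, using pd.concat because DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. The variable names and values are illustrative and not taken from the question's real data:

```python
import pandas as pd

# Illustrative starting frame with a single row.
data = pd.DataFrame({'A': [0], 'B': [1]})
new_row = pd.DataFrame({'A': [1], 'B': [2]})

# The result is discarded here, so `data` is left untouched.
pd.concat([data, new_row], ignore_index=True)
rows_without_assignment = len(data)

# Assigning the result back is what actually keeps the new row.
data = pd.concat([data, new_row], ignore_index=True)
rows_with_assignment = len(data)

print(rows_without_assignment, rows_with_assignment)
```

Reassigning fixes the empty result, but it still copies the whole frame on every iteration, so it does not address the performance problem.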

In your case, I would recommend using lists, appending to them, and then calling the dataframe constructor.

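```python
# my_data and process_data are placeholders for your own data and logic.
a_list = []
b_list = []
for data in my_data:
    a, b = process_data(data)
    a_list.append(a)
    b_list.append(b)

# Build the dataframe once, after the loop.
df = pd.DataFrame({'A': a_list, 'B': b_list})
del a_list, b_list
```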
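When some rows lack a column, as in the question's dummy example, a variation is to collect one dict per row rather than one list per column. This is a sketch under that assumption, with `rows` as an illustrative name. The DataFrame constructor accepts a list of dicts and fills missing keys with NaN:

```python
import pandas as pd

rows = []
for i in range(4):
    if i % 2 == 0:
        rows.append({'A': i, 'B': i + 1})
    else:
        rows.append({'A': i})  # no 'B' value for odd i

# One constructor call after the loop; missing 'B' entries become NaN.
data = pd.DataFrame(rows)
print(data)
```

This avoids having to pad the shorter column with None by hand, at the cost of creating a small dict per row.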

Timings

```python
%%timeit
data = pd.DataFrame([])
for i in np.arange(0, 10000):
    if i % 2 == 0:
        data = data.append(pd.DataFrame({'A': i, 'B': i + 1}, index=[0]), ignore_index=True)
    else:
        data = data.append(pd.DataFrame({'A': i}, index=[0]), ignore_index=True)
```

1 loops, best of 3: 6.8 s per loop

```python
%%timeit
a_list = []
b_list = []
for i in np.arange(0, 10000):
    if i % 2 == 0:
        a_list.append(i)
        b_list.append(i + 1)
    else:
        a_list.append(i)
        b_list.append(None)
data = pd.DataFrame({'A': a_list, 'B': b_list})
```

100 loops, best of 3: 8.54 ms per loop
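If each iteration naturally produces a small DataFrame rather than scalar values, a related pattern is to collect the pieces in a list and call pd.concat once at the end, which avoids the per-iteration copy seen in the first timing. The loop body below is a hypothetical stand-in for real per-iteration work, and this variant was not part of the timings above:

```python
import pandas as pd

frames = []
for i in range(4):
    # Hypothetical per-iteration result; replace with your own frame.
    if i % 2 == 0:
        frames.append(pd.DataFrame({'A': [i], 'B': [i + 1]}))
    else:
        frames.append(pd.DataFrame({'A': [i]}))

# A single concatenation after the loop instead of one copy per iteration.
data = pd.concat(frames, ignore_index=True)
print(data)
```

Creating many tiny DataFrames still has overhead, so plain lists or dicts are likely to be faster when the per-row values are simple scalars.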

175

answered Sep 17 '22 22:09

Alexander

