Filling empty python dataframe using loops

Tags: python, iteration, pandas

Let's say I want to create and fill an empty DataFrame with values from a loop.

import pandas as pd
import numpy as np

years = [2013, 2014, 2015]
dn=pd.DataFrame()
for year in years:
    df1 = pd.DataFrame({'Incidents': [ 'C', 'B','A'],
                 year: [1, 1, 1 ],
                }).set_index('Incidents')
    print (df1)
    dn=dn.append(df1, ignore_index = False)

The append gives a block-diagonal result, with NaN everywhere else, even when ignore_index is False:

>>> dn
           2013  2014  2015
Incidents                  
C             1   NaN   NaN
B             1   NaN   NaN
A             1   NaN   NaN
C           NaN     1   NaN
B           NaN     1   NaN
A           NaN     1   NaN
C           NaN   NaN     1
B           NaN   NaN     1
A           NaN   NaN     1

[9 rows x 3 columns]

It should look like this:

>>> dn
           2013  2014  2015
Incidents                  
C             1     1     1
B             1     1     1
A             1     1     1

[3 rows x 3 columns]

Is there a better way of doing this? And is there a way to fix the append?

I have pandas version '0.13.1-557-g300610e'


asked Mar 07 '15 00:03

ccsv

2 Answers

import pandas as pd

years = [2013, 2014, 2015]
dn = []
for year in years:
    df1 = pd.DataFrame({'Incidents': [ 'C', 'B','A'],
                 year: [1, 1, 1 ],
                }).set_index('Incidents')
    dn.append(df1)
dn = pd.concat(dn, axis=1)
print(dn)

yields

           2013  2014  2015
Incidents                  
C             1     1     1
B             1     1     1
A             1     1     1
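
To see why the original loop produced NaN blocks, it may help to compare the two concatenation axes directly. The following is a minimal sketch, not part of the original answer. It uses pd.concat with axis=0 as a stand-in for the row-wise behaviour of DataFrame.append, which was removed in pandas 2.0. The names frames, stacked and side_by_side are illustrative only.

```python
import pandas as pd

years = [2013, 2014, 2015]
frames = [
    pd.DataFrame({'Incidents': ['C', 'B', 'A'], year: [1, 1, 1]}).set_index('Incidents')
    for year in years
]

# axis=0 stacks rows and aligns on column labels. Each frame has a different
# column, so the cells it does not cover are filled with NaN.
stacked = pd.concat(frames, axis=0)

# axis=1 places the frames side by side and aligns on the shared index.
side_by_side = pd.concat(frames, axis=1)

print(stacked.shape)       # (9, 3)
print(side_by_side.shape)  # (3, 3)
```

Row-wise stacking is what yields the 9 x 3 result from the question. Column-wise concatenation gives the desired 3 x 3 table.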

Note that calling pd.concat once outside the loop is more time-efficient than calling pd.concat with each iteration of the loop.

Each time you call pd.concat, space is allocated for a new DataFrame, and all the data from each component DataFrame is copied into it. If you call pd.concat inside the for-loop, you end up doing on the order of n**2 copies, where n is the number of years.

If you instead accumulate the partial DataFrames in a list and call pd.concat once outside the loop, then Pandas only needs to perform n copies to make dn.
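
The two patterns can be put side by side as a rough sketch. The helper make_frame and the variable names grown and once are hypothetical and only serve the illustration. Both patterns give the same table. The difference is that the in-loop version re-copies everything accumulated so far on every iteration.

```python
import pandas as pd

years = [2013, 2014, 2015]

def make_frame(year):
    """Build one single-column frame, as in the loop body above."""
    return pd.DataFrame(
        {'Incidents': ['C', 'B', 'A'], year: [1, 1, 1]}
    ).set_index('Incidents')

# Pattern 1: concat inside the loop. Every call copies the growing result
# again, so the total work grows roughly quadratically with len(years).
grown = make_frame(years[0])
for year in years[1:]:
    grown = pd.concat([grown, make_frame(year)], axis=1)

# Pattern 2: collect the pieces, then concat once. Each piece is copied once.
frames = [make_frame(year) for year in years]
once = pd.concat(frames, axis=1)

print(once)
```

With three years the difference is negligible. It starts to matter when the loop runs over hundreds or thousands of pieces.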

answered Oct 04 '22 10:10

unutbu

As far as I know, you should avoid adding rows to a DataFrame one at a time, because it is slow.

What I usually do is:

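import pandas as pd

n = 5  # number of rows to generate (example value)
l1 = []
l2 = []

for i in range(n):
    v1 = i       # placeholder: compute value v1 here
    v2 = i ** 2  # placeholder: compute value v2 here
    l1.append(v1)
    l2.append(v2)

# Build the DataFrame once, after the loop has finished
d = pd.DataFrame({'l1': l1, 'l2': l2})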
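
The same accumulate-then-build idea can be applied to the data from the question. This is a sketch under assumptions: the constant values [1, 1, 1] stand in for whatever is actually computed per year, and the names incidents and columns are made up for the example.

```python
import pandas as pd

years = [2013, 2014, 2015]
incidents = ['C', 'B', 'A']

# Collect one plain Python list per year instead of growing a DataFrame.
columns = {}
for year in years:
    columns[year] = [1, 1, 1]  # placeholder for the values computed for this year

# Construct the DataFrame a single time, with the incident labels as the index.
dn = pd.DataFrame(columns, index=pd.Index(incidents, name='Incidents'))
print(dn)
```

Because the frame is built in one step, there is no intermediate copying, and the columns line up by construction rather than by index alignment.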

answered Oct 04 '22 09:10

Donbeo
