Let's say I want to create an empty DataFrame and fill it with values from a loop.
import pandas as pd
import numpy as np
years = [2013, 2014, 2015]
dn=pd.DataFrame()
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1],
                        }).set_index('Incidents')
    print(df1)
    dn = dn.append(df1, ignore_index=False)
The append gives a block-diagonal result even when ignore_index is False:
>>> dn
           2013  2014  2015
Incidents
C             1   NaN   NaN
B             1   NaN   NaN
A             1   NaN   NaN
C           NaN     1   NaN
B           NaN     1   NaN
A           NaN     1   NaN
C           NaN   NaN     1
B           NaN   NaN     1
A           NaN   NaN     1
[9 rows x 3 columns]
It should look like this:
>>> dn
           2013  2014  2015
Incidents
C             1     1     1
B             1     1     1
A             1     1     1
[3 rows x 3 columns]
Is there a better way of doing this? And is there a way to fix the append?
I have pandas version '0.13.1-557-g300610e'
Fill Data in an Empty Pandas DataFrame by Appending Rows
First, create an empty DataFrame with column names and then append rows one by one. When the empty DataFrame is created with both column names and row indices, the rows can also be filled using the .loc indexer.
Append Data to an Empty Pandas DataFrame
Besides .loc, we can also use the .append() method to add rows. The .append() method works by appending one DataFrame to another. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, use pd.concat instead.
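As a rough illustration of the .loc approach described above, here is a minimal sketch. It assumes the row labels and column names are known up front, and it reuses the question's sample values; the variable name incidents is introduced here for illustration only.

```python
import pandas as pd

years = [2013, 2014, 2015]
incidents = ['C', 'B', 'A']  # assumed to be known before the loop starts

# Create an empty frame that already has its final row index and columns.
dn = pd.DataFrame(index=pd.Index(incidents, name='Incidents'), columns=years)

# Fill one existing row per iteration with label-based assignment.
for incident in incidents:
    dn.loc[incident] = [1, 1, 1]

# The pre-allocated frame starts out with object dtype; convert once at the end.
dn = dn.astype(int)
print(dn)
```

This only works when the shape is known in advance. Assigning to a label that does not exist yet makes .loc enlarge the frame, which brings back the repeated-copy cost.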
import pandas as pd
years = [2013, 2014, 2015]
dn = []
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1],
                        }).set_index('Incidents')
    dn.append(df1)
dn = pd.concat(dn, axis=1)
print(dn)
yields
           2013  2014  2015
Incidents
C             1     1     1
B             1     1     1
A             1     1     1
Note that calling pd.concat once outside the loop is more time-efficient than calling pd.concat with each iteration of the loop. Each time you call pd.concat, new space is allocated for a new DataFrame, and all the data from each component DataFrame is copied into the new DataFrame. If you call pd.concat from within the for-loop, you end up doing on the order of n**2 copies, where n is the number of years.
If you accumulate the partial DataFrames in a list and call pd.concat once outside the loop, then pandas only needs to perform n copies to make dn.
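To make the n versus n**2 argument concrete, here is a small sketch that builds the same result both ways. It tallies how many columns each pd.concat call has to write into its output, as a rough proxy for copy work rather than a real measurement. The helper names make_frame, concat_in_loop and concat_once, and the 20 illustrative years, are assumptions made for this example.

```python
import pandas as pd

years = list(range(2000, 2020))  # n = 20 illustrative years


def make_frame(year):
    """Build one single-column frame, like df1 in the code above."""
    return pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                         year: [1, 1, 1]}).set_index('Incidents')


def concat_in_loop(years):
    """Call pd.concat on every iteration; each call rebuilds the whole result."""
    dn = make_frame(years[0])
    columns_written = dn.shape[1]
    for year in years[1:]:
        dn = pd.concat([dn, make_frame(year)], axis=1)
        columns_written += dn.shape[1]  # the new frame holds every column so far
    return dn, columns_written


def concat_once(years):
    """Collect the pieces in a list and call pd.concat a single time."""
    frames = [make_frame(year) for year in years]
    dn = pd.concat(frames, axis=1)
    return dn, dn.shape[1]


slow, slow_cost = concat_in_loop(years)
fast, fast_cost = concat_once(years)
print(slow_cost, fast_cost)
```

With n years, the loop version writes 1 + 2 + ... + n = n(n+1)/2 columns in total, while the single call writes n. Both produce the same DataFrame.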
As far as I know, you should avoid adding rows to the DataFrame one at a time, because it is slow.
What I usually do is:
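import pandas as pd

n = 5  # placeholder: number of iterations
l1 = []
l2 = []
for i in range(n):
    v1 = i      # placeholder: compute value v1
    v2 = i * i  # placeholder: compute value v2
    l1.append(v1)
    l2.append(v2)

# Build the DataFrame once, after the loop has finished
d = pd.DataFrame()
d['l1'] = l1
d['l2'] = l2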
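A variation on the same idea, sketched with placeholder computations: collect one dict per row in a plain Python list and pass the whole list to the DataFrame constructor at the end. The column names l1 and l2 follow the code above; the values are made up for illustration.

```python
import pandas as pd

rows = []
for i in range(5):
    # Placeholder computations standing in for the real per-row values.
    rows.append({'l1': i * 2, 'l2': i ** 2})

# A single constructor call builds the frame from all rows at once.
d = pd.DataFrame(rows)
print(d)
```

This keeps each row's values together, which can be easier to read than maintaining one list per column.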