I'm seeing unexpected behavior when adding a new row to a pre-allocated DataFrame after I have added a new column to it.
I created the following minimal example (using Python 3.6.5 and pandas 0.23.0):
First, I create a pre-allocated DataFrame with 3 columns:
import pandas as pd
df = pd.DataFrame(columns=('A', 'B', 'C'), index=range(5))
# The resulting DataFrame df
#      A    B    C
# 0  NaN  NaN  NaN
# 1  NaN  NaN  NaN
# 2  NaN  NaN  NaN
# 3  NaN  NaN  NaN
# 4  NaN  NaN  NaN
Then I add a few rows, which works as expected:
new_row = {'A':0, 'B':0, 'C':0}
df.loc[0] = new_row
df.loc[1] = new_row
df.loc[2] = new_row
# The resulting DataFrame df
#      A    B    C
# 0    0    0    0
# 1    0    0    0
# 2    0    0    0
# 3  NaN  NaN  NaN
# 4  NaN  NaN  NaN
Then I add a new column with a default value:
df['D'] = 0
# The resulting DataFrame df
#      A    B    C  D
# 0    0    0    0  0
# 1    0    0    0  0
# 2    0    0    0  0
# 3  NaN  NaN  NaN  0
# 4  NaN  NaN  NaN  0
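Inspecting df.dtypes at this step shows what changed. This is a small reconstruction of the steps above (column names passed as a list instead of a tuple, rows left unfilled because they don't affect the dtypes); the comments state the dtypes I would expect, not output captured from pandas 0.23.0:

```python
import pandas as pd

# The pre-allocated frame from the question: labels only, no data
df = pd.DataFrame(columns=['A', 'B', 'C'], index=range(5))
print(df.dtypes)  # A, B and C are all object

# A column created from the scalar 0 gets an integer dtype
df['D'] = 0
print(df.dtypes)  # A, B and C are still object; D is int64
```

After the new column is added, the frame holds three object columns next to one int64 column, i.e. it now has mixed dtypes.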
Finally, when I add a new row after adding the new column, I get this:
new_row = {'A':0, 'B':0, 'C':0, 'D':0}
df.loc[3] = new_row
# The resulting DataFrame df
#      A    B    C  D
# 0    0    0    0  0
# 1    0    0    0  0
# 2    0    0    0  0
# 3    A    B    C  D
# 4  NaN  NaN  NaN  0
So it seems that, for some reason, the DataFrame header (the column labels) is inserted as the new row instead of the actual values. Am I doing something wrong? I noticed that this only happens when I set the size of the table with index=range(5). If I don't set the size of the table, adding columns and rows works as expected. However, I would like to pre-allocate the table for performance reasons.
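It's a problem with the data types. When you create a DataFrame without specifying any data, pandas automatically assigns the object dtype to all columns. Adding D with df['D'] = 0 then creates an int64 column, so the frame ends up with mixed dtypes, and that is when the row assignment goes wrong.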
Instead, create your DataFrame with a fill value, so that every column starts out as int64:
df = pd.DataFrame(columns=('A', 'B', 'C'), index=range(5), data=0)
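Here is a sketch of the question's full sequence using this constructor. It assumes a recent pandas version (I haven't checked it against 0.23.0), and it writes distinct values into row 3 so it is easy to tell whether values or labels ended up in the row:

```python
import pandas as pd

# Pre-allocate with a fill value: every column is int64 instead of object
df = pd.DataFrame(columns=['A', 'B', 'C'], index=range(5), data=0)

new_row = {'A': 0, 'B': 0, 'C': 0}
for i in range(3):
    df.loc[i] = new_row

# The new column is int64 as well, so all columns share one dtype
df['D'] = 0

# Distinct values make it obvious whether values or labels were written
df.loc[3] = {'A': 1, 'B': 2, 'C': 3, 'D': 4}
print(df)
print(df.dtypes)
```

One trade-off of data=0: rows that haven't been written yet hold 0 instead of NaN, so they can't be told apart from genuine all-zero rows. If you would rather keep the NaN placeholders, an untested alternative is to wrap the row in a Series, e.g. df.loc[3] = pd.Series(new_row), which makes pandas align the values on the column labels.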