
Python Pandas adds column header as entry instead of actual data after adding new column

Tags: python, pandas

I see unexpected behavior when adding a new row to a pre-allocated DataFrame after adding a new column to it.

I created the following minimal example (using Python 3.6.5 and pandas 0.23.0):

First, I create a pre-allocated DataFrame with 3 columns:

import pandas as pd
df = pd.DataFrame(columns=('A', 'B', 'C'), index=range(5))

# The resulting DataFrame df
#     A    B    C
#0  NaN  NaN  NaN
#1  NaN  NaN  NaN
#2  NaN  NaN  NaN
#3  NaN  NaN  NaN
#4  NaN  NaN  NaN

Then I add a few rows, which works as expected:

new_row = {'A':0, 'B':0, 'C':0}
df.loc[0] = new_row
df.loc[1] = new_row
df.loc[2] = new_row

# The resulting DataFrame df
#     A    B    C
#0    0    0    0
#1    0    0    0
#2    0    0    0
#3  NaN  NaN  NaN
#4  NaN  NaN  NaN

Then I add a new column with a default value:

df['D'] = 0

# The resulting DataFrame df
#     A    B    C  D
#0    0    0    0  0
#1    0    0    0  0
#2    0    0    0  0
#3  NaN  NaN  NaN  0
#4  NaN  NaN  NaN  0

Finally, when I add a new row after adding the new column, I get this:

new_row = {'A':0, 'B':0, 'C':0, 'D':0} 
df.loc[3] = new_row

# The resulting DataFrame df
#     A    B    C  D
#0    0    0    0  0
#1    0    0    0  0
#2    0    0    0  0
#3    A    B    C  D
#4  NaN  NaN  NaN  0

So it seems that, for some reason, the DataFrame header is added as the new row instead of the actual values. Am I doing something wrong? I noticed that this only happens when I set the size of the table with index=range(5). If I do not set the size of the table, adding columns and rows works as expected. However, I would like to pre-allocate the table for performance reasons.

Starvin Marvin asked Nov 08 '22 01:11



1 Answer

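It's a problem with the data types. When you create a DataFrame without specifying any data, pandas assigns the object dtype to all columns. Adding D with df['D'] = 0 then creates an integer column, so the frame has mixed dtypes, and that appears to be what makes the row assignment from a dict go wrong.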
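A minimal sketch to check this, using only the frame from the question: printing `dtypes` before and after adding `D` should show three object columns followed by one integer column.

```python
import pandas as pd

# Pre-allocated frame from the question: no data given,
# so every column gets the object dtype
df = pd.DataFrame(columns=('A', 'B', 'C'), index=range(5))
print(df.dtypes)

# Adding a column from an integer scalar gives that column an integer dtype,
# so the frame now mixes object and integer columns
df['D'] = 0
print(df.dtypes)
```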

Create your dataframe like this:

df = pd.DataFrame(columns=('A', 'B', 'C'), index=range(5), data=0)
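As a rough check, the steps from the question can be replayed on a frame created this way. Two things in this sketch are not from the question or the fix above: the row values 1 to 4 are chosen only to make the column alignment visible, and wrapping the dict in `pd.Series` is an extra precaution. A Series is aligned to the columns by label, so the result does not depend on how a given pandas version treats a plain dict.

```python
import pandas as pd

# Pre-allocate with integer zeros so all columns share one numeric dtype
df = pd.DataFrame(columns=('A', 'B', 'C'), index=range(5), data=0)
df['D'] = 0

# Illustrative values, distinct per column so any misalignment would show
new_row = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

# Assign the row as a Series so the values are matched to columns by label
df.loc[3] = pd.Series(new_row)
print(df)
```

Note that pre-filling with 0 means untouched rows hold zeros rather than NaN, so you may need to track separately which rows have been written.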
Edgar Ramírez Mondragón answered Nov 14 '22 22:11
