
When a DataFrame has duplicate column names, fillna does not seem to work correctly with a dict parameter

Tags:

python

pandas

After using pd.concat() to concatenate two DataFrames that share a column name, I find that df.fillna() does not work correctly when given a dict specifying which value to use for each column.

I don't know why. Is there something wrong with my understanding?

import numpy as np
import pandas as pd

a1 = pd.DataFrame({'a': [1, 2, 3]})
a2 = pd.DataFrame({'a': [1, 2, 3]})
b = pd.DataFrame({'b': [np.nan, 20, 30]})
c = pd.DataFrame({'c': [40, np.nan, 60]})
# Concatenating column-wise yields two columns that are both named 'a'
x = pd.concat([a1, a2, b, c], axis=1)
print(x)
x = x.fillna({'b': 10, 'c': 50})
print(x)

Initial dataframe:

   a  a     b     c
0  1  1   NaN  40.0
1  2  2  20.0   NaN
2  3  3  30.0  60.0

Data is unchanged after df.fillna():

   a  a     b     c
0  1  1   NaN  40.0
1  2  2  20.0   NaN
2  3  3  30.0  60.0
Yaomin Chang asked Jan 16 '19 03:01


1 Answer

As mentioned in the comments, there's a problem assigning values to a dataframe in the presence of duplicate column names. However, you can use this workaround:

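# Fill each target column on its own, then put it back at its original position
for col, val in {'b': 10, 'c': 50}.items():
    new_col = x[col].fillna(val)        # a Series, because 'b' and 'c' are unique labels
    idx = int(x.columns.get_loc(col))   # integer position of the column
    x = x.drop(col, axis=1)
    x.insert(loc=idx, column=col, value=new_col)

print(x)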

result:

   a  a     b     c
0  1  1  10.0  40.0
1  2  2  20.0  50.0
2  3  3  30.0  60.0
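
The same idea can be sketched without dropping and re-inserting columns, by walking the columns by position and rebuilding the frame. This is only a minimal sketch, not a pandas API: the helper name fillna_by_label is hypothetical, and it assumes that filling every column carrying a given label with the same value is the desired behaviour.

```python
import numpy as np
import pandas as pd


def fillna_by_label(df, values):
    """Hypothetical helper: fill NaN per column label, tolerating duplicate labels.

    Columns are selected by integer position, so a duplicated label never
    produces an ambiguous selection. Every column whose label appears in
    `values` is filled with the corresponding value.
    """
    filled = []
    for pos, label in enumerate(df.columns):
        col = df.iloc[:, pos]  # always a Series, even if the label is duplicated
        if label in values:
            col = col.fillna(values[label])
        filled.append(col)
    # Rebuild the frame; the Series names become the column labels again
    return pd.concat(filled, axis=1)


a1 = pd.DataFrame({'a': [1, 2, 3]})
a2 = pd.DataFrame({'a': [1, 2, 3]})
b = pd.DataFrame({'b': [np.nan, 20, 30]})
c = pd.DataFrame({'c': [40, np.nan, 60]})
x = pd.concat([a1, a2, b, c], axis=1)

result = fillna_by_label(x, {'b': 10, 'c': 50})
print(result)
```

Because the selection is positional, the column order and the duplicated 'a' labels are preserved, and the original frame x is left unmodified.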
Udi Yosovzon answered Sep 30 '22 01:09