
Using iloc to replace a column when identical names exist

Suppose I have the following DataFrame with some identical column names

import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4,      5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

and I want to fill the NaNs in the fourth column. I would expect to be able to use iloc like

test.iloc[:, 3] = test.iloc[:, 3].fillna('F')

but this gives

In [121]: test
Out[121]:
   One  Two Three Three Three
0    1    2     F     F     F
1    1    2     4     4     4
2    1    2     F     F     F
3    1    2     4     4     4

So it changes based on the column name and not the position. I could do it very naïvely like the following.

c = test.columns
test.columns = range(len(test.columns))
test.iloc[:, 3] = test.iloc[:, 3].fillna('F')
test.columns = c

which gives the correct result

In [142]: test
Out[142]:
   One  Two  Three  Three  Three
0    1    2      3      F    NaN
1    1    2      3      4    5.0
2    1    2      3      F    NaN
3    1    2      3      4    NaN

but this seems a bit inefficient for such a simple task.

My question is then twofold.

  • Would there be a more straightforward method?
  • Why doesn't the first one work? (Why does iloc still resort to names when replacing columns?)
asked Apr 21 '26 08:04 by PidgeyUsedGust


1 Answer

As for your second question, the first technique probably fails because of the way pandas handles duplicate column names. The DataFrame constructor has no setting for this, but the read_csv documentation describes a parameter, mangle_dupe_cols, whose default value is True, and it says that passing False could cause data to be overwritten when column names repeat. I suspect pandas resolves the assignment by column label at some point, which would match the output in the question, where every column named 'Three' was replaced.
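For the first question, here is a minimal sketch of one position-based workaround. It only wraps the relabelling trick from the question in a function that works on a copy. The name fillna_by_position is hypothetical (it is not a pandas API), and the small CSV string at the end is made up purely to show how read_csv renames repeated header names by default.

```python
import io

import numpy as np
import pandas as pd


def fillna_by_position(df, pos, value):
    """Return a copy of df with NaNs filled in the column at position pos.

    Hypothetical helper, not part of pandas: it temporarily swaps in unique
    integer labels so the assignment cannot match several same-named columns.
    """
    out = df.copy()
    original = out.columns
    out.columns = range(out.shape[1])   # unique labels 0..n-1
    out[pos] = out[pos].fillna(value)   # the label is now unambiguous
    out.columns = original              # restore the duplicate names
    return out


test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4,      5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

# Only the fourth column (position 3) is filled; test itself is left untouched.
fixed = fillna_by_position(test, 3, 'F')

# By default read_csv renames repeated header names (B, B.1, ...),
# so frames loaded that way do not have this ambiguity.
csv_columns = list(pd.read_csv(io.StringIO("A,B,B\n1,2,3\n")).columns)
```

On pandas 1.5 or later, DataFrame.isetitem is documented as setting a column by integer position, so test.isetitem(3, test.iloc[:, 3].fillna('F')) may be a more direct route. I have not checked how it behaves with duplicate names on every version.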

answered Apr 22 '26 20:04 by V. Singh