Suppose I have the following DataFrame with some identical column names:
import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3, 4, 5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3, 4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])
and I want to fill the NaNs in the fourth column only. I would expect to be able to use iloc like this:
test.iloc[:, 3] = test.iloc[:, 3].fillna('F')
but this gives
In [121]: test
Out[121]:
   One  Two Three Three Three
0    1    2     F     F     F
1    1    2     4     4     4
2    1    2     F     F     F
3    1    2     4     4     4
So the assignment is resolved by column name and not by position. I could do it very naïvely like the following:
c = test.columns
test.columns = range(len(test.columns))
test.iloc[:, 3] = test.iloc[:, 3].fillna('F')
test.columns = c
which gives the correct result
In [142]: test
Out[142]:
   One  Two  Three  Three  Three
0    1    2      3      F    NaN
1    1    2      3      4    5.0
2    1    2      3      F    NaN
3    1    2      3      4    NaN
but seems a bit inefficient considering the simple task.
My question is then twofold.
1. Is there a better way to do this?
2. Why does the first approach not work (i.e. why does iloc still resort to names when replacing columns?)

The answer to your second question, why the first technique doesn't work, probably lies in the way pandas treats duplicate column names. The DataFrame constructor has no setting for them, but the read_csv documentation describes a parameter mangle_dupe_cols whose default value is True, and it says that passing in False could cause data to be overwritten. I suspect pandas handles duplicate columns in a questionable way.
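To make the point about duplicate labels concrete, here is a minimal sketch rather than a definitive fix. It first shows that read_csv renames repeated header names by default while the DataFrame constructor keeps them, so the single label 'Three' selects three columns. It then fills NaNs strictly by position by rebuilding the frame from its individual columns. The helper name fillna_by_position is hypothetical and not part of pandas, and whether a plain iloc assignment misbehaves as in the question may depend on the pandas version in use.

```python
import io

import numpy as np
import pandas as pd

# read_csv renames repeated header names by default, so the labels stay unique.
csv_text = "One,Two,Three,Three,Three\n1,2,3,4,5\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(list(from_csv.columns))  # ['One', 'Two', 'Three', 'Three.1', 'Three.2']

# The DataFrame constructor keeps duplicates, so one label maps to three columns.
test = pd.DataFrame(
    [[1, 2, 3, np.nan, np.nan],
     [1, 2, 3, 4, 5],
     [1, 2, 3, np.nan, np.nan],
     [1, 2, 3, 4, np.nan]],
    columns=['One', 'Two', 'Three', 'Three', 'Three'],
)
print(test['Three'].shape)  # (4, 3)


def fillna_by_position(df, pos, value):
    """Hypothetical helper: fill NaNs in the column at integer position `pos`.

    Returns a new DataFrame and never looks a column up by its label.
    """
    # iloc with a scalar position returns exactly one Series per column.
    cols = [df.iloc[:, i] for i in range(df.shape[1])]
    cols[pos] = cols[pos].fillna(value)
    out = pd.concat(cols, axis=1)
    out.columns = df.columns  # restore the original (duplicate) labels
    return out


fixed = fillna_by_position(test, 3, 'F')
print(fixed)
```

Because the helper returns a new frame, test itself is left unchanged; assign the result back (test = fillna_by_position(test, 3, 'F')) if you want to overwrite it.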