ffill behaves strangely when there are duplicate column names

Question

I have a DataFrame as below

df=pd.DataFrame({'A':[np.nan,1,1,np.nan],'B':[2,np.nan,2,2]},index=[1,1,2,2])
df.columns=['A','A']

Now I want to forward-fill the values within each group of the index. First I tried

df.groupby(level=0).ffill()

which raises the error

> ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

It looks like a bug, so I then tried apply, which returns the expected output.

df.groupby(level=0).apply(lambda x : x.ffill())
     A    A
1  NaN  2.0
1  1.0  2.0
2  1.0  2.0
2  1.0  2.0
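
For anyone reproducing this on a more recent pandas, here is a self-contained sketch of the apply variant. The `group_keys=False` argument is an addition that is not in the original call; it assumes a pandas version (2.x, as far as I know) where apply would otherwise prepend the group keys to the index. The name `result` is just illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the question, with duplicate column names.
df = pd.DataFrame(
    {'A': [np.nan, 1, 1, np.nan], 'B': [2, np.nan, 2, 2]},
    index=[1, 1, 2, 2],
)
df.columns = ['A', 'A']

# Forward-fill inside each index group via apply.
# group_keys=False (an assumption, see above) keeps the original index
# instead of adding the group key as an extra index level.
result = df.groupby(level=0, group_keys=False).apply(lambda g: g.ffill())
print(result)
```

The fill happens separately per group, so the NaN in the first row is kept: there is no earlier value in group 1 to carry forward.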

For reference, when the column names are unique it works without an error; however, it adds an extra column holding the index values, and that column's name is NaN (see question 2).

df.columns=['C','D']
df.groupby(level=0).ffill()
   NaN    C    D
1    1  NaN  2.0
1    1  1.0  2.0
2    2  1.0  2.0
2    2  1.0  2.0

Questions:
1. Is this a bug? Why does apply still work in this situation?

2. Why does groupby on the index followed by ffill create an additional column?

fpersyn · Accepted Answer

It does look like a bug. Note that, according to the pandas documentation, the .ffill() method is a synonym for .fillna(method='ffill'). Using the latter produces your expected output for both of your examples in pandas 0.23.4, without any errors or additional columns. Hope that helps.

import pandas as pd
import numpy as np
df=pd.DataFrame({'A':[np.nan,1,1,np.nan],'B':[2,np.nan,2,2]},index=[1,1,2,2])

df.columns=['A','A'] #dup column names
df.groupby(level=0).fillna(method='ffill')

Output:
     A    A
1  NaN  2.0
1  1.0  2.0
2  1.0  2.0
2  1.0  2.0
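
If neither spelling works on your version, another possible workaround is to give the columns unique labels for the duration of the fill and put the duplicate names back afterwards. This is only a sketch: the names `tmp` and `filled` are illustrative, and the column selection step is a defensive assumption in case the grouped ffill returns an extra column, as in the second example of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [np.nan, 1, 1, np.nan], 'B': [2, np.nan, 2, 2]},
    index=[1, 1, 2, 2],
)
df.columns = ['A', 'A']  # duplicate column names

# Work on a copy whose columns are labelled by position (0, 1, ...),
# so the grouped ffill never sees duplicate labels.
tmp = df.copy()
tmp.columns = range(tmp.shape[1])

filled = tmp.groupby(level=0).ffill()

# Keep only the original data columns (drops any extra column, if one appears),
# then restore the duplicate names.
filled = filled.loc[:, list(tmp.columns)]
filled.columns = df.columns
print(filled)
```

This leaves df itself untouched and does not depend on how the groupby code handles non-unique column labels.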
