
Weird ffill behavior when a DataFrame has duplicate column names

Tags: python, pandas

I have a DataFrame as below


import numpy as np
import pandas as pd
df=pd.DataFrame({'A':[np.nan,1,1,np.nan],'B':[2,np.nan,2,2]},index=[1,1,2,2])
df.columns=['A','A']

Now I want to forward-fill the values within each group of the index. First I tried:

df.groupby(level=0).ffill()

This raises an error:

> ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

It looks like a bug, so I then tried apply, which returns the expected output:

df.groupby(level=0).apply(lambda x : x.ffill())
     A    A
1  NaN  2.0
1  1.0  2.0
2  1.0  2.0
2  1.0  2.0
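
To make the expected result concrete, here is a minimal plain-Python sketch of a forward fill that restarts for every index label. It does not use pandas, and the helper name `ffill_within_groups` is made up for illustration; it only mirrors the semantics I expect from the groupby call, not how pandas implements it.

```python
import math


def ffill_within_groups(index, column):
    """Forward-fill NaN values in `column`, tracking the last value per index label."""
    last_seen = {}  # index label -> most recent non-NaN value for that label
    filled = []
    for label, value in zip(index, column):
        if isinstance(value, float) and math.isnan(value):
            # Reuse the last value seen for this label; stay NaN if there is none yet.
            value = last_seen.get(label, value)
        else:
            last_seen[label] = value
        filled.append(value)
    return filled


index = [1, 1, 2, 2]
col_a = [math.nan, 1, 1, math.nan]  # first column of the example DataFrame
col_b = [2, math.nan, 2, 2]         # second column of the example DataFrame

filled_a = ffill_within_groups(index, col_a)
filled_b = ffill_within_groups(index, col_b)
print(filled_a)
print(filled_b)
```

The first value of the first column stays NaN because nothing precedes it inside group 1, which matches the apply output above.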

For reference, when the column names are unique it runs without an error. However, it creates one extra column holding the index values, and that column's name is NaN (see question 2):

df.columns=['C','D']
df.groupby(level=0).ffill()
   NaN    C    D
1    1  NaN  2.0
1    1  1.0  2.0
2    2  1.0  2.0
2    2  1.0  2.0

Questions:
1. Is this a bug? Why does apply still work in this situation?

2. Why does groupby on the index followed by ffill create the additional column?

BENY asked Apr 10 '19 16:04


1 Answer

It sure looks like a bug. Just wanted to note that, according to the pandas documentation, the .ffill() method is a synonym for .fillna(method='ffill'). Using the latter generates your expected output for both of your examples in pandas version 0.23.4, without any errors or additional columns. Hope that helps.

import pandas as pd
import numpy as np
df=pd.DataFrame({'A':[np.nan,1,1,np.nan],'B':[2,np.nan,2,2]},index=[1,1,2,2])

df.columns=['A','A']  # duplicate column names
df.groupby(level=0).fillna(method='ffill')

Output:
     A    A
1  NaN  2.0
1  1.0  2.0
2  1.0  2.0
2  1.0  2.0
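
If neither call behaves on your pandas version, another option is to fill one column at a time and address the columns by position, so the duplicate names never matter. This is only a sketch: the helper name `ffill_by_index` is hypothetical, and it assumes that ffill on a single grouped Series works normally in your version, which I have not checked across releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [np.nan, 1, 1, np.nan], "B": [2, np.nan, 2, 2]},
    index=[1, 1, 2, 2],
)
df.columns = ["A", "A"]  # duplicate column names


def ffill_by_index(frame):
    """Forward-fill each column within its index group, selecting columns by position."""
    filled = {
        # iloc picks exactly one column even when names repeat
        pos: frame.iloc[:, pos].groupby(level=0).ffill().to_numpy()
        for pos in range(frame.shape[1])
    }
    out = pd.DataFrame(filled, index=frame.index)
    out.columns = frame.columns  # restore the original (duplicate) names
    return out


result = ffill_by_index(df)
print(result)
```

Selecting with iloc avoids label-based lookup, which returns a two-column DataFrame for a repeated name instead of a single Series.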
fpersyn answered Nov 05 '22 07:11