I have a DataFrame with duplicate column names, as below:
df=pd.DataFrame({'A':[np.nan,1,1,np.nan],'B':[2,np.nan,2,2]},index=[1,1,2,2])
df.columns=['A','A']
Now I want to ffill the values, grouped by the index. First I tried:
df.groupby(level=0).ffill()
which raises the error:
> ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
It looks like a bug. I then tried apply, which returns the expected output:
df.groupby(level=0).apply(lambda x : x.ffill())
A A
1 NaN 2.0
1 1.0 2.0
2 1.0 2.0
2 1.0 2.0
For reference, when the column names are unique it works fine; however, the result gains an extra column that repeats the index values, and that column's name is NaN (see question 2):
df.columns=['C','D']
df.groupby(level=0).ffill()
NaN C D
1 1 NaN 2.0
1 1 1.0 2.0
2 2 1.0 2.0
2 2 1.0 2.0
Questions:
1. Is this a bug? Why does apply still work in this situation?
2. Why does groupby on the index combined with ffill create the additional column?
It sure looks like a bug. Just wanted to note that, according to the pandas documentation, the .ffill() method is a synonym for .fillna(method='ffill'). Using the latter generates your expected output for both of your examples in pandas version 0.23.4, without any errors or additional columns. Hope that helps.
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [np.nan, 1, 1, np.nan], 'B': [2, np.nan, 2, 2]}, index=[1, 1, 2, 2])
df.columns = ['A', 'A']  # duplicate column names
df.groupby(level=0).fillna(method='ffill')
Output:
A A
1 NaN 2.0
1 1.0 2.0
2 1.0 2.0
2 1.0 2.0
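If neither form behaves on your pandas version, one possible workaround is to sidestep the duplicate labels entirely. The sketch below is only an illustration, not something taken from the pandas documentation: it assumes that relabelling the columns with unique positional integers, filling per group, and then restoring the original labels is acceptable for your use case. The names tmp and result are made up for the example. As far as I know, fillna(method=...) is deprecated in recent pandas releases, so the sketch calls ffill() directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [np.nan, 1, 1, np.nan], 'B': [2, np.nan, 2, 2]},
    index=[1, 1, 2, 2],
)
df.columns = ['A', 'A']  # duplicate column names, as in the question

# Work on a copy whose columns are unique positional labels (0, 1, ...).
tmp = df.copy()
tmp.columns = range(tmp.shape[1])

# Forward-fill within each index group, then keep only the positional
# columns in case a given pandas version adds an extra key column.
result = tmp.groupby(level=0).ffill().loc[:, list(tmp.columns)]

# Put the original (duplicate) labels back.
result.columns = df.columns
print(result)
```

Because the grouping never sees the duplicate labels, this should avoid the code path that raised the ValueError, at the cost of one extra copy of the frame.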