
Perform the opposite operation of pandas ffill

Let's say I have the following DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'player': ['LBJ', 'LBJ', 'LBJ', 'Kyrie', 'Kyrie', 'LBJ', 'LBJ'],
                   'points': [25, 32, 26, 21, 29, 21, 35]})

How can I perform the operation opposite of ffill so I can get the following DataFrame:

df = pd.DataFrame({'player': ['LBJ', np.nan, np.nan, 'Kyrie', np.nan, 'LBJ', np.nan],
                   'points': [25, 32, 26, 21, 29, 21, 35]})

That is, I want to replace each value that repeats the one directly above it with NaN.

Here's what I have so far, but I'm hoping there's a built-in pandas method or a better approach:

for i, (index, row) in enumerate(df.iterrows()):
    if i == 0:
        continue
    go_back = 1
    while True:
        # df.ix was removed in pandas 1.0; .iloc does the positional lookup
        past_player = df['player'].iloc[i - go_back]
        if pd.isnull(past_player):
            go_back += 1
            continue
        if row['player'] == past_player:
            # df.set_value was removed in pandas 1.0; .at sets a single cell
            df.at[index, 'player'] = np.nan
        break
Johnny Metz asked Sep 28 '17 22:09



2 Answers

ffinv = lambda s: s.mask(s == s.shift())
df.assign(player=ffinv(df.player))

  player  points
0    LBJ      25
1    NaN      32
2    NaN      26
3  Kyrie      21
4    NaN      29
5    LBJ      21
6    NaN      35
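
To unpack the one-liner: s.shift() moves every value down one row, so s == s.shift() is True exactly where a row repeats the one directly above it, and mask() replaces those positions with NaN. Below is a minimal sketch of the same idea as a named function, with a check that ffill restores the original column. The name inverse_ffill is made up for illustration and is not a pandas API.

```python
import pandas as pd

def inverse_ffill(s: pd.Series) -> pd.Series:
    """Blank out values that repeat the row directly above (hypothetical helper)."""
    repeated = s == s.shift()   # the first row is compared against NaN, so it is False
    return s.mask(repeated)     # mask() puts NaN wherever the condition is True

df = pd.DataFrame({'player': ['LBJ', 'LBJ', 'LBJ', 'Kyrie', 'Kyrie', 'LBJ', 'LBJ'],
                   'points': [25, 32, 26, 21, 29, 21, 35]})

result = df.assign(player=inverse_ffill(df['player']))

# Forward-filling the blanked column should give back the original values.
round_trip = result['player'].ffill()
```

The round trip only holds when the original column has no missing values of its own: NaN never compares equal to NaN, so existing NaNs stay as they are, and a later ffill would fill them in.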
piRSquared answered Oct 19 '22 12:10


This is probably not the most efficient solution, but one that works is to use itertools.groupby and itertools.chain:

>>> import itertools
>>> df['player'] = list(itertools.chain.from_iterable(
...     [key] + [float('nan')] * (len(list(val)) - 1)
...     for key, val in itertools.groupby(df['player'].tolist())))
>>> df
  player  points
0    LBJ      25
1    NaN      32
2    NaN      26
3  Kyrie      21
4    NaN      29
5    LBJ      21
6    NaN      35

More specifically, this illustrates how it works:

for key, val in itertools.groupby(df['player']):
    print([key] + [float('nan')]*(len(list(val))-1))

giving:

['LBJ', nan, nan]
['Kyrie', nan]
['LBJ', nan]

which is then "chained" together.
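
As a small sketch of that last step, the three per-run lists printed above can be flattened on their own with itertools.chain.from_iterable. The variable names runs and flat are only for illustration.

```python
import itertools

nan = float('nan')

# One list per run of equal players, as printed by the loop above.
runs = [['LBJ', nan, nan], ['Kyrie', nan], ['LBJ', nan]]

# chain.from_iterable concatenates the inner lists in order, without nesting.
flat = list(itertools.chain.from_iterable(runs))
```

The flattened list has one entry per original row, so it can be assigned straight back to the column.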

MSeifert answered Oct 19 '22 12:10