We have a dataframe with three columns, as shown in the example below (df). The goal is to replace the value in column 2 with np.nan on the first row of each group, i.e. every time the letter in column 1 changes. Since the dataset is very large, a for loop cannot be used. Any solution that involves a shift is also excluded because it is too slow.
I believe the easiest way is to use groupby and the head method; however, I don't know how to write the change back to the original dataframe.
Examples:
df = pd.DataFrame([['A','Z',1.11],['B','Z',2.1],['C','Z',3.1],['D', 'X', 2.1], ['E','X',4.3],['E', 'X', 2.1], ['F','X',4.3]])

To select the elements that we want to change, we can do the following:
df.groupby(by=1).head(1)[2] = np.nan
However, nothing changes in the original dataframe.
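The assignment has no effect because head(1) returns a new DataFrame, so np.nan is written to a temporary object rather than to df. Here is a minimal sketch of one way to write back instead: take the index labels of the selected rows and assign through df.loc on the original frame. It assumes the index is unique and that each letter in column 1 forms a single contiguous block; the name first_rows is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.11], ['B', 'Z', 2.1], ['C', 'Z', 3.1],
                   ['D', 'X', 2.1], ['E', 'X', 4.3], ['E', 'X', 2.1],
                   ['F', 'X', 4.3]])

# head(1) keeps the original index labels of the first row of each group
first_rows = df.groupby(by=1).head(1).index

# Assign through .loc on the original frame, not on the temporary result
df.loc[first_rows, 2] = np.nan
```

Note that this still goes through groupby, so it may not be the fastest option on a very large frame.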
The goal is to obtain the following:

Based on the comments, we don't have to worry about df[1] returning to a group already seen, e.g. ['Z', 'Z', 'X', 'Z'] is not possible.
mask and shift:
df[2] = df[2].mask(df[1].ne(df[1].shift(1)))

masked_array:
df[2] = np.ma.masked_array(df[2], df[1].ne(df[1].shift(1))).filled(np.nan)
# array([nan, 2.1, 3.1, nan, 4.3, 2.1, 4.3])

np.roll and loc:
a = df[1].values
df.loc[np.roll(a, 1) != a, 2] = np.nan
   0  1    2
0  A  Z  NaN
1  B  Z  2.1
2  C  Z  3.1
3  D  X  NaN
4  E  X  4.3
5  E  X  2.1
6  F  X  4.3
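One caveat with the np.roll version: np.roll wraps around, so row 0 is compared with the last row. That works in the example because the last letter ('X') differs from the first ('Z'), but if the whole column holds a single letter, the first row is not flagged. The sketch below is a shift-free comparison that always flags row 0. The helper name change_mask is hypothetical, and the boolean mask is applied by position.

```python
import numpy as np
import pandas as pd

def change_mask(a):
    """Boolean array that is True where a[i] starts a new run (row 0 always True)."""
    mask = np.empty(len(a), dtype=bool)
    if len(a) == 0:
        return mask
    mask[0] = True
    # Compare each element with its predecessor, without wrapping around
    mask[1:] = a[1:] != a[:-1]
    return mask

df = pd.DataFrame([['A', 'Z', 1.11], ['B', 'Z', 2.1], ['C', 'Z', 3.1],
                   ['D', 'X', 2.1], ['E', 'X', 4.3], ['E', 'X', 2.1],
                   ['F', 'X', 4.3]])
df.loc[change_mask(df[1].to_numpy()), 2] = np.nan

# Single-group case: np.roll flags nothing, change_mask still flags row 0
single = pd.DataFrame([['A', 'Z', 1.0], ['B', 'Z', 2.0]])
s = single[1].to_numpy()
rolled = np.roll(s, 1) != s
single.loc[change_mask(s), 2] = np.nan
```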