change column in pre-selected elements in pandas dataframe

Question

We have a dataframe with three columns, as shown in the example below (df). The goal is to replace the first element of column 2 with np.nan every time the letter in column 1 changes. Since the dataset under study is very large, a for loop cannot be used. Any solution that involves a shift is also excluded because it is too slow.

I believe the easiest way is to use groupby and the head method; however, I don't know how to write the replacement back to the original dataframe.

Examples:

import numpy as np
import pandas as pd

df = pd.DataFrame([['A','Z',1.11],['B','Z',2.1],['C','Z',3.1],['D', 'X', 2.1], ['E','X',4.3],['E', 'X', 2.1], ['F','X',4.3]])


To select the elements that we want to change, we can do the following:

df.groupby(by=1).head(1)[2] = np.nan

However, nothing changes in the original dataframe.
The goal is to obtain the following:

   0  1    2
0  A  Z  NaN
1  B  Z  2.1
2  C  Z  3.1
3  D  X  NaN
4  E  X  4.3
5  E  X  2.1
6  F  X  4.3
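
The `groupby`/`head` assignment shown earlier has no effect because `head(1)` returns a new DataFrame, so `np.nan` is written to a temporary object rather than to `df`. A minimal sketch of one way to keep that idea and write back through the original index, assuming the default integer column labels from the example (the name `first_rows` is illustrative, not from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.11], ['B', 'Z', 2.1], ['C', 'Z', 3.1],
                   ['D', 'X', 2.1], ['E', 'X', 4.3], ['E', 'X', 2.1],
                   ['F', 'X', 4.3]])

# head(1) returns a new DataFrame holding the first row of each group;
# its index labels still refer to rows of the original df.
first_rows = df.groupby(by=1).head(1).index

# Assign through .loc on the original so the change lands in df itself.
df.loc[first_rows, 2] = np.nan
```

This relies on the index labels of `df` being unique; with a duplicated index, `.loc` would select more rows than intended.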

Edit:

Based on the comments: df[1] never returns to a group that has already been seen, e.g. ['Z', 'Z', 'X', 'Z'] is not possible.
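
Under this guarantee, the first row of each group is also the first occurrence of its value in `df[1]`, so the rows to blank out can be located without `shift`. A sketch using `Series.duplicated`, which is valid only under that assumption (the name `is_first` is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.11], ['B', 'Z', 2.1], ['C', 'Z', 3.1],
                   ['D', 'X', 2.1], ['E', 'X', 4.3], ['E', 'X', 2.1],
                   ['F', 'X', 4.3]])

# duplicated() is False for the first occurrence of each value, True afterwards.
# This marks group starts only because a group never reappears later on.
is_first = ~df[1].duplicated()

df.loc[is_first, 2] = np.nan
```

If a label could reappear later (the excluded ['Z', 'Z', 'X', 'Z'] case), this would miss the start of the second 'Z' run, so a comparison with the previous row would be needed instead.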

user3483203 · Accepted Answer

Using `mask` and `shift`:

df[2] = df[2].mask(df[1].ne(df[1].shift(1)))
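
To see what the one-liner does, it may help to split it into its intermediate steps. The sketch below rebuilds the example frame and uses the illustrative names `prev`, `changed` and `result`, which are not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.11], ['B', 'Z', 2.1], ['C', 'Z', 3.1],
                   ['D', 'X', 2.1], ['E', 'X', 4.3], ['E', 'X', 2.1],
                   ['F', 'X', 4.3]])

# Label of the previous row; the first row has no predecessor, so it is missing.
prev = df[1].shift(1)

# True wherever the label differs from the previous row's label.
# The first row compares against a missing value and therefore counts as a change.
changed = df[1].ne(prev)

# mask() replaces values with NaN wherever the condition is True.
result = df[2].mask(changed)
```

Because the comparison is against the immediately preceding row, this version does not depend on groups never reappearing.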

Using a `masked_array`:

df[2] = np.ma.masked_array(df[2], df[1].ne(df[1].shift(1))).filled(np.nan)
# array([nan, 2.1, 3.1, nan, 4.3, 2.1, 4.3])

Using `np.roll` and `loc`:

a = df[1].values
df.loc[np.roll(a, 1)!=a, 2] = np.nan

   0  1    2
0  A  Z  NaN
1  B  Z  2.1
2  C  Z  3.1
3  D  X  NaN
4  E  X  4.3
5  E  X  2.1
6  F  X  4.3
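
One caveat with the `np.roll` variant: `np.roll(a, 1)` wraps the last element around to the front, so the first row is compared with the last row. In the example those labels differ ('Z' vs 'X'), but if the column holds a single group, row 0 would not be flagged. A sketch of a slice-based comparison that always treats the first row as a group start, using hypothetical single-group data and the illustrative name `starts`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a single group, chosen to expose the wrap-around.
df = pd.DataFrame([['A', 'Z', 1.0], ['B', 'Z', 2.0], ['C', 'Z', 3.0]])

a = df[1].to_numpy()

# np.roll compares row 0 with the last row, so nothing is flagged here.
rolled = np.roll(a, 1) != a

# Slice-based alternative (assumes at least one row).
starts = np.empty(len(a), dtype=bool)
starts[0] = True               # the first row always starts a group
starts[1:] = a[1:] != a[:-1]   # later rows start a group when the label changes

df.loc[starts, 2] = np.nan
```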

Tags: python, pandas, dataframe

Question by nunodsousa
