Change a column in pre-selected elements of a pandas dataframe

We have a dataframe with three columns, as shown in the example below (df). The goal is to replace the first element of column 2 with np.nan every time the letter in column 1 changes. Since the dataset under study is very large, a for loop cannot be used. Every solution that involves a shift is also excluded, because it is too slow.

I believe the easiest way is to use the groupby and head methods; however, I don't know how to make the replacement in the original dataframe.

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.11], ['B', 'Z', 2.1], ['C', 'Z', 3.1], ['D', 'X', 2.1], ['E', 'X', 4.3], ['E', 'X', 2.1], ['F', 'X', 4.3]])

   0  1     2
0  A  Z  1.11
1  B  Z  2.10
2  C  Z  3.10
3  D  X  2.10
4  E  X  4.30
5  E  X  2.10
6  F  X  4.30

To select the elements that we want to change, we can do the following:

df.groupby(by=1).head(1)[2] = np.nan

However, nothing changes in the original dataframe.
The goal is to obtain the following:

   0  1    2
0  A  Z  NaN
1  B  Z  2.1
2  C  Z  3.1
3  D  X  NaN
4  E  X  4.3
5  E  X  2.1
6  F  X  4.3
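The groupby/head attempt fails because `head(1)` returns a new DataFrame, so assigning to one of its columns changes that copy and leaves df untouched. A minimal sketch that keeps the groupby idea is to take the index labels of the selected rows and assign through `.loc` on df itself. It assumes df has a unique index (the default RangeIndex here), and it only matches "first row after each change" because, per the edit below, a letter never reappears once its group ends.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.11], ['B', 'Z', 2.1], ['C', 'Z', 3.1],
                   ['D', 'X', 2.1], ['E', 'X', 4.3], ['E', 'X', 2.1],
                   ['F', 'X', 4.3]])

# head(1) keeps the first row of each group along with its original index label
first_rows = df.groupby(1).head(1).index

# assigning through .loc on df modifies the original frame, not a copy
df.loc[first_rows, 2] = np.nan
print(df)
```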

Edit:

Based on the comments: df[1] never returns to a group already seen, e.g. ['Z', 'Z', 'X', 'Z'] is not possible.
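With this guarantee, "first occurrence of each letter" and "row where the letter changes" select the same rows, so `Series.duplicated` gives the mask without any shift. The sketch below uses a hypothetical series `s` containing the excluded pattern only to show where the two masks diverge, then checks that they agree on column 1 of the example.

```python
import pandas as pd

s = pd.Series(['Z', 'Z', 'X', 'Z'])  # hypothetical input where 'Z' reappears

# True for the first occurrence of each value anywhere in the column
first_seen = ~s.duplicated()

# True wherever the value differs from the previous row (row 0 is compared with NaN)
changed = s.ne(s.shift())

print(first_seen.tolist())  # [True, False, True, False]
print(changed.tolist())     # [True, False, True, True]

# on the example's column 1, where no letter reappears, both masks coincide
col1 = pd.Series(['Z', 'Z', 'Z', 'X', 'X', 'X', 'X'])
print((~col1.duplicated()).equals(col1.ne(col1.shift())))  # True
```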

asked Nov 30 '22 08:11 by nunodsousa


1 Answer

Using mask and shift:

df[2] = df[2].mask(df[1].ne(df[1].shift(1)))

Using a masked_array:

df[2] = np.ma.masked_array(df[2], df[1].ne(df[1].shift(1))).filled(np.nan)
# array([nan, 2.1, 3.1, nan, 4.3, 2.1, 4.3])
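The masked array is only a vehicle for turning the boolean mask into NaNs; `np.where` does the same in one call. A minimal sketch on the question's df, reusing the shift-based mask from the first snippet (the name `starts` is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.11], ['B', 'Z', 2.1], ['C', 'Z', 3.1],
                   ['D', 'X', 2.1], ['E', 'X', 4.3], ['E', 'X', 2.1],
                   ['F', 'X', 4.3]])

# True on the first row of every run of equal letters in column 1
starts = df[1].ne(df[1].shift(1))

# take NaN where a run starts, otherwise keep the original value
df[2] = np.where(starts, np.nan, df[2])
print(df[2].tolist())
```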

Using np.roll and loc:

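a = df[1].values
m = np.roll(a, 1) != a  # np.roll wraps: row 0 is compared with the last row
m[0] = True             # so always mark row 0 as the start of a group
df.loc[m, 2] = np.nan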

   0  1    2
0  A  Z  NaN
1  B  Z  2.1
2  C  Z  3.1
3  D  X  NaN
4  E  X  4.3
5  E  X  2.1
6  F  X  4.3
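Since the question rules out shift, another option is to compare the NumPy array with itself offset by one element using plain slicing. Unlike np.roll, slicing does not wrap around, and `np.r_[True, ...]` marks row 0 explicitly. This is an untimed sketch that assumes df is non-empty; `starts` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'Z', 1.11], ['B', 'Z', 2.1], ['C', 'Z', 3.1],
                   ['D', 'X', 2.1], ['E', 'X', 4.3], ['E', 'X', 2.1],
                   ['F', 'X', 4.3]])

a = df[1].to_numpy()

# a[1:] != a[:-1] is True where row i differs from row i-1 (for i >= 1);
# prepending True makes row 0 always count as the start of a group
starts = np.r_[True, a[1:] != a[:-1]]

df.loc[starts, 2] = np.nan
print(df)
```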
answered Dec 05 '22 04:12 by user3483203