
Pandas - Replace other columns in row with 0 if a specific column has a value of 1

Tags:

python

pandas

Here is an example dataframe:

X Y Z 
1 0 1
0 1 0
1 1 1

Now, here is the rule I've come up with:

  • X is left as is
  • If Y is equal to 1 set the corresponding value in X to 0
  • If Z is equal to 1 set the corresponding value in X and Y to 0

The final dataframe should look like this:

X Y Z 
0 0 1
0 1 0
0 0 1

My first thought at a solution is this:

df_null_list = ['X']
for i in ['Y', 'Z']:
    df[df[i] == 1][df_null_list] = 0
    df_null_list.append(i)

When I do this and sum across the y axis, I'm starting to get values of 2 and 4, which don't make sense. Note that I'm referring to what happened when I ran this on the actual dataset.

Do you have any suggestions for improvements or alternative solutions?

madsthaks asked Nov 04 '18 16:11




2 Answers

Use Series.mask and DataFrame.mask, which replace values wherever the condition is True:

df['X'] = df['X'].mask(df.Y == 1, 0)
df[['X', 'Y']] = df[['X', 'Y']].mask(df.Z == 1, 0)

Another solution with DataFrame.loc:

df.loc[df.Y == 1, 'X'] = 0
df.loc[df.Z == 1, ['X', 'Y']] = 0

print(df)
   X  Y  Z
0  0  0  1
1  0  1  0
2  0  0  1
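
The loop from the question has no effect because `df[df[i] == 1][df_null_list] = 0` is chained indexing: the boolean selection `df[df[i] == 1]` returns a copy, so the assignment never reaches `df`. Below is a minimal sketch of the same loop using one `.loc` call per step, assuming the example frame from the question. The helper name `zero_earlier_columns` is hypothetical and not part of pandas.

```python
import pandas as pd

def zero_earlier_columns(df, columns):
    """Return a copy where, for each column in order, rows holding a 1
    get every earlier column in `columns` set to 0.
    (Hypothetical helper, named here for illustration only.)"""
    out = df.copy()
    earlier = [columns[0]]
    for col in columns[1:]:
        # A single .loc call selects rows and columns together, so the
        # assignment writes into `out` rather than into a temporary copy.
        out.loc[out[col] == 1, earlier] = 0
        earlier.append(col)
    return out

# Example frame from the question
df = pd.DataFrame({"X": [1, 0, 1], "Y": [0, 1, 1], "Z": [1, 0, 1]})
result = zero_earlier_columns(df, ["X", "Y", "Z"])
print(result)
```

Working on a copy leaves the input frame untouched; drop the `df.copy()` if in-place modification is preferred.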

jezrael answered Oct 19 '22 22:10


You can generalize this: in each row, the last (right-most) 1 stays 1 and everything else becomes 0. For performance, operate on the underlying NumPy array:

import numpy as np

a = df.values
# Column index of the last 1 in each row (argmax over the reversed columns)
idx = (a.shape[1] - a[:, ::-1].argmax(1)) - 1
t = np.zeros(a.shape)
t[np.arange(a.shape[0]), idx] = 1

array([[0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.]])

If you need the result back as a DataFrame:

pd.DataFrame(t, columns=df.columns, index=df.index).astype(int)

   X  Y  Z
0  0  0  1
1  0  1  0
2  0  0  1
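
One caveat with the argmax approach: for a row that contains no 1 at all, `argmax` returns 0 on the reversed row, so the last column would be set to 1. Below is a sketch that guards against this, assuming the frame holds only 0 and 1. The function name `keep_last_one` is hypothetical.

```python
import numpy as np
import pandas as pd

def keep_last_one(df):
    """Keep only the right-most 1 in each row; all-zero rows stay all zero.
    (Hypothetical helper; assumes the frame contains only 0 and 1.)"""
    a = df.to_numpy()
    n_rows, n_cols = a.shape
    # argmax over the reversed columns finds the first 1 from the right;
    # subtracting from n_cols - 1 maps it back to a normal column index.
    idx = n_cols - 1 - a[:, ::-1].argmax(axis=1)
    t = np.zeros_like(a)
    # Write a 1 only for rows that contain a 1 somewhere.
    has_one = a.any(axis=1)
    t[np.arange(n_rows)[has_one], idx[has_one]] = 1
    return pd.DataFrame(t, columns=df.columns, index=df.index)

# Example frame from the question, plus one all-zero row
df = pd.DataFrame({"X": [1, 0, 1, 0], "Y": [0, 1, 1, 0], "Z": [1, 0, 1, 0]})
result = keep_last_one(df)
print(result)
```

Because `np.zeros_like(a)` inherits the integer dtype of the input, no `astype(int)` is needed on the result.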

user3483203 answered Oct 19 '22 21:10