Here is an example dataframe:
X Y Z
1 0 1
0 1 0
1 1 1
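Now, here is the rule I've come up with: if Y is 1, X is set to 0, and if Z is 1, both X and Y are set to 0 (in other words, a 1 in a column zeroes out every column to its left).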
The final dataframe should look like this:
X Y Z
0 0 1
0 1 0
0 0 1
My first attempt at a solution is this:
df_null_list = ['X']
for i in ['Y', 'Z']:
    df[df[i] == 1][df_null_list] = 0
    df_null_list.append(i)
When I run this on my actual dataset and sum across the y axis, I start getting values of 2 and 4, which don't make sense.
Do you have any suggestions for improvements or alternative solutions?
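The loop in the question most likely leaves df unchanged because df[df[i] == 1][df_null_list] = 0 is chained indexing: the boolean selection returns a copy, and the assignment writes into that temporary copy rather than into df (pandas typically warns about this chained assignment). A minimal sketch of the same loop with a single .loc call, assuming the example data from the question, could look like this:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({'X': [1, 0, 1], 'Y': [0, 1, 1], 'Z': [1, 0, 1]})

# Columns to the left of the current one; they are zeroed where the current column is 1
df_null_list = ['X']
for i in ['Y', 'Z']:
    # One .loc call selects rows and columns together and writes into df itself,
    # unlike df[mask][cols] = 0, which assigns into a temporary copy
    df.loc[df[i] == 1, df_null_list] = 0
    df_null_list.append(i)

print(df)
```

Because the column list grows as the loop advances, the same pattern extends to any number of columns, provided they are listed left to right.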
You can replace values in all or selected columns of a pandas DataFrame based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or by a boolean array, and it can be used both to read and to modify the values of a DataFrame.
To replace multiple values in a column, use the replace() method. Called on a single column, it replaces several values with new values (for example, new strings) in that column; called on the whole DataFrame, it searches every column and replaces each occurrence of the specified values.
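A small sketch of both forms, using made-up labels ('no'/'yes', 'off'/'on') purely for illustration: a {old: new} mapping on one column, and parallel lists of old and new values on the whole DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 0, 1], 'Y': [0, 1, 1]})

# Column level: replace several values in X only, using an {old: new} mapping
x_labels = df['X'].replace({0: 'no', 1: 'yes'})

# Whole DataFrame: every 0 becomes 'off' and every 1 becomes 'on', in all columns
labels = df.replace([0, 1], ['off', 'on'])

print(x_labels.tolist())
print(labels)
```

Note that replace() returns a new object by default, so assign the result back (for example df['X'] = df['X'].replace(...)) if you want to keep it.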
Use mask:
df['X'] = df['X'].mask(df.Y == 1, 0)
df[['X', 'Y']] = df[['X', 'Y']].mask(df.Z == 1, 0)
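The two mask calls hard-code the column names. One possible vectorized generalization, sketched here under the assumption that the columns are ordered left to right and hold only 0/1, builds a single condition "there is a 1 somewhere to the right of this cell" and applies mask once:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 0, 1], 'Y': [0, 1, 1], 'Z': [1, 0, 1]})

# 1 where the cell itself or any column to its right holds a 1:
# reverse the columns, take a running max along each row, then restore the order
seen_from_right = df.iloc[:, ::-1].cummax(axis=1).iloc[:, ::-1]

# Shift one column to the left so each cell only looks strictly to its right;
# the last column has nothing to its right, so it is filled with 0
right_has_one = seen_from_right.shift(-1, axis=1, fill_value=0)

# Zero every cell that has a 1 somewhere to its right
result = df.mask(right_has_one == 1, 0)
print(result)
```

Rows that contain no 1 at all are left untouched, since nothing to their right is ever 1.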
Another solution with DataFrame.loc:
df.loc[df.Y == 1, 'X'] = 0
df.loc[df.Z == 1, ['X', 'Y']] = 0
print(df)
X Y Z
0 0 0 1
1 0 1 0
2 0 0 1
You can generalize this: the last 1 in each row stays 1, and everything else becomes 0. For performance, operate on the underlying NumPy array:
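import numpy as np
a = df.to_numpy()
# Position of the last 1 in each row: argmax on the reversed columns finds the first 1 from the right
idx = (a.shape[1] - a[:, ::-1].argmax(1)) - 1
t = np.zeros(a.shape)
rows = np.arange(a.shape[0])
# Copy the value found at idx instead of writing 1, so rows containing no 1 at all stay 0
t[rows, idx] = a[rows, idx]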
array([[0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.]])
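To see how idx is derived, here is a worked run of the reversed-argmax step on the example data plus a hypothetical fourth row with no 1 at all. np.argmax returns the first position of the row maximum, so on the reversed columns it finds the rightmost 1; the all-zero row shows why the value at idx still needs checking.

```python
import numpy as np

# Example data plus a hypothetical fourth row that contains no 1
a = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1],
              [0, 0, 0]])

rev = a[:, ::-1]                          # reverse column order: Z, Y, X
first_from_right = rev.argmax(axis=1)     # first position of the row maximum
idx = a.shape[1] - first_from_right - 1   # convert back to positions in X, Y, Z order

print(first_from_right)                   # [0 1 0 0]
print(idx)                                # [2 1 2 2]

# For the all-zero row argmax still returns 0, so idx points at Z even though
# that cell is 0; reading the value at idx tells the two cases apart
print(a[np.arange(a.shape[0]), idx])      # [1 1 1 0]
```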
If you need the result back as a DataFrame:
pd.DataFrame(t, columns=df.columns, index=df.index).astype(int)
X Y Z
0 0 0 1
1 0 1 0
2 0 0 1