I want to delete rows when a few conditions are met:
For instance, a random DataFrame is generated:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4), columns=['one', 'two', 'three', 'four'])
print(df)
One instance of the table is shown below:
        one       two     three      four
0 -0.225730 -1.376075  0.187749  0.763307
1  0.031392  0.752496 -1.504769 -1.247581
2 -0.442992 -0.323782 -0.710859 -0.502574
3 -0.948055 -0.224910 -1.337001  3.328741
4  1.879985 -0.968238  1.229118 -1.044477
5  0.440025 -0.809856 -0.336522  0.787792
6  1.499040  0.195022  0.387194  0.952725
7 -0.923592 -1.394025 -0.623201 -0.738013
8 -1.775043 -1.279997  0.194206 -1.176260
9 -0.602815  1.183396 -2.712422 -0.377118
I want to delete the rows that meet these conditions:
The value in column 'one', 'two', or 'three' is greater than 0, and the value in column 'four' is less than 0.
I tried to implement this as follows:
df = df[df.one > 0 or df.two > 0 or df.three > 0 and df.four < 1]
However, this results in the following error message:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Could someone help me delete rows based on multiple conditions?
pandas provides the DataFrame.drop() method for deleting rows (or columns) from a DataFrame. drop() removes rows by index label, so to delete the rows that meet (or fail) a set of conditions, select the index of those rows with a boolean condition and pass it to drop().
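As a minimal sketch of the drop() approach, the snippet below rebuilds the first five rows of the question's table and deletes the rows that match the question's condition. The variable names to_delete and result are illustrative choices, not part of the original post.

```python
import pandas as pd

# Rows 0-4 of the example table from the question
df = pd.DataFrame(
    {
        "one": [-0.225730, 0.031392, -0.442992, -0.948055, 1.879985],
        "two": [-1.376075, 0.752496, -0.323782, -0.224910, -0.968238],
        "three": [0.187749, -1.504769, -0.710859, -1.337001, 1.229118],
        "four": [0.763307, -1.247581, -0.502574, 3.328741, -1.044477],
    }
)

# Boolean mask: True for rows where (one, two or three > 0) and four < 0
to_delete = ((df["one"] > 0) | (df["two"] > 0) | (df["three"] > 0)) & (df["four"] < 0)

# drop() expects index labels, so pass the labels of the matching rows
result = df.drop(df[to_delete].index)
print(result)
```

Here rows 1 and 4 match the condition, so result keeps rows 0, 2 and 3.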
To remove rows from a DataFrame based on multiple conditions, put the conditions inside square brackets [ ] after the DataFrame, combine them with the & (and) or | (or) operators, and wrap each condition in parentheses. This boolean indexing keeps only the rows for which the combined condition is True, so every row that does not satisfy it is removed.
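A small sketch of boolean indexing, again assuming the first five rows of the question's table. Since square brackets keep the rows where the mask is True, the mask describing the rows to delete is inverted with ~ before indexing; delete_mask and kept are illustrative names.

```python
import pandas as pd

# Rows 0-4 of the example table from the question
df = pd.DataFrame(
    {
        "one": [-0.225730, 0.031392, -0.442992, -0.948055, 1.879985],
        "two": [-1.376075, 0.752496, -0.323782, -0.224910, -0.968238],
        "three": [0.187749, -1.504769, -0.710859, -1.337001, 1.229118],
        "four": [0.763307, -1.247581, -0.502574, 3.328741, -1.044477],
    }
)

# Each comparison is parenthesised because & and | bind more tightly than > and <;
# without the parentheses Python would group the expression incorrectly.
delete_mask = ((df["one"] > 0) | (df["two"] > 0) | (df["three"] > 0)) & (df["four"] < 0)

matching = df[delete_mask]  # selects only the rows that meet the condition
kept = df[~delete_mask]     # ~ inverts the mask, so the matching rows are removed
print(kept)
```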
You can also use drop() to delete rows based on a column value. As part of data cleansing, you may need to drop rows from a DataFrame when a column value matches a static value or the value of another column.
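Both cases can be sketched with the same drop(df[condition].index) pattern. The data here is hypothetical, made up only to show the two comparisons (against a constant and against another column on the same row).

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"one": [1, 2, 3, 4], "two": [1, 5, 3, 0]})

# Drop rows where column "one" equals a static value (here 2)
static_dropped = df.drop(df[df["one"] == 2].index)

# Drop rows where column "one" equals column "two" on the same row
same_dropped = df.drop(df[df["one"] == df["two"]].index)

print(static_dropped)
print(same_dropped)
```

The first call removes row 1; the second removes rows 0 and 2, where the two columns hold the same value.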
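pandas works with the bitwise logical operators | and &, but not with the boolean operators or and and. Python's or and and need a single True/False value for the whole Series, which is ambiguous, so pandas raises the ValueError above; | and & are instead applied element by element.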
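The difference between the two kinds of operators, plus the precedence pitfall of mixing & and | without extra parentheses, can be checked on tiny hand-made Series. The Series values below are arbitrary examples chosen for illustration.

```python
import pandas as pd

s1 = pd.Series([1.0, -1.0, -1.0])
s2 = pd.Series([-1.0, 1.0, -1.0])

# `or` asks for one True/False for the whole Series, which pandas refuses
raised = False
try:
    (s1 > 0) or (s2 > 0)
except ValueError as exc:
    raised = True
    print("`or` failed:", exc)

# `|` is evaluated element by element and returns a boolean Series
either = (s1 > 0) | (s2 > 0)
print(either.tolist())

# Precedence: & binds more tightly than |, so a | b & c means a | (b & c)
a = pd.Series([True, False])
b = pd.Series([False, True])
c = pd.Series([False, False])
no_parens = (a | b & c).tolist()
with_parens = ((a | b) & c).tolist()
print(no_parens, with_parens)
```

The last two results differ, which is why the | conditions need their own parentheses before they are combined with &.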
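Try this instead. Note the extra parentheses around the | conditions (because & binds more tightly than |), the ~ that inverts the mask so the matching rows are deleted rather than kept, and < 0 to match the condition in the question:
# Delete rows where (one, two or three > 0) and four < 0
df = df[~(((df.one > 0) | (df.two > 0) | (df.three > 0)) & (df.four < 0))]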
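To sanity-check the corrected filter, this sketch rebuilds the full ten-row table from the question and lists which rows survive. It also builds an equivalent mask with any(axis=1) across the three columns; that variant is an alternative added here for comparison, not part of the original answer.

```python
import pandas as pd

# The ten rows of the example table from the question
df = pd.DataFrame(
    {
        "one": [-0.225730, 0.031392, -0.442992, -0.948055, 1.879985,
                0.440025, 1.499040, -0.923592, -1.775043, -0.602815],
        "two": [-1.376075, 0.752496, -0.323782, -0.224910, -0.968238,
                -0.809856, 0.195022, -1.394025, -1.279997, 1.183396],
        "three": [0.187749, -1.504769, -0.710859, -1.337001, 1.229118,
                  -0.336522, 0.387194, -0.623201, 0.194206, -2.712422],
        "four": [0.763307, -1.247581, -0.502574, 3.328741, -1.044477,
                 0.787792, 0.952725, -0.738013, -1.176260, -0.377118],
    }
)

# Corrected filter: delete rows where (one, two or three > 0) and four < 0
kept = df[~(((df.one > 0) | (df.two > 0) | (df.three > 0)) & (df.four < 0))]

# Equivalent mask: any() checks the three columns row by row
mask = (df[["one", "two", "three"]] > 0).any(axis=1) & (df["four"] < 0)
kept_any = df[~mask]

print(kept.index.tolist())
print(kept.equals(kept_any))
```

Rows 1, 4, 8 and 9 meet both conditions and are removed; the any(axis=1) form scales more easily when many columns share the same test.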