I have a dataframe (df) containing several measurement columns (A, B, ...) and, for each of these, a corresponding uncertainty column (dA, dB, ...):
A B dA dB
0 -1 3 0.31 0.08
1 2 -4 0.263 0.357
2 5 5 0.382 0.397
3 -4 -0.5 0.33 0.115
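(For reference, this frame can be reconstructed from the table above as:)
import pandas as pd
# Example data copied from the table above
df = pd.DataFrame({
    "A": [-1, 2, 5, -4],
    "B": [3, -4, 5, -0.5],
    "dA": [0.31, 0.263, 0.382, 0.33],
    "dB": [0.08, 0.357, 0.397, 0.115],
})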
I apply a function to flag values in the measurement columns that are valid according to my definition:
df[["A","B"]].apply(lambda x: x.abs()-5*df['d'+x.name] > 0)
This returns a boolean DataFrame:
A B
0 False True
1 True True
2 True True
3 True False
I would like to use this mask to select rows of the dataframe for which the condition is true within a single column (e.g. A -> rows 1-3), and also to find rows where the condition is true for all the input columns (e.g. rows 1 and 2). Is there an efficient way to do this with pandas?
You can use the result of your apply statement as a boolean mask to index into the original dataframe:
results = df[["A","B"]].apply(lambda x: x.abs()-5*df['d'+x.name] > 0)
This returns the boolean DataFrame shown above:
A B
0 False True
1 True True
2 True True
3 True False
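(As a side note, if each uncertainty column is always named "d" plus the measure name, a vectorized sketch without apply could look like this; the column names are taken from the example above:)
# Rename dA/dB to A/B so the subtraction aligns column-wise
unc = df[["dA", "dB"]].rename(columns=lambda c: c[1:])
results = df[["A", "B"]].abs() - 5 * unc > 0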
Now you can use this mask to select rows from your original dataframe as follows:
Select rows where A is True:
df[results.A]
A B dA dB
1 2 -4.0 0.263 0.357
2 5 5.0 0.382 0.397
3 -4 -0.5 0.330 0.115
Select rows where either A or B is True:
df[results.any(axis=1)]
A B dA dB
0 -1 3.0 0.310 0.080
1 2 -4.0 0.263 0.357
2 5 5.0 0.382 0.397
3 -4 -0.5 0.330 0.115
Select rows where all the columns are True:
df[results.all(axis=1)]
A B dA dB
1 2 -4.0 0.263 0.357
2 5 5.0 0.382 0.397
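Putting it all together, here is a self-contained sketch using the example data from the question:
import pandas as pd
df = pd.DataFrame({
    "A": [-1, 2, 5, -4],
    "B": [3, -4, 5, -0.5],
    "dA": [0.31, 0.263, 0.382, 0.33],
    "dB": [0.08, 0.357, 0.397, 0.115],
})
# |value| - 5 * uncertainty > 0, evaluated per measurement column
results = df[["A", "B"]].apply(lambda x: x.abs() - 5 * df["d" + x.name] > 0)
print(df[results.A])            # rows where the condition holds for A
print(df[results.any(axis=1)])  # rows where it holds for A or B
print(df[results.all(axis=1)])  # rows where it holds for both A and B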