Suppose I have four columns A, B, C, D in a data frame df:
import pandas as pd
df = pd.read_csv('results.csv')
df
A B C D
good good good good
good bad good good
good bad bad good
bad good good good
I want to add another column, results. Its values should depend on the values in the same row: if at least three of the columns A, B, C, D contain good, the value in results should be valid; otherwise it should be notvalid.
Expected output:
A B C D results
good good good good valid
good bad good good valid
good bad bad good notvalid
bad good good good valid
You can use:
import numpy as np

# columns of interest:
cols = ['A','B','C','D']
# count the 'good' values in each row, then label rows with at least three
df['results'] = np.where(df[cols].eq('good').sum(axis=1).ge(3),
                         'valid', 'notvalid')
Output:
A B C D results
0 good good good good valid
1 good bad good good valid
2 good bad bad good notvalid
3 bad good good good valid
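To see why this works, it can help to split the one-liner into its intermediate steps. The following is a minimal sketch, not the only way to do it. It assumes the data shown in the question and builds the frame inline as a stand-in for results.csv; the names is_good, good_count and enough are introduced here purely for illustration, and the 'notvalid' label is taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for results.csv, built inline so the example runs on its own.
df = pd.DataFrame({
    'A': ['good', 'good', 'good', 'bad'],
    'B': ['good', 'bad', 'bad', 'good'],
    'C': ['good', 'good', 'bad', 'good'],
    'D': ['good', 'good', 'good', 'good'],
})

cols = ['A', 'B', 'C', 'D']

# Step 1: boolean frame, True wherever a cell equals 'good'.
is_good = df[cols].eq('good')

# Step 2: summing booleans along each row counts the True values.
good_count = is_good.sum(axis=1)

# Step 3: boolean Series, True for rows with at least three 'good' cells.
enough = good_count.ge(3)

# Step 4: map True/False to the two labels.
df['results'] = np.where(enough, 'valid', 'notvalid')

print(df)
```

Keeping the column list in cols means the same code should still work if more columns are added later; only the list and the threshold need to change.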