Suppose I have four columns A, B, C, D in a data frame df:
import pandas as pd
df = pd.read_csv('results.csv')
df 
A     B     C     D
good  good  good  good
good  bad   good  good
good  bad   bad   good
bad   good  good  good
I want to add another column, results. Each value in it should be based on the values in the corresponding row. In my case, if there are at least three good values in the row (i.e. in columns A, B, C, D), the value in results should be valid; otherwise it should be notvalid.
Expected output:
A     B     C     D     results
good  good  good  good  valid
good  bad   good  good  valid
good  bad   bad   good  notvalid
bad   good  good  good  valid
You can use:
import numpy as np
# columns of interest:
cols = ['A','B','C','D']
# count the 'good' values in each row; rows with at least three are labelled valid
df['results'] = np.where(df[cols].eq('good').sum(axis=1).ge(3),
                         'valid', 'notvalid')
Output:
      A     B     C     D   results
0  good  good  good  good     valid
1  good   bad  good  good     valid
2  good   bad   bad  good  notvalid
3   bad  good  good  good     valid
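To see why this works, it may help to split the one-liner into its intermediate steps. The sketch below rebuilds the sample data inline, since results.csv itself is not shown, and uses the notvalid label from the question. The names is_good and good_count are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt inline; this stands in for results.csv (an assumption).
df = pd.DataFrame({
    'A': ['good', 'good', 'good', 'bad'],
    'B': ['good', 'bad', 'bad', 'good'],
    'C': ['good', 'good', 'bad', 'good'],
    'D': ['good', 'good', 'good', 'good'],
})

cols = ['A', 'B', 'C', 'D']

# Step 1: boolean frame, True wherever a cell equals 'good'.
is_good = df[cols].eq('good')

# Step 2: summing booleans across each row counts the 'good' cells.
good_count = is_good.sum(axis=1)

# Step 3: turn the per-row counts into labels.
df['results'] = np.where(good_count >= 3, 'valid', 'notvalid')

print(df)
```

Keeping good_count as a separate Series also makes it easy to change the threshold later, or to inspect which rows fall just short of it.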