python: pandas np.where vs. df.loc with multiple conditions

Tags:

Np.where has been giving me a lot of errors, so I am looking for a solution with df.loc instead.

This is the np.where error I have been getting:

C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

I am working with the following dataframe df:

df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],'checked': ['0','0','1','0'],'duplicate': ['True','True','False','False']})

    Column_A    checked   duplicate
0   AAA             0      True
1   AAA             0      True
2   ABC             1      False
3   CDE             0      False

I want to create an additional flag, if checked is 0 and duplicate is True.

I tried this and it didn't work:

df['flag'] = (np.where((df['checked'] == 'Y') &(df['duplicate'] == 'True'), 'Y', '0'))

TypeError: invalid type comparison

I tried it with df.loc:

df['flag'] = (df.loc[df['checked'] == 'Y']& df.loc[df['duplicate'] == 'True'], 'Y','0')

TypeError: invalid type comparison

and I get the same error!

854

asked Jun 15 '17 14:06

jeangelj

1 Answers

I think your boolean are not strings, so need remove ':

df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],
                  'checked': ['0','0','1','0'],
                  'duplicate': [True, True, False, False]})

df['flag'] = np.where((df['checked'] == 'Y') &(df['duplicate'] == True), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0

Or if compare with boolean column, == True can be omited:

df['flag'] = np.where((df['checked'] == 'Y') &(df['duplicate']), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0

Also if need check checked need ' because strings:

df['flag'] = np.where((df['checked'] == '0') &(df['duplicate'] == True), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0

EDIT:

Solution with loc:

df['flag'] = '0'
mask = (df['checked'] == '0') &(df['duplicate'])
df.loc[mask, 'flag'] = 'Y'
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0

120

answered Sep 28 '22 18:09

jezrael

Related questions
                            
                                What's the pythonic way to pass arguments between functions?
                            
                                "Apps aren't loaded yet" and "django.core.exceptions.ImproperlyConfigured" in Django?
                            
                                How to get value (not key) data from SelectField in WTForms [duplicate]
                            
                                Debug python dump file in windbg
                            
                                Tensorflow: How to index a tensor using 2D-index like in numpy
                            
                                How to get weights from tensorflow fully_connected
                            
                                Pandas multiindex dataframe - Selecting max from one index within multiindex
                            
                                Error installing NLTK Python
                            
                                find out all child elements xpath from parent xpath using selenium webdriver in python
                            
                                PyCharm Python Console - Printing on the same line not working as intended
                            
                                Find index where elements change value pandas dataframe
                            
                                attach img file in pdf weasyprint
                            
                                pytorch Network.parameters() missing 1 required positional argument: 'self'
                            
                                How to create a grouped bar chart in Altair?
                            
                                Where is the luigi config file?
                            
                                Setting both axes logarithmic in bar plot matploblib
                            
                                Why does insert script using cx_Oracle hangs
                            
                                How do I increase decimal precision in Spark?
                            
                                error with snappy while importing fastparquet in python
                            
                                How to set default_app_config for Django with apps directory structure?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

python: pandas np.where vs. df.loc with multiple conditions

Tags:

python

pandas

numpy

typeerror

jeangelj

People also ask

1 Answers

jezrael

Recent Activity

Donate For Us