 

SettingWithCopyWarning when one column of DataFrame is strings

I get a SettingWithCopyWarning for the following code:

from pandas import DataFrame

rain = DataFrame({'data':['1','2','3','4'],
                  'value':[1,-1,1,1]})
rain.value[rain.value < 0] = 0

While I don't get that warning with

rain = DataFrame({'data':[1,2,3,4],
                  'value':[1,-1,1,1]})
rain.value[rain.value < 0] = 0

The only difference is that the 'data' column is strings in the first DataFrame, and numbers in the second DataFrame. Am I doing something wrong? Is there a different (preferred?) way to do this? Shouldn't the warning at least be applied consistently?

Mark Bakker asked Sep 23 '15 14:09


2 Answers

You are doing something wrong in both cases. The fact that you receive a warning in only one of the two scenarios is not relevant: you should never assign through chained indexing, such as rain.value[...] = .... The pandas docs explicitly discourage it.
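To see why chained assignment is unreliable, it may help to spell out what it does. The sketch below is an illustration of my own, not taken from the pandas docs. It splits rain.value[rain.value < 0] = 0 into its two separate indexing steps, and the variable name intermediate is hypothetical. Whether the intermediate Series shares memory with rain is an internal detail that has changed across pandas versions, so the original frame may or may not be updated.

```python
import pandas as pd

rain = pd.DataFrame({'data': ['1', '2', '3', '4'],
                     'value': [1, -1, 1, 1]})

# Chained assignment performs two independent indexing operations.
# Step 1: __getitem__ returns an intermediate Series. pandas does not
# guarantee whether it is a view of `rain` or a copy.
intermediate = rain['value']

# Step 2: __setitem__ is applied to that intermediate object,
# not directly to `rain`.
intermediate[intermediate < 0] = 0

# The intermediate Series is always updated...
print(intermediate.tolist())  # -> [1, 0, 1, 1]
# ...but whether `rain` itself changed depends on the pandas version
# and its internal memory layout, so code should not rely on it.
print(rain['value'].tolist())
```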

Instead, you can use pd.DataFrame.loc:

rain.loc[rain.value < 0, 'value'] = 0

I see no warnings or errors in either scenario with this method. An even better option, which avoids Boolean indexing altogether, is np.maximum:

import numpy as np

rain['value'] = np.maximum(0, rain['value'])
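The following self-contained sketch applies both fixes from this answer to the two DataFrames in the question and checks that they give the same result. Series.clip(lower=0) is included as a third option of my own. It is a standard pandas method but was not part of the original answer. The helper names fix_with_loc, fix_with_maximum and fix_with_clip are hypothetical.

```python
import numpy as np
import pandas as pd


def fix_with_loc(df):
    # Single .loc call: row mask and column label in one indexing operation.
    df = df.copy()
    df.loc[df['value'] < 0, 'value'] = 0
    return df


def fix_with_maximum(df):
    # Element-wise maximum against 0, assigned back to the whole column.
    df = df.copy()
    df['value'] = np.maximum(0, df['value'])
    return df


def fix_with_clip(df):
    # Alternative not from the answer: raise values below 0 up to 0.
    df = df.copy()
    df['value'] = df['value'].clip(lower=0)
    return df


frames = {
    'strings': pd.DataFrame({'data': ['1', '2', '3', '4'], 'value': [1, -1, 1, 1]}),
    'numbers': pd.DataFrame({'data': [1, 2, 3, 4], 'value': [1, -1, 1, 1]}),
}

for name, df in frames.items():
    for fix in (fix_with_loc, fix_with_maximum, fix_with_clip):
        print(name, fix.__name__, fix(df)['value'].tolist())
```

Each helper works on a copy, so the input frames stay unchanged and the three approaches can be compared side by side.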
jpp answered Oct 12 '22 23:10


In the case of this question:

rain.value[rain.value < 0] = 0  # chained assignment: not guaranteed to work

rain.loc[rain.value < 0, 'value'] = 0  # single .loc call: works
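The column label in the .loc call matters. If you pass only a row mask to .loc, for example rain.loc[rain.value < 0] = 0, pandas assigns to every column of the matching rows, not just value. The minimal sketch below shows this using the all-numeric DataFrame from the question, so the result does not depend on how strings are stored. The names row_only and row_and_col are hypothetical.

```python
import pandas as pd

rain = pd.DataFrame({'data': [1, 2, 3, 4],
                     'value': [1, -1, 1, 1]})

# Row mask only: every column in the selected rows is overwritten.
row_only = rain.copy()
row_only.loc[row_only['value'] < 0] = 0
print(row_only['data'].tolist())     # -> [1, 0, 3, 4]

# Row mask plus column label: only 'value' is touched.
row_and_col = rain.copy()
row_and_col.loc[row_and_col['value'] < 0, 'value'] = 0
print(row_and_col['data'].tolist())  # -> [1, 2, 3, 4]
print(row_and_col['value'].tolist()) # -> [1, 0, 1, 1]
```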

Why Does One Work and Not the Other:

From the pandas documentation, Indexing and Selecting Data, section "Evaluation order matters":

A chained assignment can also crop up in setting in a mixed dtype frame.

Note: These setting rules apply to all of .loc/.iloc.

This is the correct access method:

In [345]: dfc = pd.DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})

In [346]: dfc.loc[0,'A'] = 11

In [347]: dfc
Out[347]: 
     A  B
0   11  1
1  bbb  2
2  ccc  3

This can work at times, but it is not guaranteed to, and therefore should be avoided:

In [348]: dfc = dfc.copy()

In [349]: dfc['A'][0] = 111

In [350]: dfc
Out[350]: 
     A  B
0  111  1
1  bbb  2
2  ccc  3

This will not work at all, and so should be avoided:

>>> pd.set_option('mode.chained_assignment','raise')
>>> dfc.loc[0]['A'] = 1111
Traceback (most recent call last)
     ...
SettingWithCopyException:
     A value is trying to be set on a copy of a slice from a DataFrame.
     Try using .loc[row_index,col_indexer] = value instead

Warning: The chained assignment warnings/exceptions aim to inform the user of a possibly invalid assignment. There may be false positives: situations where a chained assignment is inadvertently reported.
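According to the quoted docs, a "mixed dtype frame" is where chained assignment can also crop up. That is the only difference between the two DataFrames in the question. The sketch below checks this directly. The helper is_mixed_dtype is hypothetical. Depending on the pandas version, the string column may be reported as object or as a dedicated string dtype, so the sketch compares dtypes instead of relying on a particular name.

```python
import pandas as pd

rain_strings = pd.DataFrame({'data': ['1', '2', '3', '4'],
                             'value': [1, -1, 1, 1]})
rain_numbers = pd.DataFrame({'data': [1, 2, 3, 4],
                             'value': [1, -1, 1, 1]})


def is_mixed_dtype(df):
    # True when the columns do not all share a single dtype.
    return len(set(df.dtypes)) > 1


print(rain_strings.dtypes)
print(rain_numbers.dtypes)
print(is_mixed_dtype(rain_strings))  # -> True
print(is_mixed_dtype(rain_numbers))  # -> False
```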

Trenton McKinney answered Oct 12 '22 22:10