
Pandas still getting SettingWithCopyWarning even after using .loc

At first, I tried writing some code that looked like this:

import numpy as np
import pandas as pd

np.random.seed(2016)
train = pd.DataFrame(np.random.choice([np.nan, 1, 2], size=(10, 3)),
                     columns=['Age', 'SibSp', 'Parch'])

complete = train.dropna()
complete['AgeGt15'] = complete['Age'] > 15

After getting SettingWithCopyWarning, I tried using .loc:

complete.loc[:, 'AgeGt15'] = complete['Age'] > 15
complete.loc[:, 'WithFamily'] = complete['SibSp'] + complete['Parch'] > 0

However, I still get the same warning. What gives?

asked Aug 07 '16 00:08 by Huey



1 Answer

Note: As of pandas version 0.24, is_copy is deprecated and will be removed in a future version. While the private attribute _is_copy exists, the underscore indicates this attribute is not part of the public API and therefore should not be depended upon. Therefore, going forward, it seems the only proper way to silence SettingWithCopyWarning will be to do so globally:

pd.options.mode.chained_assignment = None 

When complete = train.dropna() is executed, dropna might return a copy, so out of an abundance of caution, Pandas sets complete.is_copy to a truthy value:

In [220]: complete.is_copy
Out[220]: <weakref at 0x7f7f0b295b38; to 'DataFrame' at 0x7f7eee6fe668>

This allows Pandas to warn you later, when complete['AgeGt15'] = complete['Age'] > 15 is executed, that you may be modifying a copy, which will have no effect on train. For beginners this may be a useful warning. In your case, it appears you have no intention of modifying train indirectly by modifying complete, so the warning is just a meaningless annoyance.
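To make the point above concrete, here is a minimal sketch that checks that adding a column to complete leaves train untouched. The small DataFrame is hypothetical (it is not the random data from the question), and whether a warning is actually emitted depends on the installed pandas version, so any warning is simply captured rather than counted.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data, chosen so that dropna() removes rows 1 and 2.
train = pd.DataFrame({
    "Age": [22.0, np.nan, 8.0, 4.0],
    "SibSp": [1.0, 0.0, np.nan, 0.0],
    "Parch": [0.0, 0.0, 1.0, 2.0],
})

complete = train.dropna()

# Capture whatever warning this pandas version emits so the output stays quiet.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    complete["AgeGt15"] = complete["Age"] > 15

# The new column exists only on `complete`; `train` keeps its original columns.
train_columns = list(train.columns)
complete_columns = list(complete.columns)
print(train_columns)
print(complete_columns)
```

If the assignment had been meant to change train, this is exactly the situation the warning is trying to flag.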

You can silence the warning by setting:

complete.is_copy = False       # deprecated as of version 0.24 

This is quicker than making an actual copy, and nips the SettingWithCopyWarning in the bud (at the point where _check_setitem_copy is called):

def _check_setitem_copy(self, stacklevel=4, t='setting', force=False):
    if force or self.is_copy:
        ...

If you are really confident you know what you are doing, you can shut off the SettingWithCopyWarning globally with

pd.options.mode.chained_assignment = None # None|'warn'|'raise' 

An alternative way to silence the warning is to make a new copy:

complete = complete.copy() 

However, you may not want to do this if the DataFrame is large, since copying can take a significant amount of time and memory, and it is completely pointless (except for the sake of silencing a warning) if you know complete is already a copy.
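As a rough illustration of the copy-based approach, the sketch below chains .copy() onto dropna() and then performs both assignments from the question. The data is hypothetical, and the check matches the warning by class name only so that it does not depend on where a given pandas version exposes that class.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data, chosen so that dropna() removes rows 1 and 2.
train = pd.DataFrame({
    "Age": [22.0, np.nan, 8.0, 35.0, 4.0],
    "SibSp": [1.0, 0.0, np.nan, 0.0, 0.0],
    "Parch": [0.0, 0.0, 1.0, 2.0, 0.0],
})

# An explicit copy makes `complete` an independent DataFrame.
complete = train.dropna().copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    complete["AgeGt15"] = complete["Age"] > 15
    complete["WithFamily"] = complete["SibSp"] + complete["Parch"] > 0

# Keep only warnings whose class is named SettingWithCopyWarning.
copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]
print(len(copy_warnings))
print(complete)
```

The trade-off is the one described above: the extra copy costs time and memory on a large DataFrame.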

answered Sep 18 '22 22:09 by unutbu