
Pandas: SettingWithCopyWarning [duplicate]

I'd like to replace values in a pandas DataFrame that are larger than an arbitrary threshold (100 in this case) with NaN, since values this large indicate a failed experiment. Previously I've used this to replace unwanted values:

sve2_all[sve2_all[' Hgtot ng/l'] > 100] = np.nan

However, I got the following warning:

-c:3: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
C:\Users\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\indexing.py:346: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
self.obj[item] = s

From this StackExchange question, it seems that sometimes this warning can be ignored, but I can't follow the discussion well enough to be certain whether this applies to my situation. Is the warning basically letting me know that I'll be overwriting some of the values in my DataFrame?

Edit: As far as I can tell, everything behaved as it should. As a follow-up, is my method of replacing values non-standard? Is there a better way to replace values?

Jason asked Apr 11 '14 03:04




3 Answers

As suggested in the warning message, you should use loc to do this:

sve2_all.loc[sve2_all[' Hgtot ng/l'] > 100] = np.nan

The warning is here to stop you modifying a copy: here sve2_all[sve2_all[' Hgtot ng/l'] > 100] is potentially a copy, and if it is, any modifications would not change the original frame. It could be that it works correctly in some cases, but pandas cannot guarantee it will work in all cases... use at your own risk (consider yourself warned! ;) ).
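
To illustrate what the row mask does with and without a column label, here is a minimal sketch. The small frame and its `sample` column are made up for illustration; only the ` Hgtot ng/l` column name (with its leading space) comes from the question. Passing only a row mask to `.loc` blanks entire rows, adding the column label limits the assignment to the measurement, and `Series.mask` is one more way to get the cleaned column without assigning through an indexer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sve2_all; the column name keeps the leading
# space used in the question.
sve2_all = pd.DataFrame({
    "sample": ["a", "b", "c"],
    " Hgtot ng/l": [12.0, 250.0, 99.5],
})

too_high = sve2_all[" Hgtot ng/l"] > 100

# Row mask only: every column of the matching rows becomes NaN.
whole_rows = sve2_all.copy()
whole_rows.loc[too_high] = np.nan

# Row mask plus column label: only the measurement is blanked out.
one_column = sve2_all.copy()
one_column.loc[too_high, " Hgtot ng/l"] = np.nan

# Without indexed assignment: mask() returns a new Series with NaN
# wherever the condition is True.
masked = sve2_all[" Hgtot ng/l"].mask(too_high)
```

Which form is appropriate depends on whether a failed measurement should invalidate the whole row or just that one value.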

Andy Hayden answered Oct 04 '22 16:10


I was getting this warning while trying to reset the contents of an entire DataFrame but couldn't resolve it using loc or iloc:

df.loc[:, :] = new_values # SettingWithCopyWarning
df.iloc[:, :] = new_values # SettingWithCopyWarning

But assigning through the underlying ndarray solved the problem:

df.values[:, :] = new_values # no warnings and desired behavior
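
One caveat: writing through `df.values` only reaches the frame when `.values` happens to return a view of the frame's data. With mixed column dtypes it returns a new array, so the assignment can be silently lost. A possible alternative is sketched below, under the assumption (not stated in this answer) that `df` was itself derived from another frame, which is a common reason for the warning to appear even with `loc`. The names `parent` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the answer does not show how df was created.
parent = pd.DataFrame(np.zeros((4, 2)), columns=["a", "b"])

# Assumption: df was derived from another frame. An explicit .copy()
# makes it an independent frame that owns its data.
df = parent.iloc[:2].copy()

new_values = np.ones((2, 2))
df.loc[:, :] = new_values  # overwrites df only; parent is untouched
```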

Marshall Farrier answered Oct 04 '22 15:10


---Problem solved for me---

I had this warning when I tried to convert float to int, even though I used ".loc". My mistake was that I filtered my DataFrame (with masks) before the operation, so the conversion was applied to only a small part of the column. The result was a mixed-type column, which caused the confusion. I solved the problem by converting the DataFrame before applying the masks (data filtration). I hope it helps.
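
The order described above can be sketched roughly as follows. The frame, the column names `group` and `count`, and the explicit `.copy()` on the filtered result are illustrative assumptions, not taken from the original code.

```python
import pandas as pd

# Hypothetical data standing in for the real frame.
df = pd.DataFrame({
    "group": ["x", "y", "x"],
    "count": [1.0, 2.0, 3.0],
})

# Convert the whole column first, so it ends up with a single integer dtype...
df["count"] = df["count"].astype(int)

# ...and only then filter. The .copy() (an addition for this sketch) makes
# the subset an independent frame, so later edits to it are unambiguous.
subset = df[df["group"] == "x"].copy()
```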

ilyes answered Oct 04 '22 16:10