 

Pandas SettingWithCopyWarning [duplicate]


Python 3.4 and Pandas 0.15.0

df is a dataframe and col1 is a column. With the code below, I'm checking for the presence of the value 10 and replacing such values with 1000.

df.col1[df.col1 == 10] = 1000 

Here's another example. This time, I'm changing values in col2 based on index.

df.col2[df.index == 151] = 500 

Both of these produce the warning below:

-c:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Finally,

cols = ['col1', 'col2', 'col3']
df[cols] = df[cols].applymap(some_function)

This produces a similar warning, with an added suggestion:

Try using .loc[row_indexer,col_indexer] = value instead 

I'm not sure I understand the discussion pointed to in the warnings. What would be a better way to write these three lines of code?

Note that the operations worked.

asked Nov 03 '14 22:11 by ba_ul



2 Answers

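The issue here is chained assignment: in df.col1[df.col1 == 10] = 1000, df.col1 is selected first and the value is then set on that intermediate object, which pandas cannot guarantee is a view of df rather than a copy.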
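A minimal sketch of why writing into an intermediate selection is unreliable, using made-up sample data: boolean indexing on its own builds a new DataFrame, so a value written into that result lands in the copy and never reaches df. The warning is silenced inside the block only to keep the output readable.

```python
import warnings

import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"col1": [10, 20, 10], "col2": [1, 2, 3]})

# Boolean indexing builds a new DataFrame holding only the matching rows.
subset = df[df["col1"] == 10]

with warnings.catch_warnings():
    # Older pandas flags this write with SettingWithCopyWarning; it is
    # ignored here because the point is to see where the value lands.
    warnings.simplefilter("ignore")
    subset["col1"] = 1000

print(subset["col1"].tolist())  # [1000, 1000]
print(df["col1"].tolist())      # [10, 20, 10]  (original is untouched)
```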

So I would say:

row_index = df.col1 == 10
# then with the form .loc[row_indexer,col_indexer]
df.loc[row_index, 'col1'] = 1000
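Applying the same .loc pattern to the first two lines from the question gives the sketch below; the sample frame and its index labels (150, 151, 152) are made up for illustration:

```python
import pandas as pd

# Made-up sample data; the index includes the label 151 used in the question.
df = pd.DataFrame(
    {"col1": [10, 20, 10], "col2": [1, 2, 3], "col3": [4, 5, 6]},
    index=[150, 151, 152],
)

# Replace 10 with 1000 in col1: row mask and column label in one .loc call,
# so the assignment goes straight into df.
df.loc[df["col1"] == 10, "col1"] = 1000

# Set col2 to 500 on the row(s) whose index label is 151.
df.loc[df.index == 151, "col2"] = 500

print(df)
```

If 151 is known to be a unique label in the index, df.loc[151, 'col2'] = 500 is the more direct form. The two differ when the label is missing: the boolean-mask form simply changes nothing, whereas df.loc[151, 'col2'] = 500 appends a new row labelled 151 (setting with enlargement).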
answered Sep 25 '22 13:09 by Paul H


I agree with Paul about using .loc.

For your applymap case you should be able to do this:

cols = ['col1', 'col2', 'col3']
df.loc[:, cols] = df[cols].applymap(some_function)
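A self-contained version of the same idea, with some_function replaced by a hypothetical doubling function and made-up data. It uses Series.map column by column instead of applymap, because DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map; the Series.map route works on old and new versions alike:

```python
import pandas as pd


def some_function(x):
    # Hypothetical element-wise transform standing in for the question's function.
    return x * 2


# Made-up sample data; "other" should be left unchanged.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "other": [7, 8]})
cols = ["col1", "col2", "col3"]

# Apply some_function to every element of the selected columns.
transformed = df[cols].apply(lambda column: column.map(some_function))

# Write the result back into df through a single .loc call.
df.loc[:, cols] = transformed

print(df)
```

df in this sketch is built directly, so it is not a slice of anything. If your df was itself produced by selecting rows or columns from another DataFrame, assigning into it can trigger the same warning even through .loc; taking an explicit copy when you create it (for example df = other[other['col1'] > 0].copy(), where other is a hypothetical parent frame) tells pandas you intend to work on an independent object.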
answered Sep 25 '22 13:09 by koelemay