Pandas warning when using map: A value is trying to be set on a copy of a slice from a DataFrame




I've got the following code and it works. This basically renames values in columns so that they can be later merged.

import pandas as pd

pop = pd.read_csv('population.csv')
pop_recent = pop[pop['Year'] == 2014]

mapping = {
        'Korea, Rep.': 'South Korea',
        'Taiwan, China': 'Taiwan'
}
f = lambda x: mapping.get(x, x)
pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

Warning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

I did google this! But no examples seem to be using map, so I'm at a loss...

The issue is chained indexing. What you are effectively trying to do is set values on pop[pop['Year'] == 2014]['Country Name']. This does not work most of the time (as the linked documentation explains very well), because it is two separate calls, and the first call may return a copy of the DataFrame rather than a view. Boolean indexing, as used here, returns a copy. Splitting the two calls across two statements, as your code does with pop_recent, does not change this.
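To make the "two separate calls" point concrete, the sketch below spells out what df[mask]['B'] = 10 does under the hood by calling __getitem__ and __setitem__ explicitly. Calling the dunder methods directly is not idiomatic pandas; it is done here only to show the two steps. The small df is a hypothetical frame matching the one in the example that follows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 4, 6, 8], 'B': [2, 4, 5, 7, 9]})
mask = df['A'] == 1

# df[mask]['B'] = 10 is really two separate calls:
tmp = df.__getitem__(mask)   # call 1: boolean indexing builds a new DataFrame (a copy)
tmp.__setitem__('B', 10)     # call 2: assigns into that copy only (pandas may warn here)

print(tmp['B'].tolist())  # [10]
print(df['B'].tolist())   # [2, 4, 5, 7, 9], the original is unchanged
```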

Hence, when you set values on that copy, the change is not reflected in the original dataframe. Example -

In [6]: df
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [7]: df[df['A']==1]['B'] = 10
/path/to/ipython-script.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

In [8]: df
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

As the warning suggests, instead of chained indexing you should use DataFrame.loc to select both the rows and the column to update in a single call, which avoids the warning. Example -

pop.loc[pop['Year'] == 2014, 'Country Name'] = pop.loc[pop['Year'] == 2014, 'Country Name'].map(f)

Or, if that seems too long, you can create the mask (a boolean Series) beforehand, assign it to a variable, and use it in the statement above. Example -

mask = pop['Year'] == 2014
pop.loc[mask, 'Country Name'] = pop.loc[mask, 'Country Name'].map(f)
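Here is a self-contained sketch of the .loc fix applied to the question's data. The in-memory pop DataFrame is a hypothetical stand-in for population.csv (the rows are made up); only the 'Country Name' and 'Year' column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for population.csv (made-up rows)
pop = pd.DataFrame({
    'Country Name': ['Korea, Rep.', 'Taiwan, China', 'Korea, Rep.', 'France'],
    'Year': [2014, 2014, 2013, 2014],
})

mapping = {
    'Korea, Rep.': 'South Korea',
    'Taiwan, China': 'Taiwan',
}
f = lambda x: mapping.get(x, x)  # unmapped names are returned unchanged

# Select rows and column in one .loc call, so the assignment hits pop itself
mask = pop['Year'] == 2014
pop.loc[mask, 'Country Name'] = pop.loc[mask, 'Country Name'].map(f)

print(pop['Country Name'].tolist())
# ['South Korea', 'Taiwan', 'Korea, Rep.', 'France']
```

Note that the 2013 row keeps its original name because the mask excludes it.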

Demo -

In [9]: df
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [10]: mapping = { 1:2 , 3:4}

In [11]: f= lambda x: mapping.get(x, x)

In [12]: df.loc[(df['B']==2),'A'] = df.loc[(df['B']==2),'A'].map(f)

In [13]: df
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9

Demo with the mask method -

In [18]: df
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [19]: mask = df['B']==2

In [20]: df.loc[mask,'A'] = df.loc[mask,'A'].map(f)

In [21]: df
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9
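If you only need a renamed 2014 subset for the later merge, and pop itself should stay untouched, another common option (not covered above) is to take an explicit copy with DataFrame.copy(). The new frame is then clearly independent, so pandas has no reason to warn. A minimal sketch, again using a hypothetical in-memory stand-in for population.csv:

```python
import pandas as pd

# Hypothetical stand-in for population.csv (made-up rows)
pop = pd.DataFrame({
    'Country Name': ['Korea, Rep.', 'Taiwan, China', 'Korea, Rep.', 'France'],
    'Year': [2014, 2014, 2013, 2014],
})
mapping = {'Korea, Rep.': 'South Korea', 'Taiwan, China': 'Taiwan'}

# .copy() makes pop_recent an independent DataFrame rather than a slice of pop
pop_recent = pop[pop['Year'] == 2014].copy()
pop_recent['Country Name'] = pop_recent['Country Name'].map(lambda x: mapping.get(x, x))

print(pop_recent['Country Name'].tolist())  # ['South Korea', 'Taiwan', 'France']
print(pop['Country Name'].tolist())         # pop still has the original names
```

The trade-off is that only pop_recent is renamed. Use the .loc approach instead if the renamed values should end up in pop.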
