
SettingWithCopyWarning when using .sort_values in pandas

I am trying to sort a dataframe by its Total column:

df.sort_values(by='Total', ascending=False, axis=0, inplace=True)

But I'm getting the following warning:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.

The linked page suggests using .loc. I then read the .sort_values() documentation, where I found the suggestion to use inplace=False or None.

My question is: if my dataframe is unsorted and I don't use inplace=True, will it stay sorted for further use, or do I have to assign the result to a new name and save it?

Ayan Chowdhury asked Jan 20 '20 10:01



1 Answer

The warning isn't very clear, but it should go away if you combine .loc with .copy() when you create df by filtering another dataframe.

import pandas as pd

df = pd.DataFrame({'num': range(10), 'Total': range(20, 30)})

# .loc without .copy()
df_2 = df.loc[df.num < 5]
df_2.sort_values(by='Total', ascending=False, axis=0, inplace=True)
# leads to SettingWithCopyWarning

# .loc with .copy()
df_3 = df.loc[df.num < 5].copy()
df_3.sort_values(by='Total', ascending=False, axis=0, inplace=True)
# no warning
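
On the inplace part of the question: as far as I know, sort_values returns a new sorted dataframe by default (inplace=False) and leaves the original order untouched, so the result has to be assigned to a name to be kept. A minimal sketch, where the names subset, unchanged_order and sorted_order are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'num': range(10), 'Total': range(20, 30)})

# Explicit copy, so subset is independent of df.
subset = df.loc[df.num < 5].copy()

# Without inplace=True the sorted result is returned, not stored.
# Here it is discarded, so subset keeps its original order.
subset.sort_values(by='Total', ascending=False)
unchanged_order = subset['Total'].tolist()

# Rebinding the same name to the returned dataframe keeps the sorted result.
subset = subset.sort_values(by='Total', ascending=False)
sorted_order = subset['Total'].tolist()
```

So you don't need a new name: assigning the result back to the same variable works just as well as inplace=True.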

You will find some more details here, but there is a really annoying class of pandas bugs that SettingWithCopyWarning is trying to protect you from:

df_4 = df.copy()
df_4['new_col'] = df_4.num * 2
df_5 = df  # df_5 is just another name for df, not a copy
df_5['new_col_2'] = df_5.num * 3

# df_5's column is also added to df, but df_4's is not, because of .copy()
df.columns
# Index(['num', 'Total', 'new_col_2'], dtype='object')

# chained indexing: the assignment lands on a temporary copy
df[df.num < 2].loc[:, ['Total']] = 100
df.Total.max()
# still 29: Total in df was not updated

# a single .loc call assigns to df itself
df.loc[df.num < 2, 'Total'] = 100
df.Total.max()
# 100
oli5679 answered Oct 04 '22 01:10
