I am trying to sort a DataFrame by its Total column:
df.sort_values(by='Total', ascending=False, axis=0, inplace=True)
But I'm getting the following warning:
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
"""Entry point for launching an IPython kernel.
When I followed the link, the page suggested using the .loc methods. But when I then looked at the documentation for .sort_values(), I found that inplace can be set to False or None.
My question is: if I have a DataFrame that is not sorted and I don't use inplace=True, will my DataFrame be sorted for further use, or do I have to assign the result to a new name and save it?
The warning isn't clear, but if you combine .loc with .copy() when you create df by filtering another DataFrame, the warning should go away.
import pandas as pd
df = pd.DataFrame({'num': range(10), 'Total': range(20, 30)})
# loc without copy
df_2 = df.loc[df.num < 5]
df_2.sort_values(by='Total', ascending=False, axis=0, inplace=True)
# leads to SettingWithCopyWarning
# loc with copy
df_3 = df.loc[df.num < 5].copy()
df_3.sort_values(by='Total', ascending=False, axis=0, inplace=True)
# no warning
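To answer the question about inplace directly: without inplace=True, sort_values returns a new, sorted DataFrame and leaves the one it was called on unchanged, so the result has to be assigned, either back to the same name or to a new one. A minimal sketch, reusing the example frame above (the names subset and sorted_subset are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'num': range(10), 'Total': range(20, 30)})

# Filter with .loc and take an explicit copy so the result is independent of df
subset = df.loc[df.num < 5].copy()

# Without inplace=True, sort_values returns a new DataFrame ...
sorted_subset = subset.sort_values(by='Total', ascending=False)

# ... and leaves the frame it was called on unchanged
print(subset['Total'].tolist())         # [20, 21, 22, 23, 24]
print(sorted_subset['Total'].tolist())  # [24, 23, 22, 21, 20]

# Reassigning to the same name keeps the sorted version for further use
subset = subset.sort_values(by='Total', ascending=False)
```

Reassigning like this also avoids the warning, since nothing is modified in place on a filtered frame.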
You will find some more details here, but in short, there is a really annoying class of pandas bugs that SettingWithCopyWarning is trying to protect you from.
df_4 = df.copy()
df_4['new_col'] = df_4.num * 2
df_5 = df
df_5['new_col_2'] = df_5.num * 3
# df_5's column is also added to df, but df_4's is not, because of .copy()
df.columns
# Index(['num', 'Total', 'new_col_2'], dtype='object')
df[df.num < 2].loc[:, ['Total']] = 100
df.Total.max()
# still 29: the chained indexing assigned to a temporary copy, so Total was not updated.
df.loc[df.num < 2, 'Total'] = 100
df.Total.max()
# 100
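The two pitfalls above can also be checked explicitly. A minimal sketch, assuming the same example frame; alias and independent are illustrative names, not anything from pandas:

```python
import pandas as pd

df = pd.DataFrame({'num': range(10), 'Total': range(20, 30)})

alias = df                # plain assignment: another name for the same object
independent = df.copy()   # .copy(): a separate DataFrame

print(alias is df)        # True
print(independent is df)  # False

# A single .loc call with a row mask and a column label writes to df itself
df.loc[df.num < 2, 'Total'] = 100

print(alias['Total'].max())        # 100, the alias sees the change
print(independent['Total'].max())  # 29, the copy does not
```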