 

pandas: operations using groupby yield SettingWithCopyWarning

Tags: python, pandas

Let's say I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

When I try to group on the team column and perform an operation I get a SettingWithCopyWarning:

for team, team_df in df.groupby(by='team'):
    # team_df = team_df.copy()  # produces no warning
    team_df['rank'] = 10  # produces warning
    team_df.loc[:, 'rank'] = 10  # produces warning

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
team_df['rank'] = 10

If I uncomment the line that makes a copy of the sub-DataFrame, I don't get the warning. Is this generally the best practice for avoiding this warning, or am I doing something wrong?

Note I don't want to edit the original DataFrame df. Also, I know this example can be done in a better way, but my use case is much more complex: it requires grouping an original DataFrame and performing a series of operations on each group, based on a different DataFrame and the specifics of that group.

Johnny Metz asked Jul 14 '17 18:07



1 Answer

Once you grok this article and are confident you know how to avoid chained indexing (through use of .loc or .iloc), you can turn off the SettingWithCopyWarning with pd.options.mode.chained_assignment = None and never be bothered by this warning again.

Since you wrote

Note I don't want to edit the original DataFrame df

and you are properly using .loc to assign to team_df, it is clear you already know that modifying the copy (team_df) will not modify the original (df), so the SettingWithCopyWarning emitted here is just a nuisance.
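As a quick check of that claim, here is a minimal sketch using the DataFrame from the question. The `groups` dict and the blanket `warnings` filter are only there for the demonstration (they are my additions, not part of the question): each group gets its own 'rank' column while df keeps its two original columns.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

groups = {}  # hypothetical container, just to keep the modified groups around
with warnings.catch_warnings():
    # Silence all warnings inside this block only, so the demo output stays clean.
    warnings.simplefilter("ignore")
    for team, team_df in df.groupby(by='team'):
        team_df['rank'] = 10  # adds the column to the group's own DataFrame
        groups[team] = team_df

print('rank' in df.columns)                 # False: the original is untouched
print('rank' in groups['Rockets'].columns)  # True: the group has the new column
```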

The SettingWithCopyWarning comes up in all sorts of situations where you are coding properly, even with .loc or .iloc. There is no "proper" way to code which avoids sometimes triggering SettingWithCopyWarnings.

Therefore, I would just turn off this warning globally with

pd.options.mode.chained_assignment = None
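If a process-wide switch feels too broad, the same option can probably be scoped to a block with pd.option_context, which restores the previous value on exit. This is a sketch of that variant, not something the question asked for, and it assumes the installed pandas version still exposes the mode.chained_assignment option; the `before`, `inside` and `after` names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

before = pd.get_option('mode.chained_assignment')

# The option is set to None only while the with-block is active.
with pd.option_context('mode.chained_assignment', None):
    inside = pd.get_option('mode.chained_assignment')
    for team, team_df in df.groupby(by='team'):
        team_df['rank'] = 10  # no SettingWithCopyWarning check in this block

after = pd.get_option('mode.chained_assignment')

print(inside)           # None
print(after == before)  # True: the previous setting is restored
```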

I would generally not recommend using team_df = team_df.copy() just to avoid SettingWithCopyWarnings -- copying a DataFrame can be a drain on performance, especially when the DataFrame is large or the copy is made many times in a loop.

If you want to turn off the warning in just one location, you could use

team_df.is_copy = False

It serves the same purpose but will not be a performance drain. Note, however, that is_copy is not mentioned in the official Pandas API, so it may not be guaranteed to exist or be useful for this purpose in all future versions of Pandas. So if robustness is a priority and performance is not, consider using team_df = team_df.copy(). But I think the sounder way for an experienced Pandas programmer to go is to either turn the warning off globally or -- if you want to be very careful -- keep the warnings on, check each one manually, and accept that correct code will sometimes trigger them.
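For the robustness-first route, a minimal sketch with the question's data might look like the following. The `ranked_groups` list, the pd.concat step and the warning-recording block are my own additions for illustration; the point is only that an explicit copy gives each group an independent DataFrame, so the assignment raises no SettingWithCopyWarning and df stays as it was.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

ranked_groups = []  # hypothetical name: collects the modified groups
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it
    for team, team_df in df.groupby(by='team'):
        team_df = team_df.copy()  # independent copy: costs memory, avoids the warning
        team_df['rank'] = 10
        ranked_groups.append(team_df)

# Stitch the per-group results back together into one new DataFrame.
result = pd.concat(ranked_groups)

copy_warnings = [w for w in caught if 'SettingWithCopy' in w.category.__name__]
print(len(copy_warnings))    # 0
print(result.shape)          # (5, 3)
print('rank' in df.columns)  # False
```

Whether the extra copy matters depends on the size of the groups and how often the loop runs, which is the trade-off described above.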

unutbu answered Oct 09 '22 02:10
