Let's say I have the following pandas DataFrame:
import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})
When I try to group on the team column and perform an operation, I get a SettingWithCopyWarning:
for team, team_df in df.groupby(by='team'):
    # team_df = team_df.copy()  # produces no warning
    team_df['rank'] = 10  # produces warning
    team_df.loc[:, 'rank'] = 10  # produces warning
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
df_team['rank'] = 10
If I uncomment the line generating a copy of the sub-DataFrame, I don't get the warning. Is this generally the best practice for avoiding this warning, or am I doing something wrong?
Note I don't want to edit the original DataFrame df. Also, I know this example can be done a better way, but my use case is much more complex: it requires grouping an original DataFrame and performing a series of operations based on a different DataFrame and the specs of that unique group.
Once you grok this article and are confident you know how to avoid chained indexing (through use of .loc or .iloc), you can turn off the SettingWithCopyWarning with
pd.options.mode.chained_assignment = None
and never be bothered by this warning ever again.
Since you wrote "Note I don't want to edit the original DataFrame df" and you are properly using .loc to assign to team_df, it is clear you already know that modifying the copy (team_df) will not modify the original (df), so the SettingWithCopyWarning emitted here is just a nuisance.
The SettingWithCopyWarning comes up in all sorts of situations where you are coding properly, even with .loc or .iloc. There is no "proper" way to code which avoids sometimes triggering SettingWithCopyWarnings.
Therefore, I would just turn off this warning globally with
pd.options.mode.chained_assignment = None
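If a global switch feels too broad, the same option can be set temporarily. The following is a minimal sketch rather than part of the original answer. It assumes a pandas version in which the mode.chained_assignment option is available, and the results dictionary is a hypothetical container added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["Warriors", "Warriors", "Warriors", "Rockets", "Rockets"],
    "player": ["Stephen Curry", "Klay Thompson", "Kevin Durant",
               "Chris Paul", "James Harden"],
})

results = {}  # hypothetical container for the per-team frames

# Disable the chained-assignment check only inside this block;
# the previous setting is restored when the block exits.
with pd.option_context("mode.chained_assignment", None):
    for team, team_df in df.groupby(by="team"):
        team_df["rank"] = 10  # assignment to the group, not to df
        results[team] = team_df

# The original DataFrame never gains the new column.
print("rank" in df.columns)  # False
```

Outside the with block the warning behaves as before, so other code paths keep their checks.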
I would generally not recommend using team_df = team_df.copy() just to avoid SettingWithCopyWarnings -- copying a dataframe can be a drain on performance, especially when the dataframe is large or if it is done many times in a loop.
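An alternative not mentioned in the original answer is to avoid in-place assignment altogether. DataFrame.assign returns a new DataFrame, so there is no assignment into a possible copy to warn about. Note that assign still builds a new object, so this sketch is not a way around the copying cost described above. The ranked dictionary is a hypothetical name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["Warriors", "Warriors", "Warriors", "Rockets", "Rockets"],
    "player": ["Stephen Curry", "Klay Thompson", "Kevin Durant",
               "Chris Paul", "James Harden"],
})

# assign() leaves each group untouched and returns a new DataFrame
# that carries the additional "rank" column.
ranked = {
    team: team_df.assign(rank=10)
    for team, team_df in df.groupby(by="team")
}

print(ranked["Rockets"].columns.tolist())  # ['team', 'player', 'rank']
```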
If you want to turn off the warning in just one location, you could use
team_df.is_copy = False
It serves the same purpose but will not be a performance drain. Note, however, that is_copy is not mentioned in the official Pandas API (the attribute was deprecated in pandas 0.23), so it may not be guaranteed to exist or be useful for this purpose in all future versions of Pandas. So if robustness is a priority but performance isn't, then maybe use team_df = team_df.copy(). But I think the sounder way for an experienced Pandas programmer to go is to either turn the warning off globally or -- if you want to be very careful -- keep the warnings, check them manually, and accept that they will sometimes be triggered by correct code.
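For the robustness-first route, the explicit-copy variant from the question can be sketched as follows. The block records any warnings raised in the loop so you can confirm that none of them is a SettingWithCopyWarning. The names results and copy_warnings are hypothetical helpers added for illustration.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    "team": ["Warriors", "Warriors", "Warriors", "Rockets", "Rockets"],
    "player": ["Stephen Curry", "Klay Thompson", "Kevin Durant",
               "Chris Paul", "James Harden"],
})

results = {}
# Record every warning raised inside the loop instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for team, team_df in df.groupby(by="team"):
        team_df = team_df.copy()  # explicit, independent copy of the group
        team_df["rank"] = 10      # assignment targets the copy only
        results[team] = team_df

# Keep only warnings whose category name mentions SettingWithCopy.
copy_warnings = [
    w for w in caught if "SettingWithCopy" in w.category.__name__
]
print(len(copy_warnings))  # 0
```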