Confusion re: pandas copy of slice of dataframe warning

I've looked through a bunch of questions and answers related to this issue, but I'm still finding that I'm getting this copy of slice warning in places where I don't expect it. Also, it's cropping up in code that was running fine for me previously, leading me to wonder if some sort of update may be the culprit.

For example, here is some code where all I'm doing is reading an Excel file into a pandas DataFrame and cutting down the set of columns with the df[[]] syntax.

izmir = pd.read_excel(filepath)
izmir_lim = izmir[['Gender','Age','MC_OLD_M>=60','MC_OLD_F>=60','MC_OLD_M>18','MC_OLD_F>18','MC_OLD_18>M>5','MC_OLD_18>F>5',
                   'MC_OLD_M_Child<5','MC_OLD_F_Child<5','MC_OLD_M>0<=1','MC_OLD_F>0<=1','Date to Delivery','Date to insert','Date of Entery']]

Now, any further changes I make to this izmir_lim DataFrame raise the copy of slice warning.

izmir_lim['Age'] = izmir_lim.Age.fillna(0)
izmir_lim['Age'] = izmir_lim.Age.astype(int)

/Users/samlilienfeld/anaconda/lib/python3.5/site-packages/ipykernel/main.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I'm confused because I thought the df[[]] column subsetting returned a copy by default. The only way I've found to suppress the warning is by explicitly adding df[[]].copy(). I could have sworn that in the past I did not have to do that and did not get the copy of slice warning.

Similarly, I have some other code that runs a function on a dataframe to filter it in certain ways:

def lim(df):
    if geography == "All":
        df_geo = df
    else:
        df_geo = df[df.center_JO == geography]

    df_date = df_geo[(df_geo.date_survey >= start_date) & (df_geo.date_survey <= end_date)]
    return df_date

df_lim = lim(df)

From this point forward, any changes I make to any of the values of df_lim raise the copy of slice warning. The only way around it that I've found is to change the function call to:

df_lim = lim(df).copy() 

This just seems wrong to me. What am I missing? It seems like these use cases should return copies by default, and I could have sworn that the last time I ran these scripts I was not running into these warnings.
Do I just need to start adding .copy() all over the place? Seems like there should be a cleaner way to do this. Any insight or help is much appreciated.

asked Aug 08 '16 17:08 by Sam Lilienfeld


1 Answer

izmir = pd.read_excel(filepath)
izmir_lim = izmir[['Gender','Age','MC_OLD_M>=60','MC_OLD_F>=60',
                   'MC_OLD_M>18','MC_OLD_F>18','MC_OLD_18>M>5',
                   'MC_OLD_18>F>5','MC_OLD_M_Child<5','MC_OLD_F_Child<5',
                   'MC_OLD_M>0<=1','MC_OLD_F>0<=1','Date to Delivery',
                   'Date to insert','Date of Entery']]

izmir_lim is a view/copy of izmir. You subsequently attempt to assign to it, and that assignment is what triggers the warning (it is a warning, not an error). Use this instead:

izmir_lim = izmir[['Gender','Age','MC_OLD_M>=60','MC_OLD_F>=60',
                   'MC_OLD_M>18','MC_OLD_F>18','MC_OLD_18>M>5',
                   'MC_OLD_18>F>5','MC_OLD_M_Child<5','MC_OLD_F_Child<5',
                   'MC_OLD_M>0<=1','MC_OLD_F>0<=1','Date to Delivery',
                   'Date to insert','Date of Entery']].copy()
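To make the effect of the explicit copy concrete, here is a minimal sketch on a small made-up frame. The data values are assumptions standing in for the Excel file, and only a few of the real column names are used. It shows that after .copy() the fillna/astype assignments change izmir_lim only and leave izmir untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data (values are invented for illustration).
izmir = pd.DataFrame({
    "Gender": ["F", "M", "F"],
    "Age": [34.0, np.nan, 51.0],
    "Date of Entery": ["2016-01-05", "2016-02-11", "2016-03-02"],
})

# Select a subset of columns and take an explicit, independent copy.
izmir_lim = izmir[["Gender", "Age"]].copy()

# These assignments now target izmir_lim alone.
izmir_lim["Age"] = izmir_lim["Age"].fillna(0).astype(int)

print(izmir_lim["Age"].tolist())   # [34, 0, 51]
print(izmir["Age"].isna().sum())   # 1 (the original still has its missing value)
```

The copy costs some memory, but it makes the intent explicit: izmir_lim is its own object, not a window onto izmir.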

Whenever you 'create' a new dataframe from another in the following fashion:

new_df = old_df[list_of_columns_names] 

new_df will have a truthy value in its is_copy attribute. When you attempt to assign to it, pandas emits the SettingWithCopyWarning.

new_df.iloc[0, 0] = 1  # Should emit SettingWithCopyWarning
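If you want to check what your own installation does, the following sketch records any warnings raised by that assignment. The frame and column names are made up for illustration. Whether a SettingWithCopyWarning shows up depends on the pandas version: releases from the time of this answer emit it, while newer releases that use Copy-on-Write may not emit it at all, so no particular result is asserted for the first print.

```python
import warnings

import pandas as pd

# Hypothetical data; any frame with more columns than the subset will do.
old_df = pd.DataFrame({"a": [10, 20], "b": [30, 40], "c": [50, 60]})
new_df = old_df[["a", "b"]]  # column subset without an explicit copy

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is filtered out
    new_df.iloc[0, 0] = 1

# Compare by class name so the check does not depend on where pandas exposes the class.
names = [w.category.__name__ for w in caught]
print("SettingWithCopyWarning" in names)  # version-dependent

# Either way, the assignment lands on new_df and the original keeps its value.
print(new_df.iloc[0, 0], old_df.iloc[0, 0])
```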

You can overcome this in several ways.

Option #1

new_df = old_df[list_of_columns_names].copy() 

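Option #2 (as @ayhan suggested in comments; note that the is_copy attribute was deprecated in pandas 0.23, so this only applies to older versions)

new_df = old_df[list_of_columns_names]
new_df.is_copy = None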

Option #3

new_df = old_df.loc[:, list_of_columns_names] 
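The same idea carries over to the filtering function from the question. The sketch below is one possible rewrite, not the asker's original code: it assumes geography, start_date and end_date are passed in as parameters instead of being read from globals, builds a single boolean mask, and returns an explicit copy so the caller does not need to add .copy() at every call site. The sample data is invented.

```python
import pandas as pd


def lim(df, geography, start_date, end_date):
    """Filter by survey date and (optionally) center; return an independent copy."""
    mask = (df["date_survey"] >= start_date) & (df["date_survey"] <= end_date)
    if geography != "All":
        mask = mask & (df["center_JO"] == geography)
    # One .loc selection followed by one explicit copy.
    return df.loc[mask].copy()


# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "center_JO": ["Amman", "Irbid", "Amman", "Zarqa"],
    "date_survey": pd.to_datetime(
        ["2016-01-10", "2016-02-15", "2016-03-20", "2016-04-25"]
    ),
    "score": [1, 2, 3, 4],
})

df_lim = lim(df, "Amman", pd.Timestamp("2016-01-01"), pd.Timestamp("2016-03-31"))
df_lim["score"] = df_lim["score"] * 10  # modifies df_lim only

print(df_lim["score"].tolist())  # [10, 30]
print(df["score"].tolist())      # [1, 2, 3, 4]
```

Putting the copy inside the function keeps the decision in one place: everything lim returns is safe to modify.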

answered Oct 01 '22 21:10 by piRSquared