
"A value is trying to be set on a copy of a slice from a DataFrame" warning even after using .loc

Tags: python, pandas

I am getting this warning:

 C:\Python27\lib\site-packages\pandas\core\indexing.py:411: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s

I still get it even though I am using df.loc, as the documentation suggests. Why?

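import nltk

def sentenceInReview(df):
    # Load the pre-trained Punkt sentence tokenizer
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    print "size of df: " + str(df.size)
    # Replace each review with its list of sentences
    df.loc[:, 'review_text'] = df.review_text.map(lambda x: tokenizer.tokenize(x))

    print df[:3]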
swati saoji asked Apr 27 '15 06:04



2 Answers

I ran into this problem earlier today. It is related to the way Python passes object references around when you call functions, assign variables, and so on.

Unlike in, say, R, assigning an existing dataframe to a new variable in Python doesn't make a copy, so any operation on the 'new' dataframe still acts on the original underlying data.
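
A minimal sketch of that behaviour, assuming a small made-up frame (the names `alias` and `independent` are only for illustration): plain assignment gives a second name for the same object, while `DataFrame.copy()` gives a separate one.

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3]})

alias = df                # plain assignment: a second name for the same object
independent = df.copy()   # explicit copy: a separate DataFrame with its own data

alias.loc[0, "num"] = 100  # modify through the alias

print(alias is df)                # True
print(df.loc[0, "num"])           # 100
print(independent.loc[0, "num"])  # 1
```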

The way to get around this is to make a deep copy (see docs) whenever you're trying to return a copy of something. See:

import copy

import pandas as pd

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, columns=['num'])  # columns should be a list, not a set
dfh = df.head(3)  # head() returns a slice of df, not an independent copy
dfh.loc[:, 'num'] = dfh['num'].apply(lambda x: x + 1)
# The assignment above triggers the SettingWithCopyWarning (a warning, not an error)

# Use the deepcopy function from the standard library module 'copy'
df_copy = copy.deepcopy(df.head(3))
df_copy.loc[:, 'num'] = df_copy['num'].apply(lambda x: x + 1)
# The deep copy no longer refers to the original df, so the warning goes away.
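
The same idea can probably be written with pandas' own `DataFrame.copy()` instead of the `copy` module. This is only a sketch under that assumption, with `head_copy` as an illustrative name; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3, 4, 5]})

# Copy the first three rows explicitly so the result owns its data
# and is no longer treated as a slice of df.
head_copy = df.head(3).copy()
head_copy["num"] = head_copy["num"] + 1

print(head_copy["num"].tolist())  # [2, 3, 4]
print(df["num"].tolist())         # [1, 2, 3, 4, 5]
```

The original frame keeps its values because every write goes to the copy.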

Here's a bit more on this topic that might explain the way Python does it better.
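
Applied to the function in the question, one possible shape is to return a new frame rather than write into the one that was passed in. This is a rough sketch, not the asker's code: `split_sentences` is a hypothetical regex stand-in for the nltk Punkt tokenizer, and `sentence_in_review` and the sample reviews are made up for illustration.

```python
import re

import pandas as pd


def split_sentences(text):
    """Hypothetical stand-in for the nltk tokenizer: split after '.', '!' or '?'."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def sentence_in_review(df):
    """Return a new DataFrame whose review_text column holds lists of sentences."""
    # assign() builds and returns a new DataFrame, so the caller's frame
    # is never written to.
    return df.assign(review_text=df["review_text"].map(split_sentences))


reviews = pd.DataFrame(
    {"review_text": ["Great phone. Battery lasts long!", "Too slow."]}
)
tokenized = sentence_in_review(reviews)
print(tokenized["review_text"].tolist())
```

Because the function never assigns into its argument, it does not matter whether the caller passed a full frame or a slice of one.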

Spcogg the second answered Oct 24 '22 09:10


A common reason for the warning message "A value is trying to be set on a copy of a slice from a DataFrame" is a slice taken over another slice. For example:

dfA = dfB[['x', 'y', 'z']]
dfC = dfA[['x', 'z']]

With the code above, you may get this message because dfC is a slice of dfA, while dfA is itself a slice of dfB. In other words, dfC is a slice over another slice, and both are linked to dfB. In my experience, using .copy(), deepcopy or similar approaches did not help in this situation :-(

Solution:

dfA = dfB[['x', 'y', 'z']]
dfC = dfB[['x', 'z']]
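
As a small runnable sketch of that selection (the contents of dfB here are made up for illustration), note that selecting several columns takes a list inside the brackets, and that both frames come straight from dfB:

```python
import pandas as pd

# Illustrative data; only the column names come from the example above.
dfB = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6], "w": [7, 8]})

# Each selection is taken directly from dfB, so dfC does not depend on dfA.
dfA = dfB[["x", "y", "z"]]
dfC = dfB[["x", "z"]]

print(list(dfA.columns))  # ['x', 'y', 'z']
print(list(dfC.columns))  # ['x', 'z']
print(dfC["z"].tolist())  # [5, 6]
```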

Hopefully the above explanation helps:-)

Peter D WANG answered Oct 24 '22 10:10