I am getting a warning "
C:\Python27\lib\site-packages\pandas\core\indexing.py:411: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s"
I am already using df.loc, as the documentation suggests, so why do I still get this warning?
import nltk

def sentenceInReview(df):
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    print("size of df: " + str(df.size))
    df.loc[:, 'review_text'] = df.review_text.map(lambda x: tokenizer.tokenize(x))
    print(df[:3])
The solution is simple: combine the chained operations into a single operation using loc, so that pandas can guarantee that the original DataFrame is the one being set. Pandas always ensures that unchained set operations work. This is what the warning suggests, and it works in this case.
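A minimal sketch of the difference, assuming a small made-up DataFrame with columns "a" and "b" (not taken from the question). The chained form is shown only as a comment; the single .loc call passes the row selector and the column selector together, so the write goes into df itself.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Chained form (avoid): df[df["a"] > 2]["b"] = 0
# The boolean-mask selection produces a new DataFrame, so the write
# lands on that temporary object instead of on df.

# Unchained form: one .loc call with both row and column selectors.
df.loc[df["a"] > 2, "b"] = 0
print(df["b"].tolist())  # [10, 20, 0, 0]
```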
Slicing a DataFrame in pandas involves the following steps: ensure Python is installed (or install ActivePython), import a dataset, create a DataFrame, and slice the DataFrame.
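The last three steps might look roughly like this. The "dataset" here is a hypothetical inline dictionary standing in for a file you would normally load; the column names and values are made up for illustration.

```python
import pandas as pd

# Stand-in for "import a dataset": hypothetical inline data
data = {"name": ["ann", "bob", "cy", "dee"], "score": [81, 92, 77, 65]}

# Create a DataFrame
df = pd.DataFrame(data)

# Slice by position: the first two rows, all columns
first_two = df.iloc[:2]

# Slice by condition and label: rows with score > 80, only the "name" column
high_names = df.loc[df["score"] > 80, "name"]
print(high_names.tolist())  # ['ann', 'bob']
```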
The copy() method returns a copy of the DataFrame. By default the copy is a "deep copy", meaning that changes made to the original DataFrame will NOT be reflected in the copy.
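A quick check of that default, using a hypothetical one-column DataFrame: after the copy is taken, a change to the original does not show up in the copy.

```python
import pandas as pd

original = pd.DataFrame({"x": [1, 2, 3]})
clone = original.copy()  # deep=True is the default

original.loc[0, "x"] = 100
print(original["x"].tolist())  # [100, 2, 3]
print(clone["x"].tolist())     # [1, 2, 3]
```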
I ran into this problem earlier today. It is related to the way Python passes object references around between functions, variable assignments, and so on.
Unlike, say, R, Python does not make a copy when you assign an existing DataFrame to a new variable, so any operation on the "new" DataFrame still acts on the original underlying data.
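A small illustration of that point, with hypothetical names a and b: plain assignment only binds a second name to the same object, so a write through b is visible through a.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3]})
b = a  # no copy: a and b refer to the same DataFrame object
b.loc[0, "x"] = 99

print(b is a)           # True
print(a["x"].tolist())  # [99, 2, 3]
```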
The way to get around this is to make a deep copy (see docs) whenever you're trying to return a copy of something. See:
import copy
import pandas as pd

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, columns=['num'])  # column names as a list, not a set
dfh = df.head(3)  # This assignment doesn't actually make a copy
dfh.loc[:, 'num'] = dfh['num'].apply(lambda x: x + 1)
# This emits the SettingWithCopyWarning

# Use the deepcopy function from the standard-library 'copy' module
df_copy = copy.deepcopy(df.head(3))
df_copy.loc[:, 'num'] = df_copy['num'].apply(lambda x: x + 1)
# Making a deep copy breaks the reference to the original df, so the warning goes away.
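As an alternative to copy.deepcopy, pandas' own DataFrame.copy() (deep by default) should give the same independence. This is a sketch of that swap, not the answerer's code; it also replaces the apply/lambda with the equivalent vectorized addition.

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3, 4, 5]})

# .copy() gives dfh its own data, so it is no longer a slice of df
dfh = df.head(3).copy()
dfh.loc[:, "num"] = dfh["num"] + 1  # same effect as apply(lambda x: x + 1)

print(dfh["num"].tolist())  # [2, 3, 4]
print(df["num"].tolist())   # [1, 2, 3, 4, 5]
```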
Here's a bit more on this topic, which might explain better how Python handles this.
A common reason for the warning "A value is trying to be set on a copy of a slice from a DataFrame" is taking a slice of another slice. For example:
dfA = dfB[['x', 'y', 'z']]
dfC = dfA[['x', 'z']]
With the code above you may get this warning, because dfC is a slice of dfA while dfA is itself a slice of dfB: dfC is a slice of another slice, and both are linked to dfB. In this situation, using .copy(), deepcopy, or similar approaches reportedly does not help. Instead, take both slices directly from dfB:
dfA = dfB[['x', 'y', 'z']]
dfC = dfB[['x', 'z']]