A value is trying to be set on a copy of a slice from a DataFrame. - pandas

I'm new to pandas and, given a DataFrame, I was trying to drop the rows that don't meet a specific requirement. While researching how to do it, I arrived at this:

df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])]

However, when processing the frame, I get this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value

I'm not sure what to do, because I'm already using .loc. What am I missing?

import re

import numpy as np
import pandas as pd
from nltk.corpus import stopwords

f = ['ID_manifest', 'issue_date', 'channel', 'product', 'ID_client', 'desc_manifest']

df = pd.DataFrame(columns=f)
for chunk in df2017_chunks:
    aux = preProcess(chunk, f)
    df = pd.concat([df, aux])

def preProcess(df, f):    
    stops = list(stopwords.words("portuguese"))
    stops.extend(['reclama', 'cliente', 'santander', 'cartao', 'cartão'])

    df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])]

    df.columns = f
    df.desc_manifest = df.desc_manifest.str.lower() # All lower case
    df.desc_manifest = df.desc_manifest.apply(lambda x: re.sub('[^A-Za-zÀ-ÿ]', ' ', str(x))) # Just letters
    df.replace(['NaN', 'nan'], np.nan, inplace = True) # Remove NaN
    df.dropna(subset=['desc_manifest'], inplace=True)
    df.desc_manifest = df.desc_manifest.apply(lambda x: [word for word in str(x).split() if word not in stops]) # Remove stop words

    return df
pceccon asked May 17 '17 15:05


2 Answers

You need to add .copy(). If you modify values in the filtered df later, the modifications do not propagate back to the original data, and pandas warns you about exactly that.

The .loc can be omitted, but without .copy() the warning still appears.

df = pd.DataFrame({'DS_FAMILIA_PROD':['a','d','b'],
                   'desc_manifest':['F','rR', 'H'],
                   'C':[7,8,9]})

def preProcess(df):    
    df = df[df['DS_FAMILIA_PROD'].isin([u'a', u'b'])].copy()
    df.desc_manifest = df.desc_manifest.str.lower() # All
    ...
    ...
    return df


print (preProcess(df))
   C DS_FAMILIA_PROD desc_manifest
0  7               a             f
2  9               b             h
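
As a minimal sketch of what the explicit copy buys you (the frame and the names original, subset, and copy_warnings are made up for illustration), the snippet below records any warnings raised while assigning to the copied subset and then checks that the source frame is untouched:

```python
import warnings

import pandas as pd

# Hypothetical sample data, modelled on the frame in the answer above.
original = pd.DataFrame({
    "DS_FAMILIA_PROD": ["a", "d", "b"],
    "desc_manifest": ["F", "rR", "H"],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Boolean filtering followed by an explicit, independent copy.
    subset = original[original["DS_FAMILIA_PROD"].isin(["a", "b"])].copy()
    subset["desc_manifest"] = subset["desc_manifest"].str.lower()

# Keep only warnings whose class name mentions SettingWithCopy.
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]

print(len(copy_warnings))                 # number of copy-related warnings recorded
print(original["desc_manifest"].tolist()) # source frame keeps its original casing
print(subset["desc_manifest"].tolist())   # only the copy was lower-cased
```

Because subset is its own object, the assignment has a single unambiguous target, which is why pandas has nothing to warn about here.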
jezrael answered Oct 26 '22 05:10


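The purpose of the warning is to tell users that they may be operating on a copy rather than the original, but there can be false positives. As mentioned in the comments, this is not an issue for your use case.

You can simply turn off the check for your DataFrame (this works on older pandas versions only; the is_copy attribute was deprecated in later releases and has since been removed):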

df.is_copy = False

or you can explicitly copy:

df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])].copy()
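
Applied to the function from the question, the explicit-copy approach could look roughly like the sketch below. It rests on several assumptions: pre_process is a renamed variant rather than the asker's exact code, STOPS is a small hard-coded stand-in for the NLTK Portuguese stop word list, the two-column chunk is invented sample data, and only truly missing descriptions are dropped (the question's handling of the literal strings 'NaN' and 'nan' is left out).

```python
import pandas as pd

# Hypothetical stand-in for stopwords.words("portuguese") plus the extra words
# from the question; "da" and "do" represent the NLTK entries.
STOPS = {"da", "do", "reclama", "cliente", "santander", "cartao", "cartão"}


def pre_process(chunk, columns):
    keep = chunk["DS_FAMILIA_PROD"].isin(["CARTOES", "CARTÕES"])
    # Explicit copy: every later assignment targets this new frame only.
    out = chunk.loc[keep].copy()
    out.columns = columns
    out.dropna(subset=["desc_manifest"], inplace=True)

    # Lower-case the text and replace everything that is not a letter.
    text = out["desc_manifest"].str.lower()
    text = text.str.replace(r"[^A-Za-zÀ-ÿ]", " ", regex=True)
    # Split into words and drop the stop words.
    out["desc_manifest"] = text.map(
        lambda s: [word for word in s.split() if word not in STOPS]
    )
    return out


# Invented sample chunk with only the two columns the sketch needs.
chunk = pd.DataFrame({
    "DS_FAMILIA_PROD": ["CARTOES", "SEGUROS", "CARTÕES", "CARTOES"],
    "DS": ["Cliente reclama da FATURA!", "Sinistro", "Limite do cartão 123", None],
})

result = pre_process(chunk, ["product", "desc_manifest"])
print(result)
```

The input chunk keeps its original column names and values, so the same chunk can be reused or inspected after the call.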
A.Kot answered Oct 26 '22 05:10
