I'm new to pandas and, given a data frame, I was trying to drop the rows that don't meet a specific requirement. Researching how to do it, I got to this structure:
df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])]
However, when processing the frame, I get this warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[name] = value
I'm not sure what to do, because I'm already using .loc.
What am I missing?
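A minimal sketch that reproduces the situation on made-up toy data (not the asker's file). It suggests that the .loc filter itself is not the problem: the warning comes from the later assignment into the filtered frame. Depending on the pandas version and whether Copy-on-Write is enabled, the list of recorded warnings may be empty; the original frame is left unchanged either way.

```python
import warnings

import pandas as pd

# Hypothetical toy data standing in for one chunk of the real file.
df = pd.DataFrame({'DS_FAMILIA_PROD': ['CARTOES', 'OUTROS', 'CARTÕES'],
                   'desc_manifest': ['Texto A', 'Texto B', 'Texto C']})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Filtering alone does not warn.
    sub = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])]
    # Writing into the filtered result is what pandas checks.
    sub.desc_manifest = sub.desc_manifest.str.lower()

print([type(w.message).__name__ for w in caught])
print(df['desc_manifest'].tolist())  # ['Texto A', 'Texto B', 'Texto C']: the original is not modified
```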
import re

import numpy as np
import pandas as pd
from nltk.corpus import stopwords

def preProcess(df, f):
    stops = list(stopwords.words("portuguese"))
    stops.extend(['reclama', 'cliente', 'santander', 'cartao', 'cartão'])
    df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])]
    df.columns = f
    df.desc_manifest = df.desc_manifest.str.lower()  # All lower case
    df.desc_manifest = df.desc_manifest.apply(lambda x: re.sub('[^a-zA-ZÀ-ÿ]', ' ', str(x)))  # Just letters
    df.replace(['NaN', 'nan'], np.nan, inplace=True)  # Remove nan
    df.dropna(subset=['desc_manifest'], inplace=True)
    df.desc_manifest = df.desc_manifest.apply(lambda x: [word for word in str(x).split() if word not in stops])  # Remove stop words
    return df

f = ['ID_manifest', 'issue_date', 'channel', 'product', 'ID_client', 'desc_manifest']
df = pd.DataFrame(columns=f)
for chunk in df2017_chunks:
    aux = preProcess(chunk, f)
    df = pd.concat([df, aux])
You need copy, because if you later modify values in df, you will find that the modifications do not propagate back to the original data (the df you passed in), and pandas warns you about that.
loc can be omitted, but without copy you still get the warning.
import pandas as pd

df = pd.DataFrame({'DS_FAMILIA_PROD': ['a', 'd', 'b'],
                   'desc_manifest': ['F', 'rR', 'H'],
                   'C': [7, 8, 9]})

def preProcess(df):
    # .copy() returns an independent frame, so the edits below don't warn
    df = df[df['DS_FAMILIA_PROD'].isin([u'a', u'b'])].copy()
    df.desc_manifest = df.desc_manifest.str.lower()  # All lower case
    ...
    ...
    return df

print(preProcess(df))
   C DS_FAMILIA_PROD desc_manifest
0  7               a             f
2  9               b             h
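A small check of the two claims above, reusing the toy frame from that example (this is a sketch, not part of the original answer): with .copy() no SettingWithCopyWarning is recorded, and changing the filtered frame leaves df untouched.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'DS_FAMILIA_PROD': ['a', 'd', 'b'],
                   'desc_manifest': ['F', 'rR', 'H'],
                   'C': [7, 8, 9]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Explicit copy: the filtered rows become an independent DataFrame.
    sub = df[df['DS_FAMILIA_PROD'].isin(['a', 'b'])].copy()
    sub['desc_manifest'] = sub['desc_manifest'].str.lower()

names = [type(w.message).__name__ for w in caught]
print('SettingWithCopyWarning' in names)  # False
print(df['desc_manifest'].tolist())       # ['F', 'rR', 'H']
```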
The purpose of the warning is to tell users that they may be operating on a copy rather than the original, but there can be false positives. As mentioned in the comments, this is not an issue for your use case.
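You can simply turn off the check for your dataframe:
df.is_copy = False
This only works in older pandas: is_copy was deprecated in 0.23 and removed in 1.0. In pandas 1.x and 2.x the closest option is pd.options.mode.chained_assignment = None, which turns the check off globally rather than for a single frame.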
or you can explicitly copy:
df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])].copy()
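Applied to the function from the question, the explicit-copy fix might look like the sketch below. It rests on assumptions: the stop words are passed in as a parameter instead of loaded from NLTK, the input column names (ID, DT, CANAL, ID_CLI, TEXTO) and the two toy chunks are hypothetical stand-ins for df2017_chunks, and pre_process is a renamed illustrative variant, not the asker's code.

```python
import re

import numpy as np
import pandas as pd


def pre_process(df, f, stops):
    """Hypothetical variant of preProcess with stop words passed in."""
    # .copy() detaches the filtered rows from the chunk, so the column
    # assignments below act on an independent frame.
    df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])].copy()
    df.columns = f
    df['desc_manifest'] = df['desc_manifest'].str.lower()
    # Keep letters only; str(x) turns missing values into the string 'nan'.
    df['desc_manifest'] = df['desc_manifest'].apply(
        lambda x: re.sub('[^a-zA-ZÀ-ÿ]', ' ', str(x)))
    df['desc_manifest'] = df['desc_manifest'].replace('nan', np.nan)
    # Remove stop words from the non-missing texts.
    df['desc_manifest'] = df['desc_manifest'].apply(
        lambda x: [w for w in x.split() if w not in stops] if isinstance(x, str) else x)
    # Drop rows whose text was missing; nothing is assigned afterwards.
    return df.dropna(subset=['desc_manifest'])


f = ['ID_manifest', 'issue_date', 'channel', 'product', 'ID_client', 'desc_manifest']
stops = ['da', 'no', 'reclama', 'cliente', 'cartao', 'cartão']  # hypothetical stop-word list

# Hypothetical chunks standing in for df2017_chunks.
chunks = [
    pd.DataFrame({'ID': [1, 2, 3],
                  'DT': ['2017-01-02', '2017-01-03', '2017-01-04'],
                  'CANAL': ['web', 'app', 'web'],
                  'DS_FAMILIA_PROD': ['CARTOES', 'CONTA', 'CARTÕES'],
                  'ID_CLI': [10, 11, 12],
                  'TEXTO': ['Cliente RECLAMA da fatura!', 'Outro assunto', np.nan]}),
    pd.DataFrame({'ID': [4], 'DT': ['2017-01-05'], 'CANAL': ['app'],
                  'DS_FAMILIA_PROD': ['CARTÕES'], 'ID_CLI': [13],
                  'TEXTO': ['Cobrança indevida no cartão']}),
]

result = pd.concat([pre_process(chunk, f, stops) for chunk in chunks], ignore_index=True)
print(result[['ID_manifest', 'desc_manifest']])
```

Collecting the processed chunks in a list and calling pd.concat once avoids re-copying the growing frame on every iteration, which is what concatenating inside the loop does.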