I'm trying to set the entire column of a dataframe to a specific value.
In [1]: df
Out [1]:
issueid industry
0 001 xxx
1 002 xxx
2 003 xxx
3 004 xxx
4 005 xxx
From what I've seen, loc
is the best practice when replacing values in a dataframe (or isn't it?):
In [2]: df.loc[:,'industry'] = 'yyy'
However, I still received this much talked-about warning message:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
If I do
In [3]: df['industry'] = 'yyy'
I got the same warning message.
Any ideas? Working with Python 3.5.2 and pandas 0.18.1.
Pandas DataFrame: assign() functionThe assign() function is used to assign new columns to a DataFrame. Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten. The column names are keywords.
You can set cell value of pandas dataframe using df.at[row_label, column_label] = 'Cell Value'. It is the fastest method to set the value of the cell of the pandas dataframe. Dataframe at property of the dataframe allows you to access the single value of the row/column pair using the row and column labels.
You can use the assign
function:
df = df.assign(industry='yyy')
Python can do unexpected things when new objects are defined from existing ones. You stated in a comment above that your dataframe is defined along the lines of df = df_all.loc[df_all['issueid']==specific_id,:]
. In this case, df
is really just a stand-in for the rows stored in the df_all
object: a new object is NOT created in memory.
To avoid these issues altogether, I often have to remind myself to use the copy
module, which explicitly forces objects to be copied in memory so that methods called on the new objects are not applied to the source object. I had the same problem as you, and avoided it using the deepcopy
function.
In your case, this should get rid of the warning message:
from copy import deepcopy
df = deepcopy(df_all.loc[df_all['issueid']==specific_id,:])
df['industry'] = 'yyy'
EDIT: Also see David M.'s excellent comment below!
df = df_all.loc[df_all['issueid']==specific_id,:].copy()
df['industry'] = 'yyy'
df.loc[:,'industry'] = 'yyy'
This does the magic. You are to add '.loc' with ':' for all rows. Hope it helps
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With