 

Set value to an entire column of a pandas dataframe

I'm trying to set the entire column of a dataframe to a specific value.

In  [1]: df
Out [1]: 
     issueid   industry
0        001        xxx
1        002        xxx
2        003        xxx
3        004        xxx
4        005        xxx

From what I've seen, loc is the best practice when replacing values in a dataframe (or isn't it?):

In  [2]: df.loc[:,'industry'] = 'yyy'

However, I still get this much-discussed warning message:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If I do

In  [3]: df['industry'] = 'yyy'

I get the same warning message.

Any ideas? Working with Python 3.5.2 and pandas 0.18.1.

Jingwei Yu asked Jun 23 '17 13:06




3 Answers

You can use the assign function:

df = df.assign(industry='yyy')
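
As a minimal sketch of what this does (the sample data below is made up to mirror the question's layout): assign() returns a new DataFrame and leaves the original untouched, so the result has to be bound back to a name.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's layout.
df = pd.DataFrame({
    "issueid": ["001", "002", "003"],
    "industry": ["xxx", "xxx", "xxx"],
})

# assign() builds a new DataFrame; the existing 'industry' column
# is overwritten in the result, not in df itself.
updated = df.assign(industry="yyy")

print(df["industry"].tolist())       # original values
print(updated["industry"].tolist())  # reassigned values
```

Writing df = df.assign(industry='yyy'), as above, simply rebinds df to the new object.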
Mina HE answered Oct 22 '22 05:10


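Pandas can behave unexpectedly when a new DataFrame is derived from an existing one. You stated in a comment above that your dataframe is defined along the lines of df = df_all.loc[df_all['issueid']==specific_id,:]. In this case, pandas still treats df as a selection from df_all rather than as an independent object, and it does not guarantee whether that selection is a view or a copy. Assigning to df is therefore ambiguous (should df_all change too?), and that ambiguity is what triggers the warning.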

To avoid these issues altogether, I often have to remind myself to use the copy module, which explicitly forces objects to be copied in memory so that methods called on the new objects are not applied to the source object. I had the same problem as you, and avoided it using the deepcopy function.

In your case, this should get rid of the warning message:

from copy import deepcopy
df = deepcopy(df_all.loc[df_all['issueid']==specific_id,:])
df['industry'] = 'yyy'

EDIT: Also see David M.'s excellent comment below! The DataFrame's own .copy() method does the same job without the extra import:

df = df_all.loc[df_all['issueid']==specific_id,:].copy()
df['industry'] = 'yyy'
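
Here is a self-contained sketch of the .copy() approach. The contents of df_all and the value of specific_id are invented for illustration; only the pattern comes from the answer above.

```python
import pandas as pd

# Hypothetical stand-in for the asker's full dataset.
df_all = pd.DataFrame({
    "issueid": ["001", "002", "002", "003"],
    "industry": ["xxx", "xxx", "xxx", "xxx"],
})
specific_id = "002"  # hypothetical filter value

# .copy() makes df an independent DataFrame, so later assignments
# are unambiguous and cannot affect df_all.
df = df_all.loc[df_all["issueid"] == specific_id, :].copy()
df["industry"] = "yyy"

print(df)
print(df_all)
```

After this runs, every row of df has industry set to 'yyy', while df_all still holds the original values.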
Alex P. Miller answered Oct 22 '22 04:10


df.loc[:,'industry'] = 'yyy'

This does the trick: use .loc with ':' as the row indexer to select all rows. Hope it helps.
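
A quick sketch of this form, assuming df is an independent DataFrame rather than a filtered selection from another one (the sample data is made up). On such a frame, the .loc form and plain column assignment give the same result:

```python
import pandas as pd

# Hypothetical independent DataFrame (not derived from another frame).
df = pd.DataFrame({
    "issueid": ["001", "002", "003"],
    "industry": ["xxx", "xxx", "xxx"],
})
other = df.copy()

# ':' selects every row; 'industry' selects the column to overwrite.
df.loc[:, "industry"] = "yyy"

# Plain column assignment, for comparison.
other["industry"] = "yyy"

print(df.equals(other))
```

If df was produced by filtering another DataFrame, as in the question, this line alone may still raise the warning; the selection needs to be copied first.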

Nwoye CID answered Oct 22 '22 05:10