
Use of loc to update a dataframe python pandas

I have a pandas dataframe (df) with the column structure :

month a b c d

This dataframe has data for, say, Jan, Feb, Mar and Apr. A, B, C and D are numeric columns. For the month of Feb, I want to recalculate column A and update it in the dataframe, i.e. for month = Feb, A = B + C + D.

Code I used :

 df[df['month']=='Feb']['A']=df[df['month']=='Feb']['B'] + df[df['month']=='Feb']['C'] + df[df['month']=='Feb']['D'] 

This ran without errors but did not change the values in column A for Feb. The console showed this message:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I tried to use .loc, but I had already called .reset_index() on the dataframe I am working with, and I am not sure how to set the index and use .loc. I followed the documentation, but it is not clear to me. Could you please help me out? Here is an example dataframe:

 import pandas as pd
 import numpy as np
 dates = pd.date_range('1/1/2000', periods=8)
 df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D']) 

I want to update, say, one date: 2000-01-03. I cannot share a snippet of my actual data because it is real-time data.

Data Enthusiast asked Dec 28 '15 at 19:12





2 Answers

As the warning suggests, you should use loc[row_indexer, col_indexer]. Subsetting with a boolean condition selects the matching rows, so pass that condition as the row indexer and then, after the comma, the column name:

df.loc[df['month'] == 'Feb', 'A'] = df.loc[df['month'] == 'Feb', 'B'] + df.loc[df['month'] == 'Feb', 'C'] + df.loc[df['month'] == 'Feb', 'D'] 
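
For anyone who wants to try this out, here is a minimal runnable sketch of the same approach. The sample values are made up for illustration (they are not from the question), and storing the condition in a variable named mask is only a convenience to avoid repeating it:

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Feb"],
    "A": [0, 0, 0, 0],
    "B": [1, 2, 3, 4],
    "C": [10, 20, 30, 40],
    "D": [100, 200, 300, 400],
})

# Build the boolean row selector once and reuse it on both sides.
mask = df["month"] == "Feb"

# A single .loc call with (rows, column) assigns into df itself, not a copy.
df.loc[mask, "A"] = df.loc[mask, "B"] + df.loc[mask, "C"] + df.loc[mask, "D"]

print(df)
```

Only the Feb rows change; the other rows keep their original A values. Because the row selection and the column name go into one .loc call, the assignment does not go through an intermediate copy, so the warning from the question should not appear.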
Anton Protopopov answered Sep 24 '22 at 08:09



While not being the most beautiful, the way I would achieve your goal (without explicitly iterating over the rows) is:

df.ix[df['month'] == 'Feb', 'a'] = df[df['month'] == 'Feb']['b'] + df[df['month'] == 'Feb']['c']  

Note: ix was deprecated in pandas 0.20.0 in favour of iloc / loc and was removed in pandas 1.0.0. On current versions, replace df.ix with df.loc in the line above.
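
Since ix is gone from current pandas, here is a hedged sketch of the loc version applied to the date-indexed example from the question. It assumes the goal for the single date 2000-01-03 is also A = B + C + D. The names target, flat and mask are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])

# Copy with the dates moved into a regular column, as after .reset_index().
# An unnamed index becomes a column called "index".
flat = df.reset_index()

target = pd.Timestamp("2000-01-03")

# Case 1: the dates are the index, so the date is the row label.
df.loc[target, "A"] = df.loc[target, ["B", "C", "D"]].sum()

# Case 2: after reset_index() no set_index() is needed;
# a boolean mask on the date column works as the row indexer.
mask = flat["index"] == target
flat.loc[mask, "A"] = flat.loc[mask, "B"] + flat.loc[mask, "C"] + flat.loc[mask, "D"]
```

In both cases loc receives a row selector and a column name in one call, so it does not matter whether the dataframe has a DatetimeIndex or the default integer index left by reset_index().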

DeepSpace answered Sep 20 '22 at 08:09
