I have a pandas DataFrame (df) with the following columns:
month a b c d
The DataFrame has data for, say, Jan, Feb, Mar and Apr. A, B, C and D are numeric columns. For the month of Feb, I want to recalculate column A and update it in the DataFrame, i.e. for month = Feb, A = B + C + D.
Code I used:
df[df['month']=='Feb']['A']=df[df['month']=='Feb']['B'] + df[df['month']=='Feb']['C'] + df[df['month']=='Feb']['D']
This ran without errors but did not change the values in column A for Feb. In the console, it printed this message:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
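A minimal reproduction of the behaviour described above, using a small hypothetical frame in the question's layout (the data is invented for illustration). Selecting rows with a boolean mask via df[mask] builds a new, copied DataFrame, so assigning a column on that copy leaves df untouched. The warnings are silenced only so the sketch runs quietly; depending on the pandas version they surface as SettingWithCopyWarning or ChainedAssignmentError.

```python
import warnings

import pandas as pd

# Hypothetical toy data with the question's layout (month plus numeric A-D).
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Feb", "Mar"],
    "A": [0.0, 0.0, 0.0, 0.0],
    "B": [1.0, 2.0, 3.0, 4.0],
    "C": [10.0, 20.0, 30.0, 40.0],
    "D": [100.0, 200.0, 300.0, 400.0],
})

mask = df["month"] == "Feb"

# Chained indexing: df[mask] returns a temporary copy, and ["A"] = ...
# writes into that copy, so df itself never changes.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide SettingWithCopyWarning / ChainedAssignmentError
    df[mask]["A"] = df[mask]["B"] + df[mask]["C"] + df[mask]["D"]

print(df["A"].tolist())  # [0.0, 0.0, 0.0, 0.0]
```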
I tried to use .loc, but I had called .reset_index() on the DataFrame I am working with, and I am not sure how to set the index and use .loc. I read the documentation but it was not clear to me. Could you please help me out here?
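A sketch of why reset_index() does not get in the way, assuming a hypothetical frame whose month labels start out in the index. After reset_index(), month is an ordinary column and the index is a default 0..n-1 range; a boolean Series built from the month column is aligned on whatever the current index is, so it can be passed to .loc directly without calling set_index().

```python
import pandas as pd

# Hypothetical data: month stored in the index, then moved back into a
# column with reset_index(), which leaves a default RangeIndex 0..n-1.
df = pd.DataFrame(
    {"A": [0.0, 0.0, 0.0], "B": [1.0, 2.0, 3.0], "C": [4.0, 5.0, 6.0], "D": [7.0, 8.0, 9.0]},
    index=pd.Index(["Jan", "Feb", "Mar"], name="month"),
)
df = df.reset_index()  # 'month' is now a regular column

# The boolean Series shares df's index, so it works as the row indexer
# regardless of what that index contains.
feb = df["month"] == "Feb"
df.loc[feb, "A"] = df.loc[feb, "B"] + df.loc[feb, "C"] + df.loc[feb, "D"]

print(df)
```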
This is an example DataFrame:
import pandas as pd
import numpy as np
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
I want to update, say, one date: 2000-01-03. I can't share a snippet of my actual data because it is real-time data.
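For the example frame above, which has a DatetimeIndex, a sketch of updating a single date with .loc could look like this. The seeded generator (np.random.default_rng(0)) is an assumption added only so the result is reproducible; the question itself uses np.random.randn.

```python
import numpy as np
import pandas as pd

# The question's example frame, built with a seeded generator (assumption)
# so repeated runs give the same numbers.
rng = np.random.default_rng(0)
dates = pd.date_range("1/1/2000", periods=8)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=dates, columns=["A", "B", "C", "D"])

# With a DatetimeIndex, the date itself is the row label for .loc.
day = pd.Timestamp("2000-01-03")
df.loc[day, "A"] = df.loc[day, "B"] + df.loc[day, "C"] + df.loc[day, "D"]

print(df.loc[day])
```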
The pandas DataFrame update() method modifies a DataFrame in place using the non-NA values from another similar object (such as another DataFrame), matched on index and column labels. Note: this method does NOT return a new DataFrame (it returns None); the original DataFrame is modified.
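Applied to the question, one possible sketch is to compute the new A values for the Feb rows only and let update() write them back by label. The data below is hypothetical. A Series passed to update() must have its name set, because the name is used as the column to align with; here rename("A") does that.

```python
import pandas as pd

# Hypothetical data in the question's layout.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "A": [0.0, 0.0, 0.0],
    "B": [1.0, 2.0, 3.0],
    "C": [4.0, 5.0, 6.0],
    "D": [7.0, 8.0, 9.0],
})

# New values only for the Feb rows; the Series keeps df's index labels,
# and its name ('A') selects the column that update() overwrites.
feb = df["month"] == "Feb"
new_a = df.loc[feb, ["B", "C", "D"]].sum(axis=1).rename("A")

returned = df.update(new_a)  # modifies df in place, aligned by label

print(returned)  # None
print(df["A"].tolist())  # [0.0, 15.0, 0.0]
```

Rows that are absent from new_a are treated as NA after alignment, so update() leaves them unchanged.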
The pandas library in Python is used to work with DataFrames, which organize data in rows and columns. It is widely used in data analysis and machine learning. The .loc indexer selects a portion of a DataFrame; it accepts row and column labels as well as boolean arrays.
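A small illustration of the two kinds of .loc indexers mentioned above, on a hypothetical frame with string row labels: label-based selection (including label slices, which include both endpoints) and boolean selection.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["x", "y", "z"],
)

# Label-based: row label 'y', column label 'B'.
print(df.loc["y", "B"])  # 5

# Label slices include both endpoints.
print(df.loc["x":"y", "A"].tolist())  # [1, 2]

# Boolean: rows where A > 1, column B.
print(df.loc[df["A"] > 1, "B"].tolist())  # [5, 6]
```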
As the warning suggests, you should use .loc[row_indexer, col_indexer]. The condition df['month'] == 'Feb' identifies the matching rows by their index values, so pass it as the row indexer and then, after a comma, the column name:
df.loc[df['month'] == 'Feb', 'A'] = df.loc[df['month'] == 'Feb', 'B'] + df.loc[df['month'] == 'Feb', 'C'] + df.loc[df['month'] == 'Feb', 'D']
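A slightly more compact variant of the same .loc assignment, sketched on hypothetical data: compute the mask once and sum B, C and D row-wise with sum(axis=1).

```python
import pandas as pd

# Hypothetical data in the question's layout.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Feb", "Mar"],
    "A": [0.0, 0.0, 0.0, 0.0],
    "B": [1.0, 2.0, 3.0, 4.0],
    "C": [10.0, 20.0, 30.0, 40.0],
    "D": [100.0, 200.0, 300.0, 400.0],
})

# Compute the boolean row selector once and reuse it.
feb = df["month"] == "Feb"

# Row-wise sum of B, C and D for the selected rows, written through a
# single .loc call so the assignment lands in df itself.
df.loc[feb, "A"] = df.loc[feb, ["B", "C", "D"]].sum(axis=1)

print(df["A"].tolist())  # [0.0, 222.0, 333.0, 0.0]
```

One behavioural difference to keep in mind: sum(axis=1) skips NaN by default, whereas B + C + D yields NaN if any term is missing.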
It is not the most elegant solution, but this is how I would achieve your goal without explicitly iterating over the rows:
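df.loc[df['month'] == 'Feb', 'a'] = df[df['month'] == 'Feb']['b'] + df[df['month'] == 'Feb']['c'] + df[df['month'] == 'Feb']['d']
Note: this approach is sometimes written with df.ix, but ix was deprecated in pandas 0.20.0 in favour of iloc / loc and removed in pandas 1.0, so use loc as above.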
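Another way to avoid row iteration, swapped in here as an alternative rather than taken from the answers above, is numpy.where, which chooses element-wise between the new sum and the existing value and then reassigns the whole column. The data is hypothetical and uses the lowercase column names from the question's layout.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the lowercase column names from the question.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "a": [0.0, 0.0, 0.0],
    "b": [1.0, 2.0, 3.0],
    "c": [4.0, 5.0, 6.0],
    "d": [7.0, 8.0, 9.0],
})

# Where the condition holds take b + c + d, otherwise keep the current 'a';
# the full column is reassigned, so no row selection is needed on the left.
df["a"] = np.where(df["month"] == "Feb", df["b"] + df["c"] + df["d"], df["a"])

print(df["a"].tolist())  # [0.0, 15.0, 0.0]
```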