When using Pandas to update the value of a column for a specific subset of rows, what is the best way to do it?
Easy example:
import pandas as pd
df = pd.DataFrame({'name' : pd.Series(['Alex', 'John', 'Christopher', 'Dwayne']),
'value' : pd.Series([1., 2., 3., 4.])})
Objective: update the value column based on the length of each name and the initial value of the value column itself.
The following line achieves the objective:
df.value[df.name.str.len() == 4 ] = df.value[df.name.str.len() == 4] * 1000
However, this line filters the whole data frame twice, once on the LHS and once on the RHS. I assume this is not the most efficient way, and it does not do it 'in place'.
Basically I'm looking for the pandas equivalent of the R data.table ':=' operator:
df[nchar(name) == 4, value := value*1000]
And for other kinds of operations, such as:
df[nchar(name) == 4, value := paste0("short_", as.character(value))]
Environment: Python 3.6
Pandas 0.22
Thanks in advance.
You can replace values in all or selected columns of a pandas DataFrame based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can both read and modify the values of the DataFrame.
The main difference between loc[] and iloc[] is that loc[] selects rows and columns by label/name, while iloc[] selects by integer position. With loc[], a missing label raises a KeyError; with iloc[], an out-of-range position raises an IndexError.
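As a small illustration of the loc[]/iloc[] distinction described above, here is a sketch on a hypothetical frame (the name demo and its 'a', 'b', 'c' labels are made up for this example, not taken from the question):

```python
import pandas as pd

# Hypothetical frame with string labels, so labels and positions differ.
demo = pd.DataFrame({'value': [1.0, 2.0, 3.0]}, index=['a', 'b', 'c'])

by_label = demo.loc['b', 'value']   # label-based lookup
by_position = demo.iloc[1, 0]       # position-based lookup (row 1, column 0)

try:
    demo.loc['z', 'value']          # 'z' is not a label in the index
except KeyError as exc:
    loc_error = type(exc).__name__

try:
    demo.iloc[10, 0]                # position 10 is out of range
except IndexError as exc:
    iloc_error = type(exc).__name__

print(by_label, by_position, loc_error, iloc_error)
```

Both lookups return the same cell here; only the way the row is addressed differs.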
To replace a value in a pandas DataFrame regardless of which row it is in, use the replace() method with the value to find and the value to substitute. For example, replacing Spark with PySpark in a Course column changes every Spark entry in that column to PySpark.
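A minimal sketch of that replace() call, assuming a made-up courses frame (the Fee column and its numbers are hypothetical and only there to show that other columns are left alone):

```python
import pandas as pd

# Hypothetical data; only the Course column matters for this example.
courses = pd.DataFrame({'Course': ['Spark', 'Pandas', 'Spark'],
                        'Fee': [20000, 25000, 22000]})

# Restrict the replacement to one column by calling replace() on that Series.
courses['Course'] = courses['Course'].replace('Spark', 'PySpark')

print(courses)
```

Unlike loc[], replace() matches on the cell value itself, so it is not a substitute for a row condition such as the name-length filter in the question.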
You need loc with *=:
df.loc[df.name.str.len() == 4, 'value'] *= 1000
print (df)
name value
0 Alex 1000.0
1 John 2000.0
2 Christopher 3.0
3 Dwayne 4.0
EDIT:
More general solutions:
mask = df.name.str.len() == 4
df.loc[mask, 'value'] = df.loc[mask, 'value'] * 1000
Or:
df.update(df.loc[mask, 'value'] * 1000)
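For reference, here is a self-contained sketch of the update() variant on the question's data. The intermediate name scaled is introduced only for this example. update() aligns on the index and the Series name, modifies df in place, and returns None:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Compute the mask once and reuse it.
mask = df['name'].str.len() == 4

# Series named 'value' that holds only the selected rows (index 0 and 1).
scaled = df.loc[mask, 'value'] * 1000

# Rows missing from `scaled` are left untouched in df.
result = df.update(scaled)

print(df)
```

Because the Series must carry the column name for update() to know where to write, this form is mostly convenient when the new values already exist as a separate Series.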
This may be what you require:
df.loc[df.name.str.len() == 4, 'value'] *= 1000
df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)
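The string case writes text into a float column. It works on the Pandas 0.22 mentioned in the question, but newer releases may warn about or reject the dtype change. A hedged sketch that sidesteps this is shown below. The explicit cast to object is an assumption of this example, not part of the answer above, and it starts from the original values rather than the multiplied ones:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})
mask = df['name'].str.len() == 4

# Assumption: cast first so the column can hold both strings and floats.
df['value'] = df['value'].astype(object)

# Filter both sides with the same mask, then prefix the selected values.
df.loc[mask, 'value'] = 'short_' + df.loc[mask, 'value'].astype(str)

print(df)
```

The column ends up with mixed types (strings for the short names, floats elsewhere), which mirrors what the R paste0 example would do only after converting the whole column to character.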