Efficient way to update column value for subset of rows on Pandas DataFrame?

Tags: python, pandas

When using Pandas to update the value of a column for a specific subset of rows, what is the best way to do it?

Easy example:

import pandas as pd

df = pd.DataFrame({'name' : pd.Series(['Alex', 'John', 'Christopher', 'Dwayne']),
                   'value' : pd.Series([1., 2., 3., 4.])})

Objective: update the value column based on the length of each name and on the initial value of the value column itself.

The following line achieves the objective:

df.value[df.name.str.len() == 4 ] = df.value[df.name.str.len() == 4] * 1000

However, this line filters the whole data frame twice, once on the LHS and once on the RHS. I assume this is not the most efficient way, and it does not do the update 'in place'.

Basically, I'm looking for the pandas equivalent of the R data.table ':=' operator:

df[nchar(name) == 4, value := value*1000]

And for other kinds of operations, such as:

df[nchar(name) == 4, value := paste0("short_", as.character(value))]

Environment: Python 3.6, Pandas 0.22

Thanks in advance.

asked Feb 13 '18 11:02

AlexSB

2 Answers

You need loc with *=:

df.loc[df.name.str.len() == 4, 'value'] *= 1000
print(df)
          name   value
0         Alex  1000.0
1         John  2000.0
2  Christopher     3.0
3       Dwayne     4.0

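EDIT: More general solutions:

mask = df.name.str.len() == 4
df.loc[mask, 'value'] = df.loc[mask, 'value'] * 1000

Or:

df.update(df.loc[mask, 'value'] * 1000)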
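
As a minimal sketch of how this kind of subset update could be generalised to arbitrary transformations (closer in spirit to data.table's `:=`), the row filter can be evaluated once and passed to a small helper. `update_where` is a hypothetical name used for illustration, not a pandas function:

```python
import pandas as pd


def update_where(df, mask, column, func):
    """Hypothetical helper: apply func to df[column] on the rows where
    mask is True and write the result back into the same DataFrame."""
    df.loc[mask, column] = func(df.loc[mask, column])
    return df


df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Evaluate the (comparatively expensive) string-length filter only once
mask = df['name'].str.len() == 4

# Only the selected rows are passed to the function and overwritten
update_where(df, mask, 'value', lambda s: s * 1000)
print(df)
```

The boolean mask is still used twice for indexing (once to read, once to write), but the `str.len()` comparison itself is computed a single time, and the original DataFrame object is modified rather than replaced.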

jezrael

This may be what you require:

df.loc[df.name.str.len() == 4, 'value'] *= 1000

df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)
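
Note that writing strings into a float column may emit a dtype warning, or fail, in recent pandas versions. A minimal sketch of the string-prefix case that sidesteps this, assuming it is acceptable for the column to become object dtype, casts the column first and restricts the right-hand side to the same rows:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Compute the row filter once and reuse it on both sides
mask = df['name'].str.len() == 4

# Assumption: a mixed float/string column is acceptable, so cast to object
# before assigning strings into what was a float64 column
df['value'] = df['value'].astype(object)

# Build the new strings only for the selected rows
df.loc[mask, 'value'] = 'short_' + df.loc[mask, 'value'].astype(str)
print(df)
```

Rows that do not match the mask keep their original float values.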

answered Oct 19 '22 23:10

jpp
