Can I set dataframe values without using iterrows()?

Original DataSet

In [2]: import pandas as pd
   ...: 
   ...: # Original DataSet
   ...: d = {'A': [1,1,1,1,2,2,2,2,3],
   ...:      'B': ['a','a','a','x','b','b','b','x','c'],
   ...:      'C': [11,22,33,44,55,66,77,88,99],}
   ...: 
   ...: df = pd.DataFrame(d)
   ...: df

Out[2]: 
   A  B   C
0  1  a  11
1  1  a  22
2  1  a  33
3  1  x  44
4  2  b  55
5  2  b  66
6  2  b  77
7  2  x  88
8  3  c  99

Given a dataframe, I would like a flexible, efficient way to reset specific values based on certain conditions in two columns.

Conditions:

in Column B: find every row whose value is 'x',
in Column C: set the value in each of those rows to the value of C in the next row.

Desired Outcome

Out[3]: 
   A  B   C
0  1  a  11
1  1  a  22
2  1  a  33
3  1  x  55
4  2  b  55
5  2  b  66
6  2  b  77
7  2  x  99
8  3  c  99

I learned I can accomplish this using iterrows() (see below),

# Code that produces the above outcome
# (assumes the default 0..n-1 integer index and that the last row is not 'x')
for idx, x_row in df[df['B'] == 'x'].iterrows():
    df.loc[idx, 'C'] = df.loc[idx+1, 'C']
df

but I need to do this many times, and I understand iterrows() is slow. Are there better pandas-y, broadcasting-like ways of getting the desired outcome more efficiently?

asked Jul 06 '15 06:07

pylang

1 Answer

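This should do what you want. Use a single .loc assignment rather than chained indexing (df.C[...] = ...), which can raise SettingWithCopyWarning and may not update df when copy-on-write is enabled:

df.loc[df.B == 'x', 'C'] = df.C.shift(-1)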
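
For reference, here is a minimal, self-contained sketch of the mask-and-shift idea, written as one .loc assignment and checked against the row-by-row loop from the question. The names mask and expected are introduced only for this illustration, and the sketch assumes the default integer index and that the last row is not flagged with 'x'.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 2, 3],
    'B': ['a', 'a', 'a', 'x', 'b', 'b', 'b', 'x', 'c'],
    'C': [11, 22, 33, 44, 55, 66, 77, 88, 99],
})

# Reference result: the row-by-row approach from the question,
# applied to a copy so the original frame stays untouched.
expected = df.copy()
for idx in expected.index[expected['B'] == 'x']:
    expected.loc[idx, 'C'] = expected.loc[idx + 1, 'C']

# Vectorized approach: the boolean mask picks the rows to overwrite,
# and shift(-1) moves every value of C up by one row, so each row is
# lined up with the value from the row below it. The assignment aligns
# on the index, so only the masked rows are changed.
mask = df['B'] == 'x'
df.loc[mask, 'C'] = df['C'].shift(-1)

# Both approaches should agree on every value of C.
print(df['C'].tolist() == expected['C'].tolist())
```

One caveat: shift(-1) leaves NaN in the last position, so if the last row were an 'x' row its C value would become NaN (the loop version would raise a KeyError in that situation instead).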

answered Oct 06 '22 01:10

maxymoo

Tags: python, python-3.x, pandas, dataframe