Pandas: Find previous row of matching value

Tags:

python

pandas

I'm trying to create a column with values from one column, but based on matching another column with the previous value.

Here is my current code:

import pandas as pd

d = {'a':[1,2,3,1,2,3,2,1], 'b':[10,20,30,40,50,60,70,80]}

df = pd.DataFrame(d)

df['c'] = df['b'][df['a'] == df['a'].prev()]

And my desired output:

   a   b    c
0  1  10  NaN
1  2  20  NaN
2  3  30  NaN
3  1  40   10
4  2  50   20
5  3  60   30
6  2  70   50
7  1  80   40

...which I'm not getting because .prev() is not a real thing. Any thoughts?


asked Feb 24 '17 18:02

elPastor
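The behaviour the question describes (for each row, take b from the most recent earlier row that has the same a value) can be spelled out with a plain loop before reaching for pandas. This is a minimal sketch for illustration only, not part of the original thread; the helper name `prev_matching` is hypothetical.

```python
import pandas as pd

def prev_matching(keys, values):
    """Hypothetical helper: for each position, return the value from the
    most recent earlier position with the same key, or NaN if none exists."""
    last_seen = {}  # key -> value from the latest row seen so far with that key
    out = []
    for k, v in zip(keys, values):
        out.append(last_seen.get(k, float('nan')))  # NaN on a key's first occurrence
        last_seen[k] = v  # remember this row's value for the next match
    return out

d = {'a': [1, 2, 3, 1, 2, 3, 2, 1], 'b': [10, 20, 30, 40, 50, 60, 70, 80]}
df = pd.DataFrame(d)
df['c'] = prev_matching(df['a'], df['b'])
print(df)
```

The loop makes one pass and keeps a dictionary of the last b seen per a value, which is exactly what a per-group "shift by one" computes.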

1 Answer

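We can group by the a column and then "attach" the b column shifted by one row within each group (groupby sorts the group keys by default, but the result stays aligned with the original rows):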

In [110]: df['c'] = df.groupby('a')['b'].transform(lambda x: x.shift())

In [111]: df
Out[111]:
   a   b     c
0  1  10   NaN
1  2  20   NaN
2  3  30   NaN
3  1  40  10.0
4  2  50  20.0
5  3  60  30.0
6  2  70  50.0
7  1  80  40.0
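A self-contained version of the transform approach follows. It is a sketch that, as an assumption not taken from the question, gives the frame arbitrary row labels to show that the shifted values are matched back by index label, so sorting the group keys does not reorder the output.

```python
import pandas as pd

d = {'a': [1, 2, 3, 1, 2, 3, 2, 1], 'b': [10, 20, 30, 40, 50, 60, 70, 80]}
# Illustrative, arbitrary row labels (not from the question) to show that
# results are matched back by index label rather than by sorted group key.
df = pd.DataFrame(d, index=[7, 3, 5, 0, 6, 1, 4, 2])

# Within each group of equal 'a' values, shift 'b' down by one row.
# groupby sorts the group keys, but transform() returns a Series indexed
# like df, so every shifted value lands back on the row it came from.
df['c'] = df.groupby('a')['b'].transform(lambda x: x.shift())

# The first occurrence of each 'a' value has no earlier match, hence NaN.
print(df)
```

Reading down the rows, column c matches the desired output even though the labels are out of order.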

Or, a much better option: use GroupBy.shift() directly (thank you @Mitch):

In [114]: df['c'] = df.groupby('a')['b'].shift()

In [115]: df
Out[115]:
   a   b     c
0  1  10   NaN
1  2  20   NaN
2  3  30   NaN
3  1  40  10.0
4  2  50  20.0
5  3  60  30.0
6  2  70  50.0
7  1  80  40.0
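To check that the two approaches agree, here is a small self-contained sketch that computes both and compares them. The assumption behind preferring GroupBy.shift() is that it is a built-in groupby operation and does not call a Python lambda once per group, so it is typically faster on large frames; the results themselves should be identical.

```python
import pandas as pd

d = {'a': [1, 2, 3, 1, 2, 3, 2, 1], 'b': [10, 20, 30, 40, 50, 60, 70, 80]}
df = pd.DataFrame(d)

# Built-in groupby shift: no user-supplied function is applied per group.
via_shift = df.groupby('a')['b'].shift()

# The lambda-based transform from the first approach, for comparison.
via_transform = df.groupby('a')['b'].transform(lambda x: x.shift())

# Both are float64 Series aligned to df's index, with NaN on the first
# row of each group, so they should compare equal.
print(via_shift.equals(via_transform))

df['c'] = via_shift
print(df)
```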


answered Sep 26 '22 04:09

MaxU - stop WAR against UA
