Pandas: Create column which contains 'next' changed value in another column

Question

I would like to create column C from column B without a for loop...

dataframe:

# |  A  |  B |  C  
--+-----+----+-----
1 |  2  |  3 |  4
2 |  3  |  3 |  4
3 |  4  |  4 |  6
4 |  5  |  4 |  6
5 |  5  |  4 |  6
6 |  3  |  6 |  2
7 |  2  |  6 |  2
8 |  4  |  2 |  3  #< --- loop back around if possible (B value at index 1)

Essentially, for each row I want the next value that B changes to, stored in a new column C.

So far, using the answer from "Determining when a column value changes in pandas dataframe", I have:

df_filtered = df[df['B'].diff() != 0]

But after that I'm not sure how to create C without using a loop...
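As a sketch of one way to continue from df_filtered without a loop (this is not from the original thread): the filtered rows give one B value per run, rolling them by -1 gives the value each run changes to, and a cumulative count of the change flags tells each row which run it belongs to. The DataFrame simply reproduces the first example table. Like the accepted answer, it does not merge a trailing run into the first one.

```python
import numpy as np
import pandas as pd

# Example data reproduced from the question (C is what we want to compute).
df = pd.DataFrame({"A": [2, 3, 4, 5, 5, 3, 2, 4],
                   "B": [3, 3, 4, 4, 4, 6, 6, 2]})

# True on the first row of every run of equal B values.
# Row 0 counts as a change because diff() gives NaN there, and NaN != 0 is True.
is_change = df["B"].diff() != 0
df_filtered = df[is_change]

# The value each run changes to; np.roll wraps the last run around to the first.
next_vals = np.roll(df_filtered["B"].to_numpy(), -1)

# 0-based run number of every row, used to look up that run's "next" value.
run_id = is_change.cumsum() - 1
df["C"] = next_vals[run_id.to_numpy()]
```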

EDIT: Ayoub ZAROU's answer solves my original question. However, I noticed that my example dataframe doesn't cover all cases if we assume the data loops around:

# |  A  |  B |  C  
--+-----+----+-----
1 |  2  |  3 |  4
2 |  3  |  3 |  4
3 |  4  |  4 |  6
4 |  5  |  4 |  6
5 |  5  |  4 |  6
6 |  3  |  6 |  2
7 |  2  |  6 |  2
8 |  4  |  2 |  3
9 |  3  |  3 |  4
10|  2  |  3 |  4

In this case, if the last run of 3s is considered part of the first run of 3s, this solution gives incorrect values for the last two rows of C (3 instead of 4).

An easy fix, however, is to move the last few elements to the beginning of the data (or vice versa) so that the looped run is contiguous.
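The "move the last few elements to the beginning" fix can be sketched with NumPy as below. next_change_circular is a hypothetical helper name; it treats row order as the cycle, and returning B unchanged when B never changes is a choice made here, since no "next" value exists in that case.

```python
import numpy as np
import pandas as pd


def next_change_circular(b: pd.Series) -> pd.Series:
    """Hypothetical helper: next changed value of b, treating b as circular."""
    vals = b.to_numpy()
    differs = np.flatnonzero(vals != vals[0])
    if differs.size == 0:
        # B never changes, so there is no "next" value; return B unchanged (assumption).
        return b.rename("C")
    # k = length of the trailing run that continues the first run.
    k = len(vals) - 1 - differs[-1]
    # Move those k trailing elements to the front so the first run is contiguous.
    rotated = np.roll(vals, k)
    # Start of each run, run number of each row, and the value each run changes to.
    starts = np.concatenate(([True], rotated[1:] != rotated[:-1]))
    run_id = np.cumsum(starts) - 1
    next_vals = np.roll(rotated[starts], -1)
    # Undo the rotation so the result lines up with the original rows.
    result = np.roll(next_vals[run_id], -k)
    return pd.Series(result, index=b.index, name="C")


# The 10-row looped example from the EDIT.
df = pd.DataFrame({"A": [2, 3, 4, 5, 5, 3, 2, 4, 3, 2],
                   "B": [3, 3, 4, 4, 4, 6, 6, 2, 3, 3]})
df["C"] = next_change_circular(df["B"])
```

When the first and last values differ, k is 0 and the rotation is a no-op, so the original 8-row example is handled the same way.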

Ayoub ZAROU · Accepted Answer

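You could try the following. Note that np.roll works like shift in pandas; the only difference is that it rolls the values over (the last value wraps around to the start) instead of filling with NaN. In the following, c marks the rows where B does not change, i.e. where the next row has the same B value: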

c = (df.B.diff(-1) == 0)

c
Out[104]: 
0     True
1    False
2     True
3     True
4    False
5     True
6    False
7    False
Name: B, dtype: bool
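A small sketch of why the last row of c is always False: diff(-1) subtracts the following row, so the last row has nothing to compare against and gets NaN, and NaN == 0 is False. The series reproduces column B from the question.

```python
import pandas as pd

b = pd.Series([3, 3, 4, 4, 4, 6, 6, 2], name="B")

# diff(-1) computes b[i] - b[i + 1]; the last row has no successor, so it is NaN.
d = b.diff(-1)

# NaN == 0 is False, so the last row is always treated as a change.
c = d == 0
```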

We then set the values at the remaining rows (where c is False, i.e. where B changes) to the next value in column B, obtained with np.roll and assigned with pandas.Series.where. Note that where keeps the original values where c is True and replaces them where c is False:

df['C'] = np.nan
df['C'] = df.C.where(c, np.roll(df.B, -1))
df.C

Out[107]: 
0    NaN
1    4.0
2    NaN
3    NaN
4    6.0
5    NaN
6    2.0
7    3.0
Name: C, dtype: float64
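To see why np.roll is used rather than shift(-1), here is a sketch comparing the two on column B (reproduced from the question). With shift, the last row receives NaN, and since nothing follows it, bfill cannot fill it, so a later cast to an integer dtype would fail.

```python
import numpy as np
import pandas as pd

b = pd.Series([3, 3, 4, 4, 4, 6, 6, 2], name="B")

shifted = b.shift(-1)              # last entry becomes NaN (dtype becomes float)
rolled = np.roll(b.to_numpy(), -1)  # last entry wraps around to b[0]

c = b.diff(-1) == 0
blank = pd.Series(np.nan, index=b.index)

# Same where + bfill pattern as the answer, once with each "next value" source.
with_shift = blank.where(c, shifted).bfill()
with_roll = blank.where(c, rolled).bfill()
```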

We then fill the remaining rows using bfill and cast the result to the dtype of column B. So, all together:

import numpy as np

c = (df.B.diff(-1) == 0)
df['C'] = np.nan
df['C'] = df.C.where(c, np.roll(df.B, -1)).bfill().astype(df.B.dtype)

df.C
Out[110]: 
0    4
1    4
2    6
3    6
4    6
5    2
6    2
7    3
Name: C, dtype: int32
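As a check on the limitation described in the question's EDIT, here is the same code applied to the 10-row looped example (data reproduced from that table). The trailing run of 3s is treated as its own run, so its rows get 3 instead of 4.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 4, 5, 5, 3, 2, 4, 3, 2],
                   "B": [3, 3, 4, 4, 4, 6, 6, 2, 3, 3]})

# The accepted answer, unchanged.
c = (df.B.diff(-1) == 0)
df['C'] = np.nan
df['C'] = df.C.where(c, np.roll(df.B, -1)).bfill().astype(df.B.dtype)
```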

Andy Hayden · Answer

Another way is to number the runs of equal values, incrementing the counter every time B changes:

In [11]: changes = (df.B != df.B.shift()).cumsum()

In [12]: changes
Out[12]:
0    1
1    1
2    2
3    2
4    2
5    3
6    3
7    4
Name: B, dtype: int64

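and a lookup map holding the B value at the start of each run, with the first value of B appended so the last run wraps around: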

In [13]: lookup = df.B[(df.B != df.B.shift())]

In [14]: lookup.at[len(lookup)] = df.B.iloc[0]

In [15]: lookup
Out[15]:
0    3
2    4
5    6
7    2
4    3
Name: B, dtype: int64

Then use these to look up the "next" value:

In [16]: lookup.iloc[changes]
Out[16]:
2    4
2    4
5    6
5    6
5    6
7    2
7    2
4    3
Name: B, dtype: int64

To create the column, you need to ignore the (duplicated) index so that pandas doesn't try to align it with df's index; .values does that:

In [17]: df["C"] = lookup.iloc[changes].values
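One caveat: lookup.at[len(lookup)] only appends when the label len(lookup) is not already in lookup's index. For B = [1, 1, 2, 3] the change rows are labelled 0, 2 and 3, so lookup.at[3] overwrites an existing entry instead of appending, and the later iloc lookup raises an IndexError. Below is a sketch of the same approach that appends with pd.concat(..., ignore_index=True) instead; next_change_lookup is a hypothetical helper name.

```python
import pandas as pd


def next_change_lookup(b: pd.Series) -> pd.Series:
    """Hypothetical helper: Andy Hayden's run/lookup approach with a safe append."""
    # 1-based run number of each row.
    changes = (b != b.shift()).cumsum()
    # B value at the start of each run.
    lookup = b[b != b.shift()]
    # Append the first value for the wrap-around; ignore_index renumbers the
    # labels 0..m, so the new entry can never overwrite an existing one.
    lookup = pd.concat([lookup, b.iloc[:1]], ignore_index=True)
    # Positional lookup: run i maps to the start value of run i + 1.
    return pd.Series(lookup.iloc[changes.to_numpy()].values, index=b.index, name="C")
```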

Tags: python, search, pandas, dataframe

Kyle
