I have this pandas dataframe:
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]
    }
)
df

I want to add another column that says "last" if the value in col1 doesn't equal the value of col1 in the next row. This is how it should look:

So far, I can create a column that contains True if the value in col1 doesn't equal the value of col1 in the next row, and False otherwise:
df["last_row"] = df["col1"].shift(-1)
df['last'] = df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df

Now something like
df["last_row"] = df["col1"].shift(-1)
df['last'] = "last" if df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df
would be nice, but this is apparently the wrong syntax. How can I manage to do this?
Ultimately, I also want to add numbers that indicate how many times a value has appeared before the current row, while the last value is always marked with "last". It should look like this:

I'm not sure if this is another step in my development or if this requires a new approach. I read that if I want to loop through an array while modifying values, I should use apply(). However, I don't know how to include conditions in this. Can you help me?
Thanks a lot!
Here's one way. You can obtain a running count within each run of equal values: compare each value in col1 with the value in the previous row, take the cumsum of that comparison to define a custom grouper, and then take the GroupBy.cumcount. Then add "last" using a similar criterion, this time comparing each row with the next one via shift(-1):
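# Run id: increments whenever col1 differs from the previous row
g = df.col1.ne(df.col1.shift(1)).cumsum()
# Position within each run; object dtype so the column can also hold the string 'last'
df['update'] = df.groupby(g).cumcount().astype(object)
# Rows whose next value differs, i.e. the last row of each run
ix = df[df.col1.ne(df.col1.shift(-1))].index
# Int64Index([1, 2, 5, 6, 10], dtype='int64')
df.loc[ix,'update'] = 'last'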
    col1 update
0      1      0
1      1   last
2      2   last
3      3      0
4      3      1
5      3   last
6      4   last
7      5      0
8      5      1
9      5      2
10     5   last
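If you only need the first step from the question (a column that says "last" and nothing else), one possible approach is numpy.where, which acts as an element-wise if/else on a boolean mask. The sketch below is not part of the code above: the names is_last and run_id are made up for illustration, and the empty string used for non-last rows is an assumption, since the question doesn't say what those rows should contain.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})

# True where the next row holds a different value. The final row is compared
# against NaN (nothing follows it), so it is also True.
is_last = df["col1"].ne(df["col1"].shift(-1))

# Element-wise if/else: "last" where the mask is True, "" elsewhere
# (the "" filler is an assumption; pick whatever placeholder suits you).
df["last"] = np.where(is_last, "last", "")

# For reference, the grouper idea: a run id that increases by one
# every time col1 changes from the previous row.
run_id = df["col1"].ne(df["col1"].shift()).cumsum()

print(df)
```

The plain Python `"last" if cond` form fails here because `cond` is a whole Series rather than a single True/False value; a vectorized selection such as numpy.where sidesteps that.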