I want to create a continually increasing counter for each group, where each group is a unique combination of person and day.
This is what the data looks like:
> df
  person      date
1      0    monday
2      0   tuesday
3      1    monday
4      1    monday
5      1   tuesday
6      2    monday
7      2    monday
8      2   tuesday
9      2 wednesday
Thus, I want to add a new variable that starts at 1 and increases by 1 for each new combination of person and day.
> df
  person      date counter
1      0    monday       1
2      0   tuesday       2
3      1    monday       3
4      1    monday       3
5      1   tuesday       4
6      2    monday       5
7      2    monday       5
8      2   tuesday       6
9      2 wednesday       7
I hope that the data is clear enough. The counter continues until it reaches the end of the data set.
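For reference, one way to reconstruct this example data in R (the column types are an assumption; person could just as well be a factor or character):
df <- data.frame(
  person = c(0, 0, 1, 1, 1, 2, 2, 2, 2),
  date = c("monday", "tuesday", "monday", "monday", "tuesday",
           "monday", "monday", "tuesday", "wednesday")
)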
You can use rleid from the devel version of data.table. Instructions to install the devel version are here.
library(data.table) # v1.9.5+
setDT(df)[, counter := rleid(date)][]
#    person      date counter
# 1:      0    monday       1
# 2:      0   tuesday       2
# 3:      1    monday       3
# 4:      1    monday       3
# 5:      1   tuesday       4
# 6:      2    monday       5
# 7:      2    monday       5
# 8:      2   tuesday       6
# 9:      2 wednesday       7
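Here rleid(date) alone is enough because the data is already ordered and no two consecutive persons share the same date value. If that could happen in your real data, a small variant of the same call ties the run-length id to the full person/day combination:
setDT(df)[, counter := rleid(person, date)][]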
Or
library(dplyr)
df %>%
  # default = "" never matches a real date, so the first row starts the count at 1
  mutate(counter = cumsum(date != lag(date, default = "")))
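On dplyr 1.1.0 or later (assuming that version is available to you), consecutive_id() expresses the same run-based counter directly and covers the person/day combination in one call:
library(dplyr)
df %>%
  mutate(counter = consecutive_id(person, date))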
Base package:
df1 <- data.frame(unique(df), counter = 1:nrow(unique(df)))
merge(df, df1)
Output:
  person      date counter
1      0    monday       1
2      0   tuesday       2
3      1    monday       3
4      1    monday       3
5      1   tuesday       4
6      2    monday       5
7      2    monday       5
8      2   tuesday       6
9      2 wednesday       7
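If the rows are already ordered so that repeated person/day combinations sit next to each other (as in the example), a base sketch that keeps the original row order, rather than the sorted order merge() produces, is:
# each first occurrence of a person/date pair is TRUE; cumsum turns that into a running counter
df$counter <- cumsum(!duplicated(df[c("person", "date")]))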