Creating a column in r that auto-increments based on other columns

Question

My data frame has the customerID, product and date. I need to create a column that generates a transaction id everytime the customerID or the date changes.

My dataframe currently looks like:

CustomerID-Product--date
23-------------abv-------12-12-14
23-------------gfs--------12-12-14
18-------------gra--------12-12-14
52-------------wer--------14-12-14
23-------------qwe-------16-12-14

I need the transactionID column to get populated using r

CustomerID-Product--date----------transactionID
23-------------abv-------12-12-14----1
23-------------gfs--------12-12-14----1
18-------------gra-------12-12-14----2
52-------------wer-------14-12-14----3
23-------------qwe------16-12-14----4

I have the data in a csv file. I am not able to create this column in excel as the file is too big and excel is unable to open the entire file.

Thank you for your help!

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

Depending on your responses to my comment, you may also be interested in .GRP from "data.table":

library(data.table)
## In case rows get out of order
DT <- as.data.table(df, keep.rownames = TRUE) 
DT[, transactionID := .GRP, by = list(CustomerID, date)][]
   rn CustomerID Product     date transactionID
1:  1         23     abv 12-12-14             1
2:  2         23     gfs 12-12-14             1
3:  3         18     gra 12-12-14             2
4:  4         52     wer 14-12-14             3
5:  5         23     qwe 16-12-14             4

akrun · Answer

Assuming that the columns CustomerID and date are ordered,

indx <- as.character(interaction(df[c(1,3)]))
df$transactionID <- cumsum(c(TRUE,indx[-1]!=indx[-length(indx)]))
df$transactionID
#[1] 1 1 2 3 4

Or if the columns are not ordered, for example, suppose there is one more row (6th row) that duplicates the first row

indx1 <- c(indx, indx[1])
as.numeric(factor(indx1, levels=unique(indx1)))
#[1] 1 1 2 3 4 1

Or

match(indx1, unique(indx1))
#[1] 1 1 2 3 4 1

data

df <- structure(list(CustomerID = c(23L, 23L, 18L, 52L, 23L), Product =
c("abv", "gfs", "gra", "wer", "qwe"), date = c("12-12-14", "12-12-14", 
"12-12-14", "14-12-14", "16-12-14")), .Names = c("CustomerID", 
"Product", "date"), class = "data.frame", row.names = c(NA, -5L))

Creating a column in r that auto-increments based on other columns

Tags:

r

surser

2 Answers

A5C1D2H2I1M1N2O1R2T1

data

akrun

Recent Activity

Donate For Us

Creating a column in r that auto-increments based on other columns

Tags:

r

surser

2 Answers

A5C1D2H2I1M1N2O1R2T1

data

akrun

Related questions

Recent Activity

Donate For Us