Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating a column in r that auto-increments based on other columns

Tags:

r

My data frame has the customerID, product and date. I need to create a column that generates a transaction id everytime the customerID or the date changes.

My dataframe currently looks like:

  1. CustomerID-Product--date
  2. 23-------------abv-------12-12-14
  3. 23-------------gfs--------12-12-14
  4. 18-------------gra--------12-12-14
  5. 52-------------wer--------14-12-14
  6. 23-------------qwe-------16-12-14

I need the transactionID column to get populated using r

  1. CustomerID-Product--date----------transactionID
  2. 23-------------abv-------12-12-14----1
  3. 23-------------gfs--------12-12-14----1
  4. 18-------------gra-------12-12-14----2
  5. 52-------------wer-------14-12-14----3
  6. 23-------------qwe------16-12-14----4

I have the data in a csv file. I am not able to create this column in excel as the file is too big and excel is unable to open the entire file.

Thank you for your help!

like image 718
surser Avatar asked Dec 29 '25 10:12

surser


2 Answers

Depending on your responses to my comment, you may also be interested in .GRP from "data.table":

library(data.table)
## In case rows get out of order
DT <- as.data.table(df, keep.rownames = TRUE) 
DT[, transactionID := .GRP, by = list(CustomerID, date)][]
   rn CustomerID Product     date transactionID
1:  1         23     abv 12-12-14             1
2:  2         23     gfs 12-12-14             1
3:  3         18     gra 12-12-14             2
4:  4         52     wer 14-12-14             3
5:  5         23     qwe 16-12-14             4
like image 174
A5C1D2H2I1M1N2O1R2T1 Avatar answered Dec 31 '25 02:12

A5C1D2H2I1M1N2O1R2T1


Assuming that the columns CustomerID and date are ordered,

indx <- as.character(interaction(df[c(1,3)]))
df$transactionID <- cumsum(c(TRUE,indx[-1]!=indx[-length(indx)]))
df$transactionID
#[1] 1 1 2 3 4

Or if the columns are not ordered, for example, suppose there is one more row (6th row) that duplicates the first row

indx1 <- c(indx, indx[1])
as.numeric(factor(indx1, levels=unique(indx1)))
#[1] 1 1 2 3 4 1

Or

match(indx1, unique(indx1))
#[1] 1 1 2 3 4 1

data

df <- structure(list(CustomerID = c(23L, 23L, 18L, 52L, 23L), Product =
c("abv", "gfs", "gra", "wer", "qwe"), date = c("12-12-14", "12-12-14", 
"12-12-14", "14-12-14", "16-12-14")), .Names = c("CustomerID", 
"Product", "date"), class = "data.frame", row.names = c(NA, -5L))
like image 39
akrun Avatar answered Dec 31 '25 03:12

akrun



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!