Create the frequency count from a vector in R [duplicate]

Tags: r, vector

Suppose there is a numeric vector that may contain duplicated values:

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

I want to create another vector of running counts with these properties:

  1. It has the same length as x.
  2. For each unique value in x, its first appearance gets 1, its second appearance gets 2, and so on.

The new vector I want is

1, 1, 1, 1, 1, 2, 2, 3, 2

I need a fast way of doing this since x can be really long.

JACKY Li asked Feb 15 '23 06:02


1 Answer

Use ave and seq_along:

> x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
> ave(x, x, FUN = seq_along)
[1] 1 1 1 1 1 2 2 3 2
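
To see why this works: ave() groups x by the second argument, applies FUN within each group, and puts the results back in their original positions. The following is a rough base R sketch of that idea, not ave()'s actual implementation; the helper name running_count is made up for illustration.

```r
# Hypothetical helper: running count of each value, in original order.
running_count <- function(x) {
  groups <- split(seq_along(x), x)   # positions of each distinct value
  out <- integer(length(x))
  for (idx in groups) {
    out[idx] <- seq_along(idx)       # 1, 2, 3, ... within the group
  }
  out
}

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
running_count(x)
# [1] 1 1 1 1 1 2 2 3 2
```

Note that this sketch returns an integer vector, while ave() returns a vector of the same type as x. It also leaves 0 at the positions of any NA values, because split() drops them.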

Another option to consider is data.table. It takes a little more setup, but it may pay off on very long vectors.

Here it is on your example, where it definitely seems like overkill:

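library(data.table)

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
# id records each element's original position; the table is keyed on it
DT <- data.table(id = sequence(length(x)), x, key = "id")
# Within each group of equal x, number the rows 1..N, then extract column y
DT[, y := sequence(.N), by = x][, y]
# [1] 1 1 1 1 1 2 2 3 2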
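
If the installed data.table is recent enough to export rowid() (an assumption worth checking with ?rowid), the same running count can probably be had more directly. The sketch below also shows the grouped := form without the id column and key, on the assumption that := by group leaves the rows in their original order.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

# rowid() numbers the occurrences of each value in order of appearance
rowid(x)
# [1] 1 1 1 1 1 2 2 3 2

# Grouped assignment by reference, without an id column or key
DT <- data.table(x)
DT[, y := seq_len(.N), by = x]
DT$y
# [1] 1 1 1 1 1 2 2 3 2
```

Neither variant was part of the timings in this answer, so benchmark them on your own data before relying on them.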

But how about on a vector 10,000,000 items long?

set.seed(1)
x2 <- sample(100, 1e7, replace = TRUE)

funAve <- function() {
  ave(x2, x2, FUN = seq_along)
}

funDT <- function() {
  DT <- data.table(id = sequence(length(x2)), x2, key = "id")
  DT[, y := sequence(.N), by = x2][, y]
}

identical(funAve(), funDT())
# [1] TRUE

library(microbenchmark)
microbenchmark(funAve(), funDT(), times = 20)
# Unit: seconds
#      expr      min       lq   median       uq      max neval
#  funAve() 6.727557 6.792743 6.827117 6.992609 7.352666    20
#   funDT() 1.967795 2.029697 2.053886 2.070462 2.123531    20

A5C1D2H2I1M1N2O1R2T1 answered Mar 03 '23 23:03