
Count the number of values between value and value - x by variable

I have some data:

library(data.table)
set.seed(1)
df1 <- data.frame(let=sample(sample(letters,2),5, replace=TRUE),
                  num=sample(1:10,5))
setDT(df1)
df1
   let num
1:   j   7
2:   j   6
3:   g   1
4:   j   2
5:   j  10

For each row, I would like to count how many values of num in the same let group are less than or equal to that row's num AND greater than or equal to num - 4. A data.table solution would be preferable, but a dplyr or base R solution would be fine too. The output would look like this:

   let num countNumByLet
1:   j   7             2
2:   j   6             2
3:   g   1             1
4:   j   2             1
5:   j  10             3
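For reference, the requested count can be written directly in base R. This is a minimal sketch, not taken from the answer below: it assumes `ave()` to split num by let and `vapply()` to count, for each value v, the group's values in the closed range [v - 4, v]. The data are typed in from the printed table, because `sample()` results after `set.seed(1)` can differ between R versions; `width` is a hypothetical name for the "x" in "num - x".

```r
# Data copied from the printed df1 above
df <- data.frame(let = c("j", "j", "g", "j", "j"),
                 num = c(7L, 6L, 1L, 2L, 10L))

width <- 4L  # the "x" in "num - x"; 4 in this question

# ave() applies FUN to num within each let group and puts the results
# back in the original row order
df$countNumByLet <- ave(df$num, df$let, FUN = function(nums) {
  # for every value v, count group members in [v - width, v]
  vapply(nums, function(v) sum(nums <= v & nums >= v - width), integer(1))
})

df$countNumByLet
# [1] 2 2 1 1 3
```

Each value is compared with every other value in its group, so this approach is quadratic in group size and is only meant to pin down the expected result.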
Eric Frey asked Nov 30 '19 13:11



1 Answer

This can also be solved using non-equi joins:

dt <- df1  # the question's data (already a data.table after setDT())

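dt[, .(let, high = num + 4L, num)   # x side: add an upper bound, high = num + 4
   ][dt,                            # i side: every row of the original data
     on = .(let,                    # same group, and
            num <= num,             #   x.num <= i.num
            high >= num),           #   x.num + 4 >= i.num
     .(countNumByLet = .N),         # number of matching x rows ...
     by = .EACHI                    # ... counted once per row of i
     ][, high := NULL][]            # drop the helper column and print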

   let num countNumByLet
1:   j   7             2
2:   j   6             2
3:   g   1             1
4:   j   2             1
5:   j  10             3
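To make the two `on` conditions concrete, the sketch below evaluates them by brute force for group j using base R's `outer()`. It is only an illustrative check of what the join counts, not how data.table executes it: `x.num <= i.num` together with `x.num + 4 >= i.num` is the same as `i.num - 4 <= x.num <= i.num`. The name `j_nums` is hypothetical.

```r
# num values of group "j" from the example; each value acts both as the
# query row (i side) and as a candidate row (x side), as in the self-join
j_nums <- c(7L, 6L, 2L, 10L)

# hits[q, k] is TRUE when candidate k satisfies both join conditions for query q:
#   cand <= q      corresponds to  num <= num
#   cand + 4 >= q  corresponds to  high >= num, with high = num + 4
hits <- outer(j_nums, j_nums, function(q, cand) cand <= q & cand + 4L >= q)

rowSums(hits)
# [1] 2 2 1 3
```

This full comparison matrix grows with the square of the group size, which is the kind of cost the join formulation is meant to avoid.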

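For a 5-row dataset the method doesn't matter. When scaling up, though, the non-equi join pays off: in the benchmarks below it is slower than dt_sapply at 260 rows, but much faster and far lighter on memory from 2,600 rows on: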

n_let <- 26
n_per_grp <- 1E1

dt <- data.table(let = sample(sample(letters, n_let), n_let * n_per_grp, replace = TRUE),
                 num = sample(20, n_let * n_per_grp, replace = TRUE))

# 260 observations; 26 groups
# A tibble: 2 x 13
  expression     min median `itr/sec` mem_alloc
  <bch:expr>  <bch:> <bch:>     <dbl> <bch:byt>
1 dt_sapply   2.41ms 2.67ms      364.    53.9KB
2 dt_non_equi 5.08ms 5.66ms      170.   223.7KB

# 2,600 observations; 26 groups
# A tibble: 2 x 13
  expression      min  median `itr/sec` mem_alloc
  <bch:expr>  <bch:t> <bch:t>     <dbl> <bch:byt>
1 dt_sapply   11.49ms 12.15ms      80.3    4.67MB
2 dt_non_equi  6.39ms  7.25ms     117.    398.8KB

# 26,000 observations; 26 groups
# A tibble: 2 x 13
  expression      min median `itr/sec` mem_alloc
  <bch:expr>  <bch:t> <bch:>     <dbl> <bch:byt>
1 dt_sapply   404.1ms  404ms      2.47  403.46MB
2 dt_non_equi  24.2ms   25ms     39.8     2.09MB

# 260,000 observations; 26 groups
# A tibble: 2 x 13
  expression      min  median `itr/sec` mem_alloc
  <bch:expr>  <bch:t> <bch:t>     <dbl> <bch:byt>
1 dt_sapply     38.6s   38.6s    0.0259    38.8GB
2 dt_non_equi 524.2ms 524.2ms    1.91      19.1MB
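The code behind dt_sapply is not shown, so the sketch below is a hypothetical stand-in: a grouped `sapply()` inside data.table, wrapped next to the non-equi join from this answer so the two can be checked for identical results on the same simulated data. The function names `count_sapply` and `count_non_equi` are assumptions made for illustration.

```r
library(data.table)

# Hypothetical stand-in for dt_sapply: within each let group, compare every
# num with every other num in the group (quadratic in group size).
count_sapply <- function(d) {
  d <- copy(d)  # avoid modifying the caller's table by reference
  d[, countNumByLet := sapply(num, function(v) sum(num <= v & num >= v - 4L)),
    by = let]
  d$countNumByLet
}

# Non-equi join from the answer; by = .EACHI returns one count per row of d,
# in d's original row order.
count_non_equi <- function(d) {
  d[, .(let, high = num + 4L, num)
    ][d,
      on = .(let, num <= num, high >= num),
      .(countNumByLet = .N),
      by = .EACHI
      ]$countNumByLet
}

set.seed(1)
n_let <- 26
n_per_grp <- 1E2
dt <- data.table(let = sample(sample(letters, n_let), n_let * n_per_grp, replace = TRUE),
                 num = sample(20, n_let * n_per_grp, replace = TRUE))

# Both approaches should agree row by row
stopifnot(identical(count_sapply(dt), count_non_equi(dt)))
```

The printed tables above look like `bench::mark()` output, so one way (an assumption, not confirmed by the answer) to produce similar timings is `bench::mark(dt_sapply = count_sapply(dt), dt_non_equi = count_non_equi(dt))` while increasing `n_per_grp`. In this sketch, each `sapply()` step builds a logical vector as long as the whole group, which is consistent with the rapid growth of mem_alloc for dt_sapply.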

Cole answered Nov 15 '22 06:11