I have some data:
library(data.table)
set.seed(1)
df1 <- data.frame(let=sample(sample(letters,2),5, replace=TRUE),
num=sample(1:10,5))
setDT(df1)
let num
1: j 7
2: j 6
3: g 1
4: j 2
5: j 10
and I would like to count, within each let, how many num values are less than or equal to the current row's num AND greater than or equal to that num - 4. Using the data.table package would be preferable, but any solution using dplyr or base R would be fine too.
The output would look like this:
let num countNumByLet
1: j 7 2
2: j 6 2
3: g 1 1
4: j 2 1
5: j 10 3
This can also be solved using non-equi joins:
# dt is the five-row data.table from the question (df1 after setDT)
dt <- copy(df1)
dt[, .(let, high = num + 4L, num)
][dt,
on = .(let,
num <= num,
high >= num),
.(countNumByLet = .N),
by = .EACHI
][, high:= NULL][]
let num countNumByLet
1: j 7 2
2: j 6 2
3: g 1 1
4: j 2 1
5: j 10 3
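To see what the join conditions compute, it may help to compare them against a direct per-group count. The following is only a sketch: the five rows are typed in by hand (an assumption, so the result does not depend on the random seed), and the `direct` and `joined` names are introduced here for illustration.

```r
library(data.table)

# The question's five rows, entered directly (assumption: avoids seed dependence).
dt <- data.table(let = c("j", "j", "g", "j", "j"),
                 num = c(7L, 6L, 1L, 2L, 10L))

# Direct count: within each let, for every value x of num,
# count the group's values v with x - 4 <= v <= x.
direct <- copy(dt)[, countNumByLet := sapply(num, function(x) sum(num <= x & num >= x - 4L)),
                   by = let]

# Non-equi join: x.num <= i.num and x.num + 4 >= i.num, matched within let,
# then count the matches for each row of i with .N and by = .EACHI.
joined <- dt[, .(let, high = num + 4L, num)
             ][dt,
               on = .(let, num <= num, high >= num),
               .(countNumByLet = .N),
               by = .EACHI
             ][, high := NULL][]

print(direct)
# Compare the two count columns row by row.
print(identical(direct$countNumByLet, joined$countNumByLet))
```

The `high >= num` condition is just `num >= num - 4` rearranged so that the window's upper bound lives on the left-hand table, which is what lets the join express the range check.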
For a dataset of 5 rows, the method doesn't matter. But when scaling up, non-equi joins really help:
n_let <- 26
n_per_grp <- 1E1
dt <- data.table(let = sample(sample(letters, n_let), n_let * n_per_grp, replace = TRUE),
num = sample(20, n_let * n_per_grp, replace = TRUE))
# 260 observations; 26 groups
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:> <bch:> <dbl> <bch:byt>
1 dt_sapply 2.41ms 2.67ms 364. 53.9KB
2 dt_non_equi 5.08ms 5.66ms 170. 223.7KB
# 2,600 observations; 26 groups
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:t> <bch:t> <dbl> <bch:byt>
1 dt_sapply 11.49ms 12.15ms 80.3 4.67MB
2 dt_non_equi 6.39ms 7.25ms 117. 398.8KB
# 26,000 observations; 26 groups
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:t> <bch:> <dbl> <bch:byt>
1 dt_sapply 404.1ms 404ms 2.47 403.46MB
2 dt_non_equi 24.2ms 25ms 39.8 2.09MB
# 260,000 observations; 26 groups
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:t> <bch:t> <dbl> <bch:byt>
1 dt_sapply 38.6s 38.6s 0.0259 38.8GB
2 dt_non_equi 524.2ms 524.2ms 1.91 19.1MB
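The timings above name two expressions, dt_sapply and dt_non_equi, but the sapply version is not shown here. A rough sketch of how such a comparison might be set up follows. The body of `dt_sapply` is an assumption (a per-group `sapply` count), the use of `bench::mark` is inferred from the output format, and `check = FALSE` is chosen here so that the two results are compared by hand instead.

```r
library(data.table)

set.seed(1)
n_let <- 26
n_per_grp <- 10
dt <- data.table(let = sample(sample(letters, n_let), n_let * n_per_grp, replace = TRUE),
                 num = sample(20, n_let * n_per_grp, replace = TRUE))

# Assumed definition: for each row, scan the whole group (quadratic per group).
dt_sapply <- function(d) {
  copy(d)[, countNumByLet := sapply(num, function(x) sum(num <= x & num >= x - 4L)),
          by = let][]
}

# The non-equi join from the answer, wrapped in a function.
dt_non_equi <- function(d) {
  d[, .(let, high = num + 4L, num)
    ][d,
      on = .(let, num <= num, high >= num),
      .(countNumByLet = .N),
      by = .EACHI
    ][, high := NULL][]
}

# Both approaches should give the same counts, in the same row order.
stopifnot(identical(dt_sapply(dt)$countNumByLet, dt_non_equi(dt)$countNumByLet))

# Run the timing only if the bench package is installed.
if (requireNamespace("bench", quietly = TRUE)) {
  print(bench::mark(dt_sapply = dt_sapply(dt),
                    dt_non_equi = dt_non_equi(dt),
                    check = FALSE))
}
```

Raising `n_per_grp` to 100, 1000, and 10000 gives the larger sizes listed above. If the sapply version really does compare every value against its whole group, that would be consistent with the steep growth in its `mem_alloc` column.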