I'm a newbie to data.table. I'm curious about when the .SDcols argument is processed in the case below. As per the documentation, the value column should not be included in .SD, since I have only provided v1 in .SDcols. So, theoretically, shouldn't it just report an error? I don't really understand what is happening.
library(data.table)
dt <- data.table(
  group = c("A", "A", "B", "B", "B"),
  value = c(3, 6, 1, 2, 4),
  v1 = c(1, 2, 3, 4, 5)
)
dt[, .SD[value == min(value)], by = group, .SDcols = "v1"]
#> group v1
#> <char> <num>
#> 1: A 1
#> 2: B 3
Created on 2025-06-25 with reprex v2.1.1
My guess at the order in which this is handled:
1. by first
2. .SD
3. .SDcols
Looking forward to the clarification, thanks!
Let's dive into the process step by step.
Content of .SD by group:
dt[, by = group, .SD, .SDcols = "v1"]
group v1
<char> <num>
1: A 1
2: A 2
3: B 3
4: B 4
5: B 5
OK, that is as expected. Now let's add value.
dt[, by=group, cbind(value, .SD), .SDcols = "v1"]
group value v1
<char> <num> <num>
1: A 3 1
2: A 6 2
3: B 1 3
4: B 2 4
5: B 4 5
Being able to do that means that the columns of dt are available in the j scope as well as .SD. Let's add the filter condition.
dt[, by=group, cbind(filter=value==min(value), .SD), .SDcols = "v1"]
group filter v1
<char> <lgcl> <num>
1: A TRUE 1
2: A FALSE 2
3: B TRUE 3
4: B FALSE 4
5: B FALSE 5
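To check the scoping claim directly, here is a small sketch (assuming a recent data.table version). It reports, per group, whether value is one of the .SD columns and what value holds in j. It then shows that an error only appears when a name is found nowhere: neither in .SD, nor among the columns of dt, nor in the calling environment. The names res, err, value_in_SD, value_seen and not_a_column are hypothetical, chosen only for illustration.

```r
library(data.table)

dt <- data.table(
  group = c("A", "A", "B", "B", "B"),
  value = c(3, 6, 1, 2, 4),
  v1    = c(1, 2, 3, 4, 5)
)

# value is referenced in j, so it is available there for each group,
# even though .SDcols restricts .SD to v1 only
res <- dt[, .(value_in_SD = "value" %in% names(.SD),
              value_seen  = toString(value)),
          by = group, .SDcols = "v1"]
print(res)

# not_a_column is a hypothetical name that exists nowhere in scope,
# so evaluating the filter fails with "object not found"
err <- tryCatch(
  dt[, .SD[not_a_column == 1], by = group, .SDcols = "v1"],
  error = function(e) e
)
print(inherits(err, "error"))
```

So value is not part of .SD, but it is still in scope when j runs, which is why the filter works.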
Pretty easy to see what's going to happen now :-)
dt[, .SD[value == min(value)], by = group, .SDcols = "v1"]
group v1
<char> <num>
1: A 1
2: B 3
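So the order is more like:
1. grouping is done first, based on by;
2. .SD is built from the current group's row subset, keeping only the .SDcols columns, and is "added" to the j scope alongside the columns of dt referenced in j;
3. inside .SD[...], value is not a column of .SD, so it is looked up in the enclosing j scope, where it holds the current group's values. That is why no error is raised.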
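The steps above can be mimicked by hand. The sketch below uses a hypothetical helper, emulate_sd_filter, and relies on data.table's split() method with by = and on rbindlist(idcol = ). It only mirrors the scoping described above and is not how data.table is implemented internally. Inside each group, value is a local variable of the function. Since value is not a column of SD, data.table evaluates SD[value == min(value)] with the calling function's frame as the fallback scope.

```r
library(data.table)

dt <- data.table(
  group = c("A", "A", "B", "B", "B"),
  value = c(3, 6, 1, 2, 4),
  v1    = c(1, 2, 3, 4, 5)
)

# Hypothetical emulation of
#   dt[, .SD[value == min(value)], by = group, .SDcols = "v1"]
# It mirrors the scoping only, not data.table's internals.
emulate_sd_filter <- function(x) {
  # 1. grouping is done first, based on the by column
  groups <- split(x, by = "group")
  pieces <- lapply(groups, function(g) {
    # the group's columns used in j are visible, here as a local variable
    value <- g$value
    # 2. .SD is the group's row subset, keeping only the .SDcols columns
    SD <- g[, "v1"]
    # 3. value is not a column of SD, so the i expression falls back to
    #    this function's frame, where the local value lives
    SD[value == min(value)]
  })
  # the group labels (list names) become the leading "group" column
  rbindlist(pieces, idcol = "group")
}

res <- emulate_sd_filter(dt)
print(res)
```

The result has the same rows as the original one-liner: one row per group, with v1 taken from the row where value is smallest.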