How do I sample a subset of a large data.table (data.table package)? Is there a more elegant way to do the following?
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))  # no cbind(): it would coerce value to character
DT[site=="a"][sample(1:nrow(DT[site=="a"]), 100)]
I guess there is a simple solution, but I can't find the right wording to search for.
UPDATE:
More generally, how can I access row numbers in data.table's i argument without creating a temporary column for them?
One of the biggest benefits of using data.table is that you can set a key for your data. Using the key and then .I (a built-in variable; see ?data.table for more info), you can use:
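setkey(DT, site)                # sort DT by site and mark site as the key
DT[DT["a", sample(.I, 100)]]    # inner call: 100 sampled row numbers of DT where site == "a"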
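A minimal, self-contained sketch of the keyed approach, split into its two steps so the intermediate row numbers are visible. It assumes a recent data.table (1.9.4 or later, where a keyed join no longer groups implicitly, so the inner call returns a plain integer vector); the seed is an illustrative choice, not part of the original answer.

```r
library(data.table)

set.seed(1)  # illustrative seed so the draw is reproducible
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))
setkey(DT, site)  # sorts DT by site and marks site as the key

# Inner call: join on the key value "a"; inside j, .I holds the row
# numbers of DT that matched, so sample() draws 100 of those positions.
idx <- DT["a", sample(.I, 100)]

# Outer call: ordinary row-number subsetting with the sampled positions.
sub <- DT[idx]
print(head(sub))
```

Because setkey() sorts the table, all "a" rows sit in positions 1 to 1000 after the call, which is why the sampled positions fall in that range.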
As for your second question, "how can I access a row number in data.table's i argument?":
# Just use the number directly:
DT[17]
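For the more general case of finding the row numbers that satisfy a condition without a helper column, here is a sketch of two options, assuming a recent data.table: indexing the built-in .I inside j, and passing which = TRUE to [.data.table so it returns the matching row numbers instead of the rows. The small table is made up for illustration.

```r
library(data.table)

DT <- data.table(site = rep(letters[1:2], 5), value = 1:10)

# .I inside j is the vector of row numbers, so indexing it by a
# condition gives the matching row numbers without a temporary column.
rows_j <- DT[, .I[site == "a"]]

# which = TRUE asks data.table to return the row numbers that i matches.
rows_which <- DT[site == "a", which = TRUE]

print(rows_j)  # 1 3 5 7 9

# Either vector can then be sampled and used directly as i.
picked <- DT[sample(rows_j, 2)]
```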
Using which, you can find the row numbers. Instead of sampling from 1:nrow(...), you can simply sample from all rows with the desired property. In your example, you can use the following:
DT[sample(which(site=="a"), 100)]
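A runnable sketch of the which() approach on the question's data (the seed is illustrative). It also shows one caveat of base R's sample(): when its first argument is a single number n, it samples from 1:n, so if the condition might match only one row, indexing through sample.int() is the safer pattern.

```r
library(data.table)

set.seed(42)  # illustrative seed so the draw is reproducible
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

# which() turns the logical condition into row numbers (i is evaluated
# with DT's columns in scope), sample() draws 100 of them without
# replacement, and DT[...] subsets by those positions.
sub <- DT[sample(which(site == "a"), 100)]

# Safer variant: sample positions within the row-number vector, which
# behaves correctly even if only one row matches the condition.
rows <- DT[, which(site == "a")]
safe <- DT[rows[sample.int(length(rows), 100)]]
```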