Subsetting data.table with a condition

How can I draw a subsample from a large data.table (data.table package)? Is there a more elegant way to do the following?

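library(data.table)
# Build the columns directly; wrapping them in cbind() would coerce value to character
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))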
DT[site=="a"][sample(1:nrow(DT[site=="a"]), 100)]

I guess there is a simple solution, but I can't find the right wording to search for.

UPDATE: More generally, how can I access a row number in data.table's i argument without creating a temporary column for the row number?

RInatM asked Jan 12 '23 19:01


2 Answers

One of the biggest benefits of using data.table is that you can set a key for your data.
Using the key and then .I (a built-in variable; see ?data.table for more info), you can use:

setkey(DT, site)
DT[DT["a", sample(.I, 100)]] 

As for your second question, "how can I access a row number in data.table's i argument":

# Just use the number directly:
DT[17]
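
If the goal is to look up the row numbers that match a condition, rather than to index by a number you already know, one option is the which argument of data.table's [ method, which returns the matching row numbers instead of the rows themselves. A minimal sketch, assuming the example table from the question (the seed and the names rows_a and sub are illustrative and not part of the original):

```r
library(data.table)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

# Row numbers of DT where site == "a"; no temporary row-number column is needed
rows_a <- DT[site == "a", which = TRUE]

# Sample 100 of those row numbers and subset DT with them
sub <- DT[sample(rows_a, 100)]
```

Here rows_a is an ordinary integer vector, so it can be reused for further subsetting or inspection.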

Ricardo Saporta answered Jan 17 '23 16:01


Using which, you can find the row numbers that satisfy a condition. Instead of sampling from 1:nrow(...), you can sample directly from the rows with the desired property. In your example, you can use the following:

DT[sample(which(site=="a"), 100)]
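
A related variant keeps the chained subset from the question but replaces the repeated nrow(...) call with .N, which inside i evaluates to the number of rows of the table being subset. This is a sketch under the assumption that the installed data.table version supports .N in i; the seed and the names via_which and via_N are illustrative only:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

# Approach from the answer: sample among the row numbers where site == "a"
via_which <- DT[sample(which(site == "a"), 100)]

# Chained variant: filter first, then sample 100 of the remaining .N rows
via_N <- DT[site == "a"][sample(.N, 100)]
```

Both return 100 rows with site equal to "a". The which form avoids materialising the filtered table first, which may matter when DT is large.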

shadow answered Jan 17 '23 15:01