I am trying to figure out how I can perform logical operators
when I use indexing in data.table package
in R
?
Following is the example. I make a datatable
named as dt
. and then make the var2
as the key in my datatable
:
> dt = data.table(var1 = rep(LETTERS[1:5],2), var2 = seq(1,20, 2), var3 = ceiling(rnorm(10, 3, 2)))
> dt
var1 var2 var3
1: A 1 5
2: B 3 3
3: C 5 0
4: D 7 6
5: E 9 3
6: A 11 4
7: B 13 2
8: C 15 1
9: D 17 3
10: E 19 7
> setkey(dt, var2)
So now I want to identify all the values in my already defined key (var2)
which are less than 10 ( <10)
. Doing the following tries give me errors
.
> dt[ < 10]
Error: unexpected '<' in "dt[ <"
> dt[ .< 10]
Error in eval(expr, envir, enclos) : object '.' not found
> dt[ .(< 10)]
my expectation would be :
var1 var2 var3
1: A 11 4
2: B 13 2
3: C 15 1
4: D 17 3
5: E 19 7
BTW, I know that just by doing dt[var2 <10]
I will get the result. BUT please consider that I want to get the concept of Indexing in data.table
and understand and know how to do it without calling the key(var2)
in every each of my command!
Any help with explanation is highly appreciated.
From ?setkey
, key(dt)
get the key columns in a character vector. Assuming your table has a single key column, then you can get what you want with:
dt[dt[[key(dt)]] < 10]
Thanks to David Arenburg, you can also use get()
:
dt[get(key(dt)) < 10]
This is a little bit shorter and probably the way to go.
The other way I can think to do it is much worse:
dt[eval(parse(text = paste(key(dt), "< 10")))]
from documentation https://www.rdocumentation.org/packages/data.table/versions/1.10.4/topics/setkey
Here is a key to the solution , if possible
> library(data.table)
data.table 1.10.4
The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
Release notes, videos and slides: http://r-datatable.com
> data(mtcars)
> head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
> mtcars=data.table(mtcars)
> setkey(mtcars,mpg)
> key(mtcars)
[1] "mpg"
> mtcars[mpg<15,,]
mpg cyl disp hp drat wt qsec vs am gear carb
1: 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
2: 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
3: 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
4: 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
5: 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
> mtcars["mpg"<15,,]
Empty data.table (0 rows) of 11 cols: mpg,cyl,disp,hp,drat,wt...
The problem lies that key(DT) is giving "var2" while the subset in a datatable demands var2 (without the quotes) - we get this using get
So now using Remove quotes from a character vector in R
This is the simplest way
#get(key(mtcars))
> mtcars[get(key(mtcars))<15]
mpg cyl disp hp drat wt qsec vs am gear carb
1: 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
2: 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
3: 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
4: 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
5: 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
For your datatable it will be
DT[get(key(DT))<10]
which is the same as @DavidArenburg 's simple and elegant answer
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With