R: Should keys behave this way in data.table?

Tags: r, key, data.table

I have encountered some unintuitive behavior of keys in the data.table package. Here is an example:

library(data.table)
foo <- data.table(a = c(1:4), b = c(2:5), c = c(3:6), d = c(4:7))
setkey(foo, b)

With this setup, key() gives one alarming result:

key(foo[, .(mean(c + d)), by = .(b)]) # result is "b".
key(foo[, .(mean(c + d)), by = .(a)]) # result is "a". (!!)

Here is another example, which produces different, more reasonable results:

foo <- data.table(a = c(4:1), b = c(2:5), c = c(3:6), d = c(4:7))
setkey(foo, b)
key(foo[, .(mean(c + d)), by = .(b)]) # result is "b".
key(foo[, .(mean(c + d)), by = .(a)]) # result is NULL

I admit I'm confused. My guess is that key() somehow checks whether the resulting table needed to be sorted by the columns in by and, if not, assumes it was keyed by them. Is it a feature? Is it a bug?

asked Jul 14 '17 09:07

Karol

1 Answer

Is it a feature? Is it a bug?

In the first example you get key = "a" because the result of that query happened to come out with column a in non-decreasing order. In that sense the behaviour could be called a feature.
The problem is that silently creating a key is not always desired, so this behaviour has been changed since you asked the question.
Now (as of 1.12.0), running the code from your first chunk drops the key and ignores the fact that the results are ordered by a.

library(data.table)
foo <- data.table(a = c(1:4), b = c(2:5), c = c(3:6), d = c(4:7))
setkey(foo, b)
key(foo[, .(mean(c + d)), by = .(b)])
#[1] "b"
key(foo[, .(mean(c + d)), by = .(a)])
#NULL
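
If you want to see why the two examples from the question behaved differently, and to get a key regardless of version, something like the sketch below may help. It only checks whether the grouping column of the result is ordered, then asks for a key explicitly with keyby or setkey(). The names foo1, foo2, res1, res2, keyed and the column name V1 are just illustrative, and this is not a description of how data.table decides internally.

```r
library(data.table)

# First example from the question: a is increasing in row order
foo1 <- data.table(a = 1:4, b = 2:5, c = 3:6, d = 4:7)
setkey(foo1, b)
res1 <- foo1[, .(V1 = mean(c + d)), by = .(a)]
sorted1 <- !is.unsorted(res1$a)   # TRUE: groups come out ordered by a

# Second example: a is decreasing, so groups appear as 4, 3, 2, 1
foo2 <- data.table(a = 4:1, b = 2:5, c = 3:6, d = 4:7)
setkey(foo2, b)
res2 <- foo2[, .(V1 = mean(c + d)), by = .(a)]
sorted2 <- !is.unsorted(res2$a)   # FALSE: result is not ordered by a

# Ask for the key explicitly instead of relying on the row order:
# keyby sorts the groups and sets the key on the result
keyed <- foo2[, .(V1 = mean(c + d)), keyby = .(a)]
key(keyed)
#[1] "a"

# Or key an existing result by reference
setkey(res2, a)
key(res2)
#[1] "a"
```

Using keyby (or setkey() afterwards) makes the intent visible in the code, so the result does not depend on how the input rows happened to be ordered.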

answered Oct 06 '22 16:10

jangorecki
