Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can't use comma inside backtick column name for data.table setkey?

If I have a column name that needs backticks because it contains a comma, setkey throws an error saying don't use a comma. The error directs me towards ?setkey but I don't see anything in the examples there that mentions this. Only work around I can find is to rename the column, setkey and then rename back.

Example code:

library(data.table)
> DT = data.table(`X, in $` = rnorm(10))
> DT
        X, in $
 1: -1.28475886
 2:  0.97789059
 3: -0.05023914
 4: -0.38133978
 5: -0.24949607
 6:  0.99213156
 7: -0.29310512
 8:  0.02840372
 9:  0.25294231
10: -0.88955013
> setkey(DT, `X, in $`)
Error in setkeyv(x, cols, verbose = verbose) : 
  Don't use comma inside quotes. Please see the examples in help('setkey')

Edit: showing a more likely example

For me the main reason you would come across this is after using reshape2 dcast to take character column values (which will be from an external source, e.g. database) and make them column names.

So long as you don't need the "join" behaviour of the key, and just wanted to sort, then you can work around this by copying the table, or by using data.frame instead. For example:

library(ggplot2)
library(reshape2)

DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12), Product = rep(1:12,4), Sales = rnorm(48)^2)
DF = dcast(DT, Product~Office)
DT = data.table(DF)
setkey(DT, 'Cambridge, UK')
DT = DT[order(DF$`Cambridge, UK`),]
DT

produces:

> library(ggplot2)
> library(reshape2)
> 
> DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12), Product = rep(1:12,4), Sales = rnorm(48)^2)
> DF = dcast(DT, Product~Office)
Using Sales as value column: use value.var to override.
> DT = data.table(DF)
> setkey(DT, 'Cambridge, UK')
Error in setkeyv(x, cols, verbose = verbose) : 
  Don't use comma inside quotes. Please see the examples in help('setkey')
> DT = DT[order(DF$`Cambridge, UK`),]
> DT
    Product Cambridge, UK Cambridge, US      London    New York
 1:      12  0.0009257347  1.7183751269 0.818101229 0.002499808
 2:       1  0.0010855828  0.0889560105 0.083778108 1.451149328
 3:       2  0.0139649148  0.7385617360 0.221688602 4.771307440
 4:       5  0.0520875574  0.3389613574 0.934932759 0.127634044
 5:      10  0.0837778446  0.0598955035 0.015930174 0.715849795
 6:       9  0.0856246191  1.1303900183 1.555058058 0.367063297
 7:       6  0.1608235273  0.7147643550 0.004588596 2.995598768
 8:       8  0.4797866129  0.1783997616 0.016459971 0.497328990
 9:       4  0.5282546636  1.7011670679 0.016126768 0.024388172
10:       7  0.5655147714  0.1106522938 0.045130643 0.442473457
11:       3  0.8315246051  0.1399159784 5.792956446 1.632060601
12:      11  3.9958208033  0.0005297928 0.003282897 1.635506818
like image 509
Corvus Avatar asked Oct 03 '13 10:10

Corvus


People also ask

What are the rules for column names?

Each column name should be unique. If two columns from different tables serving different purposes are in the same database then use some kind of prefixes that separate the two. Column names must not be abstract or cryptic.

How many columns can a key contain?

The key can be any number of columns. The data is always sorted in ascending order with NA s (if any) always first. The table is changed by reference and there is no memory used for the key (other than marking which columns the data is sorted by).

How do you name a column in a database?

Each column name should be unique. If two columns from different tables serving different purposes are in the same database then use some kind of prefixes that separate the two. Column names must not be abstract or cryptic. Use long descriptive names instead of short and unclear abbreviations.

Is it common sense to name columns corresponding to their use?

Thus, it is common sense to name a column corresponding to its use. Column names are like adjectives or verbs. There are the following steps for Naming columns are as follows. Each column name should be unique.


1 Answers

UPDATE (eddi): As of version 1.8.11 this bug has been fixed and arbitrary column names will work with setkey.


I found a hack: (1) sort and (2) settattr.

Example:

mydt <- data.table(`b,ah`=c(2L,3:1),var=letters[1:4])

mydt <- mydt[order(`b,ah`)]
setattr(mydt,'sorted','b,ah')

Now, to verify that it behaves well...

key(mydt)
# [1] "b,ah"
mydt[.(2)]
#    b,ah var
# 1:    2   a
# 2:    2   c
mydt[,.N,by=`b,ah`]
#    b,ah N
# 1:    1 1
# 2:    2 2
# 3:    3 1

Comments. I didn't use the OP's example because setting numeric big floating-point columns as keys is weird (to me).

Who knows what negative side effects this could have? Anyway, I wouldn't use it, and agree it would be nice to have commas supported. Maybe there could be a setkeyn for setting by column number if it makes too much of a mess in setkey/setkeyv?

like image 116
Frank Avatar answered Sep 30 '22 05:09

Frank