If I have a column name that needs backticks because it contains a comma, setkey
throws an error saying don't use a comma. The error directs me towards ?setkey
but I don't see anything in the examples there that mentions this. Only work around I can find is to rename the column, setkey
and then rename back.
Example code:
library(data.table)
> DT = data.table(`X, in $` = rnorm(10))
> DT
X, in $
1: -1.28475886
2: 0.97789059
3: -0.05023914
4: -0.38133978
5: -0.24949607
6: 0.99213156
7: -0.29310512
8: 0.02840372
9: 0.25294231
10: -0.88955013
> setkey(DT, `X, in $`)
Error in setkeyv(x, cols, verbose = verbose) :
Don't use comma inside quotes. Please see the examples in help('setkey')
Edit: showing a more likely example
For me the main reason you would come across this is after using reshape2
dcast
to take character column values (which will be from an external source, e.g. database) and make them column names.
So long as you don't need the "join" behaviour of the key, and just wanted to sort, then you can work around this by copying the table, or by using data.frame
instead. For example:
library(ggplot2)
library(reshape2)
DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12), Product = rep(1:12,4), Sales = rnorm(48)^2)
DF = dcast(DT, Product~Office)
DT = data.table(DF)
setkey(DT, 'Cambridge, UK')
DT = DT[order(DF$`Cambridge, UK`),]
DT
produces:
> library(ggplot2)
> library(reshape2)
>
> DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12), Product = rep(1:12,4), Sales = rnorm(48)^2)
> DF = dcast(DT, Product~Office)
Using Sales as value column: use value.var to override.
> DT = data.table(DF)
> setkey(DT, 'Cambridge, UK')
Error in setkeyv(x, cols, verbose = verbose) :
Don't use comma inside quotes. Please see the examples in help('setkey')
> DT = DT[order(DF$`Cambridge, UK`),]
> DT
Product Cambridge, UK Cambridge, US London New York
1: 12 0.0009257347 1.7183751269 0.818101229 0.002499808
2: 1 0.0010855828 0.0889560105 0.083778108 1.451149328
3: 2 0.0139649148 0.7385617360 0.221688602 4.771307440
4: 5 0.0520875574 0.3389613574 0.934932759 0.127634044
5: 10 0.0837778446 0.0598955035 0.015930174 0.715849795
6: 9 0.0856246191 1.1303900183 1.555058058 0.367063297
7: 6 0.1608235273 0.7147643550 0.004588596 2.995598768
8: 8 0.4797866129 0.1783997616 0.016459971 0.497328990
9: 4 0.5282546636 1.7011670679 0.016126768 0.024388172
10: 7 0.5655147714 0.1106522938 0.045130643 0.442473457
11: 3 0.8315246051 0.1399159784 5.792956446 1.632060601
12: 11 3.9958208033 0.0005297928 0.003282897 1.635506818
Each column name should be unique. If two columns from different tables serving different purposes are in the same database then use some kind of prefixes that separate the two. Column names must not be abstract or cryptic.
The key can be any number of columns. The data is always sorted in ascending order with NA s (if any) always first. The table is changed by reference and there is no memory used for the key (other than marking which columns the data is sorted by).
Each column name should be unique. If two columns from different tables serving different purposes are in the same database then use some kind of prefixes that separate the two. Column names must not be abstract or cryptic. Use long descriptive names instead of short and unclear abbreviations.
Thus, it is common sense to name a column corresponding to its use. Column names are like adjectives or verbs. There are the following steps for Naming columns are as follows. Each column name should be unique.
UPDATE (eddi): As of version 1.8.11 this bug has been fixed and arbitrary column names will work with setkey
.
I found a hack: (1) sort and (2) settattr
.
Example:
mydt <- data.table(`b,ah`=c(2L,3:1),var=letters[1:4])
mydt <- mydt[order(`b,ah`)]
setattr(mydt,'sorted','b,ah')
Now, to verify that it behaves well...
key(mydt)
# [1] "b,ah"
mydt[.(2)]
# b,ah var
# 1: 2 a
# 2: 2 c
mydt[,.N,by=`b,ah`]
# b,ah N
# 1: 1 1
# 2: 2 2
# 3: 3 1
Comments. I didn't use the OP's example because setting numeric big floating-point columns as keys is weird (to me).
Who knows what negative side effects this could have? Anyway, I wouldn't use it, and agree it would be nice to have commas supported. Maybe there could be a setkeyn
for setting by column number if it makes too much of a mess in setkey
/setkeyv
?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With