Can't use comma inside backtick column name for data.table setkey?

Tags:

If I have a column name that needs backticks because it contains a comma, setkey throws an error saying don't use a comma. The error directs me towards ?setkey but I don't see anything in the examples there that mentions this. Only work around I can find is to rename the column, setkey and then rename back.

Example code:

library(data.table)
> DT = data.table(`X, in $` = rnorm(10))
> DT
        X, in $
 1: -1.28475886
 2:  0.97789059
 3: -0.05023914
 4: -0.38133978
 5: -0.24949607
 6:  0.99213156
 7: -0.29310512
 8:  0.02840372
 9:  0.25294231
10: -0.88955013
> setkey(DT, `X, in $`)
Error in setkeyv(x, cols, verbose = verbose) : 
  Don't use comma inside quotes. Please see the examples in help('setkey')

Edit: showing a more likely example

For me the main reason you would come across this is after using reshape2 dcast to take character column values (which will be from an external source, e.g. database) and make them column names.

So long as you don't need the "join" behaviour of the key, and just wanted to sort, then you can work around this by copying the table, or by using data.frame instead. For example:

library(ggplot2)
library(reshape2)

DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12), Product = rep(1:12,4), Sales = rnorm(48)^2)
DF = dcast(DT, Product~Office)
DT = data.table(DF)
setkey(DT, 'Cambridge, UK')
DT = DT[order(DF$`Cambridge, UK`),]
DT

produces:

> library(ggplot2)
> library(reshape2)
> 
> DT = data.table(Office = rep(c("Cambridge, UK", "Cambridge, US", "London", "New York"), each = 12), Product = rep(1:12,4), Sales = rnorm(48)^2)
> DF = dcast(DT, Product~Office)
Using Sales as value column: use value.var to override.
> DT = data.table(DF)
> setkey(DT, 'Cambridge, UK')
Error in setkeyv(x, cols, verbose = verbose) : 
  Don't use comma inside quotes. Please see the examples in help('setkey')
> DT = DT[order(DF$`Cambridge, UK`),]
> DT
    Product Cambridge, UK Cambridge, US      London    New York
 1:      12  0.0009257347  1.7183751269 0.818101229 0.002499808
 2:       1  0.0010855828  0.0889560105 0.083778108 1.451149328
 3:       2  0.0139649148  0.7385617360 0.221688602 4.771307440
 4:       5  0.0520875574  0.3389613574 0.934932759 0.127634044
 5:      10  0.0837778446  0.0598955035 0.015930174 0.715849795
 6:       9  0.0856246191  1.1303900183 1.555058058 0.367063297
 7:       6  0.1608235273  0.7147643550 0.004588596 2.995598768
 8:       8  0.4797866129  0.1783997616 0.016459971 0.497328990
 9:       4  0.5282546636  1.7011670679 0.016126768 0.024388172
10:       7  0.5655147714  0.1106522938 0.045130643 0.442473457
11:       3  0.8315246051  0.1399159784 5.792956446 1.632060601
12:      11  3.9958208033  0.0005297928 0.003282897 1.635506818

509

asked Oct 03 '13 10:10

Corvus

1 Answers

UPDATE (eddi): As of version 1.8.11 this bug has been fixed and arbitrary column names will work with setkey.

I found a hack: (1) sort and (2) settattr.

Example:

mydt <- data.table(`b,ah`=c(2L,3:1),var=letters[1:4])

mydt <- mydt[order(`b,ah`)]
setattr(mydt,'sorted','b,ah')

Now, to verify that it behaves well...

key(mydt)
# [1] "b,ah"
mydt[.(2)]
#    b,ah var
# 1:    2   a
# 2:    2   c
mydt[,.N,by=`b,ah`]
#    b,ah N
# 1:    1 1
# 2:    2 2
# 3:    3 1

Comments. I didn't use the OP's example because setting ~~numeric~~ big floating-point columns as keys is weird (to me).

Who knows what negative side effects this could have? Anyway, I wouldn't use it, and agree it would be nice to have commas supported. Maybe there could be a setkeyn for setting by column number if it makes too much of a mess in setkey/setkeyv?

116

answered Sep 30 '22 05:09

Frank

Related questions
                            
                                Implementation of logistic regression formula in R
                            
                                Showing POSIXt object with Shiny renderTable
                            
                                Formatting and manipulating a plot from the R package "hexbin"
                            
                                Convert markdown to Rd, or define custom markdown conversion rules?
                            
                                Is there a way to share a lock (e.g. a lock file) between R processes?
                            
                                Creating a sequence object from SPELL data
                            
                                How to “flatten” or “collapse” a 2D data frame into a 1D data frame in R?
                            
                                CVX-esque convex optimization in R?
                            
                                retrieve original version of package function even if over-assigned
                            
                                How do I jitter the node split strings in plotting ctree output from partykit?
                            
                                Using R to interpret a symbolic formula for outside use
                            
                                R: XPath expression returns links outside of selected element
                            
                                R - Importing ASCII data using a .sas dictionary file and SAScii
                            
                                Constructing scores from princomp loadings in R
                            
                                Fast computation of kernel matrix in R
                            
                                Impute Missing Values with Caret
                            
                                Sanitising strings in R
                            
                                Add values to rCharts hPlot tooltip
                            
                                Package vignette depends on tikz/pgf
                            
                                Java R Interface (JRI) Setup

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Can't use comma inside backtick column name for data.table setkey?

Tags:

dataframe

r

data.table

reshape2

Corvus

People also ask

1 Answers

Frank

Recent Activity

Donate For Us