
How to reference column names that start with a number, in data.table

Tags: r, data.table

If the column names in a data.table have the form number + character, for example 4PCS or 5Y, how can such a column be referenced as j in x[i,j] so that it is interpreted as an unquoted column name?

I assume this would solve my original problem: I wanted to add together several columns of a data.table whose names have the form number + character.

> library(data.table)
> M <- data.table('4PCS'=1:4,'5Y'=4:1,X5Y=2:5)
> M[,4PCS+5Y]
Error: unexpected symbol in "M[,4PCS"

The new column should be the sum of 4PCS and 5Y.

Is there a way to refer to these columns in data.table in unquoted form? If they are referred to with the quoted "logic" of data.frame:

> M[,'5Y',with=FALSE]
     5Y
[1,]  4
[2,]  3
[3,]  2
[4,]  1

then what can be done with such a reference is limited. Addition does not work, just as it does not work in data.frame:

> M[,'4PCS'+'5Y',with=FALSE]  
Error in "4PCS" + "5Y" : non-numeric argument to binary operator

data.table allows j to be an expression that operates on the columns. I would like a solution that follows this data.table logic, so that I can transform columns by referring to them by name.

The question is:
How do I quote a column name that starts with a number so that data.table understands it is a column name?

user2210954 asked Mar 26 '13 12:03




1 Answer

I think this is what you're looking for, though I'm not sure. data.table is different from data.frame. Please have a look at the quick introduction, and then the FAQ (and also the reference manual if necessary).

library(data.table)
dt <- data.table("4PCS" = 1:3, y=3:1)
#    4PCS y
# 1:    1 3
# 2:    2 2
# 3:    3 1

# access column 4PCS
dt[, "4PCS"]

# returns a data.table
#    4PCS
# 1:    1
# 2:    2
# 3:    3

# to access multiple columns by name
dt[, c("4PCS", "y")]

Alternatively, if you need the column as a vector rather than as a data.table, you can access it using the $ notation:

dt$`4PCS` # notice the ` because the variable begins with a number
# [1] 1 2 3

# alternatively, as mnel mentioned under comments:
dt[, `4PCS`] 
# [1] 1 2 3

Or, if you know the column number, you can extract the column using [[ ]] as follows:

dt[[1]] # 4PCS is the first column here
# [1] 1 2 3
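
If the position is not known, [[ ]] also accepts the column name as a string, so no backticks are needed. A minimal sketch, assuming the same dt as above (the variable v is only an illustrative name):

```r
library(data.table)
dt <- data.table("4PCS" = 1:3, y = 3:1)

# extract one column as a plain vector by its quoted name
v <- dt[["4PCS"]]
v
# [1] 1 2 3
```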

Edit:

Thanks @joran. I think you're looking for this:

dt[, `4PCS` + y]
# [1] 4 4 4
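
Applied to the table M from the question, the same backtick quoting should give the sum directly, and := can store it as a new column. This is a sketch rather than the only way to do it; the names total, s and s2 are made up for illustration.

```r
library(data.table)
M <- data.table("4PCS" = 1:4, "5Y" = 4:1, X5Y = 2:5)

# backticks make the non-syntactic names usable as plain symbols in j
s <- M[, `4PCS` + `5Y`]
s
# [1] 5 5 5 5

# store the sum as a new column by reference ('total' is a hypothetical name)
M[, total := `4PCS` + `5Y`]

# alternative: keep the names as strings and sum the selected columns row-wise
s2 <- M[, rowSums(.SD), .SDcols = c("4PCS", "5Y")]
```

The .SDcols form may be more convenient when the column names arrive as a character vector, since nothing has to be typed as a symbol.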

Fundamentally, the issue is that 4PCS is not a valid variable name in R (try 4PCS <- 1 and you'll get the same "unexpected symbol" error). So to refer to it, we have to use backticks (compare `4PCS` <- 1).
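
To check that this is a rule of the R parser rather than something specific to data.table, the following sketch parses the bare name as text and then repeats the assignment with backticks. The variable msg is only an illustrative name.

```r
# parsing the bare name fails before data.table is ever involved
msg <- tryCatch(
  {
    parse(text = "4PCS <- 1")
    "parsed fine"
  },
  error = function(e) "parse error"
)
msg
# [1] "parse error"

# with backticks the same name is accepted
`4PCS` <- 1
`4PCS` + 1
# [1] 2

# make.names() shows the syntactic name base R would derive from it
make.names("4PCS")
# [1] "X4PCS"
```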

Arun answered Oct 11 '22 23:10
