 

How to change type of target column when doing := by group in a data.table in R?

I'm trying to use := by group to update an existing column of type 'integer' with new values of type 'double', and it fails.

My real scenario is converting a column representing time into POSIXct, based on values in other columns. I could change how the data.table is created as a workaround, but I'm still interested in how to actually change the type of a column, as the error message suggests.
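For the underlying POSIXct scenario, note that a plain := without `by` replaces the whole column with the new vector, so the result keeps the new vector's type and class. A minimal sketch, assuming hypothetical columns `day` (a date string) and `time` (integer seconds past midnight, UTC); neither column comes from the question:

```r
library(data.table)

# Hypothetical table: 'time' is stored as integer, like 'x' in the question
dt <- data.table(id   = rep(1:2, each = 3),
                 day  = rep(c("2015-04-15", "2015-04-16"), each = 3),
                 time = c(0L, 60L, 120L, 0L, 30L, 90L))

# No 'by': := swaps in a brand-new column, so it can become POSIXct
dt[, time := as.POSIXct(day, tz = "UTC") + time]

str(dt$time)
```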

Here's a simple toy example of my problem:

library(data.table)
db = data.table(id=rep(1:2, each=5), x=1:10, y=runif(10))
db
    id  x          y
 1:  1  1 0.47154470
 2:  1  2 0.03325867
 3:  1  3 0.56784494
 4:  1  4 0.47936031
 5:  1  5 0.96318208
 6:  2  6 0.83257416
 7:  2  7 0.10659533
 8:  2  8 0.23103810
 9:  2  9 0.02900567
10:  2 10 0.38346531

db[, x:=mean(y), by=id]   

Error in `[.data.table`(db, , `:=`(x, mean(y)), by = id) : 
Type of RHS ('double') must match LHS ('integer'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)
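The mismatch the error describes can be confirmed by comparing the storage type of the target column with the type of the value being assigned. A small sketch that rebuilds the toy table; `mean()` returns a double even for integer input, so any integer column receiving a mean will hit this:

```r
library(data.table)

# Rebuild the toy table from the question
db <- data.table(id = rep(1:2, each = 5), x = 1:10, y = runif(10))

typeof(db$x)        # "integer": the target column
typeof(mean(db$y))  # "double": the value being assigned
typeof(mean(1:10))  # "double": mean() of integers is still double
```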
asked Apr 15 '15 07:04 by hallvig




1 Answer

Since 'x' is of class 'integer', we can first convert it to 'numeric' and then assign 'mean(y)' to it by group. This is also useful when replacing 'x' with the mean of any other numeric variable (including 'x' itself).

db[, x:= as.numeric(x)][, x:= mean(y), by=id][]
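The same two steps written out separately, with a check that the grouped values are the per-id means of 'y'. The comparison against base R's `ave()` is only an illustrative check, not part of the answer:

```r
library(data.table)

db <- data.table(id = rep(1:2, each = 5), x = 1:10, y = runif(10))

# Step 1: no 'by', so := replaces the whole integer column with a double copy
db[, x := as.numeric(x)]

# Step 2: the grouped assignment now writes doubles into a double column
db[, x := mean(y), by = id]

typeof(db$x)  # "double"
```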

Or assign to a new column, delete the old 'x', and rename the new column afterwards:

setnames(db[, x1:= mean(y),by=id][,x:=NULL],'x1', 'x')
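One side effect of this approach is that the renamed column ends up last, because `x1` is appended at the end. If the original order matters, `setcolorder()` can restore it by reference. A sketch on the toy table:

```r
library(data.table)

db <- data.table(id = rep(1:2, each = 5), x = 1:10, y = runif(10))

# New column, drop the old one, rename (same chain as in the answer)
setnames(db[, x1 := mean(y), by = id][, x := NULL], "x1", "x")
names(db)  # "id" "y" "x"

# Put 'x' back in its original position
setcolorder(db, c("id", "x", "y"))
names(db)  # "id" "x" "y"
```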

Or we can set 'x' to NULL, which deletes the column, and then recreate 'x' as the mean of 'y' (@David Arenburg's suggestion):

db[, x:=NULL][, x:= mean(y), by= id][]
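When several integer columns need to change type before grouped updates, data.table's `set()` can replace each whole column by reference in a loop. A sketch, assuming a hypothetical extra integer column `z` that is not in the question:

```r
library(data.table)

db <- data.table(id = rep(1:2, each = 5), x = 1:10, z = 11:20, y = runif(10))

# Hypothetical set of integer columns that will later receive doubles
cols <- c("x", "z")

# With i = NULL, set() replaces the whole column, so its type can change
for (col in cols) set(db, j = col, value = as.numeric(db[[col]]))

# Grouped assignments into these columns no longer hit the type mismatch
db[, x := mean(y), by = id]
db[, z := max(y), by = id]
```

An equivalent one-liner is `db[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]`.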
answered Oct 02 '22 01:10 by akrun