I'm trying to use := by group on an existing column of type 'integer' where the new values are of type 'double', and it fails.
My scenario is converting a column that represents time into POSIXct, based on values in other columns. As a workaround I could change how the data.table is created, but I'm still interested in how to actually change the type of a column, as the error message suggests.
Here's a simple toy example of my problem:
db = data.table(id=rep(1:2, each=5), x=1:10, y=runif(10))
db
id x y
1: 1 1 0.47154470
2: 1 2 0.03325867
3: 1 3 0.56784494
4: 1 4 0.47936031
5: 1 5 0.96318208
6: 2 6 0.83257416
7: 2 7 0.10659533
8: 2 8 0.23103810
9: 2 9 0.02900567
10: 2 10 0.38346531
db[, x:=mean(y), by=id]
Error in `[.data.table`(db, , `:=`(x, mean(y)), by = id) :
Type of RHS ('double') must match LHS ('integer'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)
Since the class of 'x' is 'integer', we can convert 'x' to 'numeric' before assigning 'mean(y)' to it. This is useful whenever we replace 'x' with the mean of any numeric variable (including 'x' itself).
db[, x:= as.numeric(x)][, x:= mean(y), by=id][]
Or assign to a new column, delete 'x', and then rename the new column to 'x'
setnames(db[, x1:= mean(y),by=id][,x:=NULL],'x1', 'x')
Or we can set 'x' to NULL (which deletes the column) and then recreate 'x' as the mean of 'y' (@David Arenburg's suggestion)
db[, x:=NULL][, x:= mean(y), by= id][]
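For reference, here is a minimal self-contained sketch of the first approach (convert the column type, then assign by group) with explicit type checks. The library(data.table) call and set.seed(1) are additions not in the original; set.seed(1) is only there to make runif() reproducible, and the stopifnot() calls simply confirm that 'x' starts as integer and is double before the grouped assignment.

```r
library(data.table)

set.seed(1)  # only to make runif() reproducible; not part of the original example
db <- data.table(id = rep(1:2, each = 5), x = 1:10, y = runif(10))
stopifnot(is.integer(db$x))  # 'x' starts out as an integer column

# Change the type of 'x' from integer to double, by reference
db[, x := as.numeric(x)]
stopifnot(is.double(db$x))

# LHS and RHS are now both double, so the grouped assignment no longer errors
db[, x := mean(y), by = id]
```

The same is.double()/typeof() check can be used to confirm the result of the other two approaches, since both recreate 'x' from the double-valued mean(y).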