I ran into an unexpected problem when trying to convert multiple columns of a data table into factor columns. I've reproduced it as follows:
library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
tst[,as.factor(a)] #Returns expected result
tst[,as.factor('a'),with=FALSE] #Returns error
The latter command returns 'Error in Math.factor(j) : abs not meaningful for factors'. I found this when attempting to run tst[,lapply(cols, as.factor),with=FALSE], where cols was a collection of columns I was attempting to convert to factors. Is there any solution or workaround for this?
In R, you can convert multiple variables to factors using the lapply function. lapply is part of the apply family of functions, which call a function on each element of a list or vector in turn, much like a loop.
The same approach converts the columns of an R data frame from integer to numeric. For example, if a data frame df contains only integer columns, lapply(df, as.numeric) converts every column to numeric. Note that lapply returns a list, so assign the result back with df[] <- lapply(df, as.numeric) to keep the data frame.
I found one solution:
library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
cols <- 'a'
tst[,(cols):=lapply(.SD, as.factor),.SDcols=cols]
Still, the earlier-mentioned behavior seems buggy.
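The same pattern should extend to several columns at once. The following is a minimal sketch, assuming a hypothetical table dt with columns a, g and v (these names are not from the original question); only the columns listed in cols are converted:

```r
library(data.table)

# Hypothetical example table: two character columns and one integer column.
dt <- data.table(a = c("b", "b", "c", "c"),
                 g = c("x", "y", "x", "y"),
                 v = 1:4)

# Names of the columns to convert (an assumption for this sketch).
cols <- c("a", "g")

# (cols) on the left of := means "the columns named in cols";
# .SD contains only the columns listed in .SDcols.
dt[, (cols) := lapply(.SD, as.factor), .SDcols = cols]

# Inspect the resulting column classes.
sapply(dt, class)
```

Because := modifies the table by reference, no reassignment of dt is needed, and the column v that is not in cols is left untouched.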
This is now fixed in v1.8.11, but probably not in the way you'd hoped for. From NEWS:
FR #4867 is now implemented. DT[, as.factor('x'), with=FALSE] where x is a column in DT, is now equivalent to DT[, "x", with=FALSE] instead of ending up with an error. Thanks to tresbot for reporting on SO: Converting multiple data.table columns to factors in R
Some explanation: The difference, when with=FALSE is used, is that the columns of the data.table aren't seen as variables anymore. That is:
tst[, as.factor(a), with=FALSE] # would give "a" not found!
would result in an error: "a" not found. But what you do instead is:
tst[, as.factor('a'), with=FALSE]
You're in fact creating a factor "a" with level="a" and asking to subset that column. This doesn't really make much sense. Take the case of data.frames:
DF <- data.frame(x=1:5, y=6:10)
DF[, c("x", "y")] # gives back DF
DF[, factor(c("x", "y"))] # gives back DF again, not factor columns
DF[, factor(c("x", "x"))] # gives back two columns of "x", still integer, not factor!
So, basically, when you use with=FALSE, you're applying factor not to the elements of that column but just to the column name. I hope I've managed to convey the difference well. Feel free to edit or comment if anything is confusing.
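To make the contrast concrete, here is a small sketch using the tst table from the question. The variable names f and sub are introduced only for illustration:

```r
library(data.table)

tst <- data.table(a = c("b", "b", "c", "c"))

# Default (with = TRUE): j is evaluated inside the table,
# so `a` refers to the column and its values are converted.
f <- tst[, as.factor(a)]
class(f)

# with = FALSE: j is treated as a set of column names,
# so this only selects column "a" and converts nothing.
sub <- tst[, "a", with = FALSE]
class(sub$a)
```

The first call returns a factor built from the column's values, while the second returns a one-column data.table whose column a is still character.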