
How to convert a factor to integer/numeric without loss of information?

Tags:

casting

r

r-faq

When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.

f <- factor(sample(runif(5), 20, replace = TRUE))
##  [1] 0.0248644019011408 0.0248644019011408 0.179684827337041
##  [4] 0.0284090070053935 0.363644931698218  0.363644931698218
##  [7] 0.179684827337041  0.249704354675487  0.249704354675487
## [10] 0.0248644019011408 0.249704354675487  0.0284090070053935
## [13] 0.179684827337041  0.0248644019011408 0.179684827337041
## [16] 0.363644931698218  0.249704354675487  0.363644931698218
## [19] 0.179684827337041  0.0284090070053935
## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218

as.numeric(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

as.integer(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

I have to resort to paste to get the real values:

as.numeric(paste(f))
##  [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
##  [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
## [19] 0.17968483 0.02840901

Is there a better way to convert a factor to numeric?

Adam SO asked Aug 05 '10 18:08




2 Answers

See the Warning section of ?factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

The FAQ on R has similar advice.
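As a minimal sketch of the idiom quoted above, the snippet below builds a small factor from made-up numbers (illustrative values, not the data from the question) and compares the level codes with the recovered values:

```r
# Illustrative data only; levels are stored as "0.25", "3.5", "10"
f <- factor(c(0.25, 10, 0.25, 3.5))

codes  <- as.numeric(f)                # level codes: 1 3 1 2
values <- as.numeric(levels(f))[f]     # original values: 0.25 10 0.25 3.5
slower <- as.numeric(as.character(f))  # same values, converted element by element

codes
values
```

The first call returns positions in the level set, while the other two return the numbers the labels spell out.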


Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values, rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
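To make the counting argument concrete, here is a small sketch on a long factor with only three levels. The seed and the values are arbitrary choices for illustration:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
f <- factor(sample(c(0.5, 1.5, 2.5), 1e5, replace = TRUE))

# as.character(f) performs the same lookup as levels(f)[f]
same_strings <- identical(as.character(f), levels(f)[f])

n_parsed_fast <- nlevels(f)  # strings parsed by as.numeric(levels(f))[f]
n_parsed_slow <- length(f)   # strings parsed by as.numeric(as.character(f))

# Both routes give the same numeric vector
same_values <- identical(as.numeric(levels(f))[f], as.numeric(as.character(f)))
```

Here the first form parses one string per level and then indexes into the result, whereas the second parses one string per element.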


Some timings

library(microbenchmark)
# Note: x is not defined in the code shown; it is presumably the same factor as f.
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)
## Unit: microseconds
##                         expr   min    lq      mean median     uq      max neval
##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05

Joshua Ulrich answered Nov 01 '22 08:11


R has a number of (undocumented) convenience functions for converting factors:

  • as.character.factor
  • as.data.frame.factor
  • as.Date.factor
  • as.list.factor
  • as.vector.factor
  • ...

But annoyingly, there is nothing to handle the factor -> numeric conversion. As an extension of Joshua Ulrich's answer, I suggest working around this omission by defining your own function:

as.double.factor <- function(x) {as.numeric(levels(x))[x]} 

that you can store at the beginning of your script, or even better in your .Rprofile file.
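If you would rather not rely on method dispatch, one alternative is a plainly named wrapper around the same idiom. This is only a sketch: the name factor_to_numeric is hypothetical and not part of base R.

```r
# Hypothetical helper (not a base R function): wraps the idiom from ?factor
factor_to_numeric <- function(x) {
  stopifnot(is.factor(x))   # refuse non-factors instead of guessing
  as.numeric(levels(x))[x]  # convert each level once, then index by the codes
}

f <- factor(c(2, 10, 2, 7.5))
result <- factor_to_numeric(f)
result  # 2 10 2 7.5
```

Calling it by name makes the conversion explicit at each use site, at the cost of not changing what as.numeric() itself does.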

Jealie answered Nov 01 '22 08:11