Converting Factor Levels to Numbers

Tags: r, na, matrix

I apologize if there is an answer out there already for this... I looked but could not find one.

I am trying to convert a matrix of factors into a matrix of numbers, where each number is the index of that value's factor level within its column. Simple, right? Yet I have run into a variety of very odd problems when I try to do this.

Let me explain. Here is a sample dataset:

demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A","B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA), nrow=6, ncol=6)
democolnames <- c("Q","R","S","T","U","W")
colnames(demodata2) <- democolnames

Yielding:

     Q   R   S   T   U   W  
[1,] "A" "B" NA  NA  "B" "B"
[2,] "B" "B" "B" NA  "B" "B"
[3,] "B" NA  NA  NA  NA  NA 
[4,] "C" "C" "C" "B" "B" "C"
[5,] NA  "A" "A" "C" "B" "A"
[6,] "A" "B" "B" "A" NA  NA 

Ok. So what I want is this:

     Q    R    S    T    U    W
1    1    2 <NA> <NA>    1    2
2    2    2    2 <NA>    1    2
3    2 <NA> <NA> <NA> <NA> <NA>
4    3    3    3    2    1    3
5 <NA>    1    1    3    1    1
6    1    2    2    1 <NA> <NA>

No problem. Let's try as.numeric(demodata2):

> as.numeric(demodata2)
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[30] NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion 

Less than satisfying. Let's try only one column...

> as.numeric(demodata2[,3])
[1] NA NA NA NA NA NA
Warning message:
NAs introduced by coercion 
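The all-NA result comes from the matrix holding character strings: as.numeric() tries to parse each string as a number rather than looking up a level, so letters become NA. A minimal sketch on a single column (the names s, parsed and codes are just for illustration) contrasting that with going through factor() first:

```r
# Column S of the sample matrix, stored as character
s <- c(NA, "B", NA, "C", "A", "B")

# as.numeric() parses the strings; "A", "B" and "C" are not numbers,
# so every element becomes NA (the coercion warning is suppressed here)
parsed <- suppressWarnings(as.numeric(s))
print(parsed)

# factor() assigns levels from the sorted distinct values (A, B, C);
# as.integer() then returns each element's level index, and NA stays NA
codes <- as.integer(factor(s))
print(codes)   # NA 2 NA 3 1 2
```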

* edit *

These are actually supposed to be factors, not characters (thanks @Carl Witthoft and @smci), so let's make this into a data frame:

> demodata2 <- as.data.frame(demodata2)
> as.numeric(demodata2)
Error: (list) object cannot be coerced to type 'double'

Nope. But wait... here's where it gets interesting...

> as.numeric(demodata2$S)
[1] NA  2 NA  3  1  2

Well, that is right. Let's verify that I can also do this by indexing columns by number:

> as.numeric(demodata2[,3])
[1] NA  2 NA  3  1  2
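One caveat for anyone running this today: the two outputs above depend on as.data.frame() having turned the character columns into factors, which was the default when this was asked. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the columns stay character and as.numeric(demodata2$S) gives NAs again. A sketch that asks for factors explicitly (the name df is just for illustration):

```r
# Sample data from the question
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q","R","S","T","U","W")

# Request factor columns explicitly instead of relying on the default,
# which changed from TRUE to FALSE in R 4.0.0
df <- as.data.frame(demodata2, stringsAsFactors = TRUE)

sapply(df, is.factor)   # every column is a factor
as.numeric(df$S)        # level indices: NA 2 NA 3 1 2
```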

Ok. So I could do this column by column, assembling my new matrix by iterating ncol times... but is there a better way?
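For the column-by-column loop, vapply() can do the iteration and assemble the matrix in one step, since a data frame is a list of columns. A minimal sketch, assuming the per-column numbering from the desired output (each column gets its own levels); df and codes are illustrative names:

```r
# Sample data from the question
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q","R","S","T","U","W")
df <- as.data.frame(demodata2)   # factor or character columns, depending on R version

# vapply() visits each column and stacks the integer results into a matrix,
# one output column per input column. factor() recomputes the levels from
# each column's own values, so this works for factor and character columns.
codes <- vapply(df, function(col) as.integer(factor(col)),
                FUN.VALUE = integer(nrow(df)))
codes
```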

And why does it barf when it is in matrix form, as opposed to a data frame? (Edit: this is now pretty obvious. In the matrix form, these are characters, not factors. My bad. The question still stands for the data frame, though...)
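On why as.numeric() fails for the data frame: a data frame is a list of columns, and as.numeric() expects an atomic vector, hence the "(list) object cannot be coerced" error. Base R's data.matrix() converts column by column instead: factor columns become their integer codes, and character columns are turned into factors first. A sketch (codes is an illustrative name):

```r
# Sample data from the question
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q","R","S","T","U","W")
df <- as.data.frame(demodata2)   # factor or character columns, depending on R version

is.list(df)                # TRUE: this is why as.numeric(df) errors
codes <- data.matrix(df)   # numeric matrix of per-column level codes, names kept
codes
```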

Thanks! (and pointing me to an existing answer is totally fine)

rucker asked Dec 23 '14 21:12

2 Answers

It seems like your U column should be 2, corresponding to "B", not 1. Please clarify that.

You could try match()

matrix(match(demodata2, LETTERS), nrow(demodata2), dimnames=dimnames(demodata2))
#       Q  R  S  T  U  W
# [1,]  1  2 NA NA  2  2
# [2,]  2  2  2 NA  2  2
# [3,]  2 NA NA NA NA NA
# [4,]  3  3  3  2  2  3
# [5,] NA  1  1  3  2  1
# [6,]  1  2  2  1 NA NA

You could also get this result with

m <- match(demodata2, LETTERS)
attributes(m) <- attributes(demodata2)

And then look at m
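LETTERS works here because the values are single capital letters. A sketch of the same match() idea with the lookup table taken from the data itself (the name lev is illustrative), which would also cope with other labels. As with LETTERS, one numbering is shared by all columns, so U gets 2:

```r
# Sample data from the question
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q","R","S","T","U","W")

# Sorted distinct values present in the matrix; sort() drops the NA
lev <- sort(unique(as.vector(demodata2)))

# match() gives each cell's position in lev (NA stays NA) but drops the dim
# attribute, so matrix() restores the shape and the column names
codes <- matrix(match(demodata2, lev), nrow = nrow(demodata2),
                dimnames = dimnames(demodata2))
codes
```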


Update for the revised data set:

For your updated data, try

demodata2[] <- lapply(demodata2, as.numeric) 
demodata2
#    Q  R  S  T  U  W
# 1  1  2 NA NA  1  2
# 2  2  2  2 NA  1  2
# 3  2 NA NA NA NA NA
# 4  3  3  3  2  1  3
# 5 NA  1  1  3  1  1
# 6  1  2  2  1 NA NA

Now you have 1s in the U column because each column is factored individually, and hence B is the first (and only) level in that column.
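If instead the same letter should get the same number in every column (the 2 for "B" in U raised at the top of this answer), one option is to factor every column against one shared set of levels before taking the codes. A sketch, assuming the labels are exactly A, B and C (all_levels is a hypothetical name):

```r
# Sample data from the question, as a data frame
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q","R","S","T","U","W")
df <- as.data.frame(demodata2)

# Assumed level set shared by all columns; adjust for other labels
all_levels <- c("A", "B", "C")

# Re-factor each column against the shared levels, then take the codes;
# df[] <- keeps the data frame structure and column names
df[] <- lapply(df, function(col) as.integer(factor(col, levels = all_levels)))
df
```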

Rich Scriven answered Sep 24 '22 16:09


Mechanically, this is very similar to the 'dim<-' answer. A little more transparent, but probably less efficient (maybe?).

matrix(as.numeric(factor(demodata2)), ncol = ncol(demodata2))

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2   NA   NA    2    2
[2,]    2    2    2   NA    2    2
[3,]    2   NA   NA   NA   NA   NA
[4,]    3    3    3    2    2    3
[5,]   NA    1    1    3    2    1
[6,]    1    2    2    1   NA   NA
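factor() on a matrix drops the dimensions and pools the levels over all cells, which is why this version also gives 2 for U and loses the column names. A sketch that keeps the names by reshaping the codes in place with dim<- and dimnames<- (this illustrates the reshaping idea and is not necessarily the code of the 'dim<-' answer mentioned above; codes is an illustrative name):

```r
# Sample data from the question
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q","R","S","T","U","W")

codes <- as.integer(factor(demodata2))   # one level set (A, B, C) for the whole matrix
dim(codes) <- dim(demodata2)             # restore the 6 x 6 shape
dimnames(codes) <- dimnames(demodata2)   # restore the column names Q..W
codes
```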
Gregor Thomas answered Sep 25 '22 16:09