Converting Factor Levels to Numbers

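Q: How do I convert a factor to a numeric in a data frame in R?

Calling as.numeric() directly on a factor returns its internal integer codes (the position of each value among the sorted levels), not the values the labels represent. If the labels are themselves numbers, convert the factor to a character vector first and then to numeric, i.e. as.numeric(as.character(f)), so that the result contains the actual values rather than the level codes.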
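A minimal illustration of the difference, using a made-up factor f whose labels happen to be numbers:

```r
# A factor whose labels happen to be numbers (illustrative data)
f <- factor(c("10", "20", "10", "30"))

levels(f)                    # "10" "20" "30"
as.numeric(f)                # level codes: 1 2 1 3
as.numeric(as.character(f))  # actual values: 10 20 10 30
```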

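Q: How do I change a vector to numeric in R?

To convert a character vector to a numeric vector, use as.numeric(). Any element that cannot be parsed as a number becomes NA, and R emits the warning "NAs introduced by coercion". Do this before passing the vector to statistical functions. Also note that before R 4.0.0, data.frame() and read.csv() converted character columns to factors by default (stringsAsFactors = TRUE), so columns that look like text may in fact be factors.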
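A short sketch of the coercion behaviour, using an illustrative vector x that mixes parseable and non-parseable strings:

```r
x <- c("1.5", "2", "abc", NA)

# "abc" cannot be parsed, so it becomes NA and R warns
# "NAs introduced by coercion"; the existing NA stays NA silently
y <- as.numeric(x)   # 1.5 2.0 NA NA

# If the NAs are expected, the warning can be silenced explicitly
y <- suppressWarnings(as.numeric(x))
```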

Q: How do you change factor levels in R?

The simplest way to rename multiple factor levels is to assign new labels with the levels() replacement function. The new labels replace the existing levels in order, so for a factor with levels "A", "B" and "C" you can use: levels(your_df$Category1) <- c("Factor 1", "Factor 2", "Factor 3").
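A minimal sketch, using a standalone factor called category as a hypothetical stand-in for your_df$Category1:

```r
category <- factor(c("B", "A", "C", "A"))
levels(category)   # "A" "B" "C"

# New labels are matched to the existing levels by position:
# A -> "Factor 1", B -> "Factor 2", C -> "Factor 3"
levels(category) <- c("Factor 1", "Factor 2", "Factor 3")
category           # Factor 2, Factor 1, Factor 3, Factor 1

# The underlying integer codes are unchanged
as.integer(category)   # 2 1 3 1
```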

Q: How do I convert a dataset to numeric in R?

To convert the columns of a data frame to numeric, apply as.numeric() to each column with lapply(). Because lapply() returns a plain list, assign the result back with df[] <- lapply(df, as.numeric), which keeps df a data frame with its column names. This works directly for integer columns and numeric-looking character columns; for factor columns it returns the integer level codes rather than the label values.
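A small sketch with an illustrative data frame df holding one integer column and one numeric-looking character column (stringsAsFactors = FALSE is given so the character column stays character on any R version):

```r
df <- data.frame(a = 1:3, b = c("4", "5", "6"), stringsAsFactors = FALSE)

# lapply() on its own would return a plain list;
# assigning into df[] keeps the data frame structure and column names
df[] <- lapply(df, as.numeric)

str(df)   # both columns are now num
```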

Tags: r, na, matrix

I apologize if there is an answer out there already for this... I looked but could not find one.

I am trying to convert a matrix of factors into a matrix of numbers, where each number is the code of the corresponding factor value within its column. Simple, right? Yet I have run into a variety of very odd problems when I try to do this.

Let me explain. Here is a sample dataset:

demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A","B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA), nrow=6, ncol=6)
democolnames <- c("Q","R","S","T","U","W")
colnames(demodata2) <- democolnames

Yielding:

     Q   R   S   T   U   W  
[1,] "A" "B" NA  NA  "B" "B"
[2,] "B" "B" "B" NA  "B" "B"
[3,] "B" NA  NA  NA  NA  NA 
[4,] "C" "C" "C" "B" "B" "C"
[5,] NA  "A" "A" "C" "B" "A"
[6,] "A" "B" "B" "A" NA  NA

Ok. So what I want is this:

     Q    R    S    T    U    W
1    1    2 <NA> <NA>    1    2
2    2    2    2 <NA>    1    2
3    2 <NA> <NA> <NA> <NA> <NA>
4    3    3    3    2    1    3
5 <NA>    1    1    3    1    1
6    1    2    2    1 <NA> <NA>

No problem. Let's try as.numeric(demodata2)

> as.numeric(demodata2)
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[30] NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion

Less than satisfying. Let's try only one column...

> as.numeric(demodata2[,3])
[1] NA NA NA NA NA NA
Warning message:
NAs introduced by coercion

* edit *

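These are actually supposed to be factors, not characters (thanks @Carl Witthoft and @smci)... so let's make this into a data frame with factor columns (stringsAsFactors = TRUE was the default when this was written; since R 4.0.0 it must be passed explicitly)...

> demodata2 <- as.data.frame(demodata2, stringsAsFactors = TRUE)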
> as.numeric(demodata2)
Error: (list) object cannot be coerced to type 'double'

Nope. But wait... here's where it gets interesting...

> as.numeric(demodata2$S)
[1] NA  2 NA  3  1  2

Well, that is right. Let's check that I can also do this by selecting the column by number:

> as.numeric(demodata2[,3])
[1] NA  2 NA  3  1  2

OK. So I could do this column by column, assembling my new matrix by iterating ncol times... but is there a better way?

And why does it barf in matrix form, but not as a data frame? (Edit: this is now pretty obvious... in matrix form these are characters, not factors. My bad. The question about the data frame still stands, though...)
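The difference can be seen on a single column. A short sketch using column S of the original character matrix, redefined here as s so the snippet runs on its own:

```r
s <- c(NA, "B", NA, "C", "A", "B")   # column S as a character vector

# Strings such as "A" are not numbers, so every element becomes NA
# (R also warns "NAs introduced by coercion")
as.numeric(s)

# factor() sorts the distinct values into the levels A, B, C;
# as.integer() then returns each value's level position: NA 2 NA 3 1 2
as.integer(factor(s))

# A data frame is a list of columns, which is why as.numeric() on a whole
# data frame fails with a "cannot be coerced to type 'double'" error
df <- data.frame(S = factor(s))
is.list(df)   # TRUE
```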

Thanks! (and pointing me to an existing answer is totally fine)


asked Dec 23 '14 21:12

rucker

2 Answers

It seems like your U column should be 2 corresponding to "B", not 1. Please clarify that.

You could try match()

matrix(match(demodata2, LETTERS), nrow(demodata2), dimnames=dimnames(demodata2))
#       Q  R  S  T  U  W
# [1,]  1  2 NA NA  2  2
# [2,]  2  2  2 NA  2  2
# [3,]  2 NA NA NA NA NA
# [4,]  3  3  3  2  2  3
# [5,] NA  1  1  3  2  1
# [6,]  1  2  2  1 NA NA

You could also get this result with:

m <- match(demodata2, LETTERS)
attributes(m) <- attributes(demodata2)

And then look at m
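A quick check, assuming demodata2 is still the original character matrix, that the two match() variants produce the same object. Note that match(x, LETTERS) numbers each value by its position in the alphabet ("A" = 1, "B" = 2, ...), so it coincides with factor codes here only because the values present are exactly "A", "B" and "C", and it is not computed per column:

```r
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q", "R", "S", "T", "U", "W")

# Version 1: build a new matrix around the match() results
m1 <- matrix(match(demodata2, LETTERS), nrow(demodata2),
             dimnames = dimnames(demodata2))

# Version 2: copy dim and dimnames from the original onto the match() vector
m2 <- match(demodata2, LETTERS)
attributes(m2) <- attributes(demodata2)

identical(m1, m2)   # TRUE
```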

Update for the revised data set:

For your updated data, try

demodata2[] <- lapply(demodata2, as.numeric) 
demodata2
#    Q  R  S  T  U  W
# 1  1  2 NA NA  1  2
# 2  2  2  2 NA  1  2
# 3  2 NA NA NA NA NA
# 4  3  3  3  2  1  3
# 5 NA  1  1  3  1  1
# 6  1  2  2  1 NA NA

Now you have 1s in the U column because each column is factored individually, and B is the first (and only) level in that column.
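The lapply(demodata2, as.numeric) call relies on the columns already being factors. Since R 4.0.0, as.data.frame() keeps character columns as character by default, and as.numeric() on those gives NA. A sketch that works for character or factor columns wraps each column in factor() first; base R's data.matrix(), which converts factor and character columns to integer codes column by column, gives the same codes as a matrix. Both are alternatives offered here, not part of the original answer:

```r
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6,
                    dimnames = list(NULL, c("Q", "R", "S", "T", "U", "W")))

# Character columns, as as.data.frame() produces by default in R >= 4.0.0
df <- as.data.frame(demodata2, stringsAsFactors = FALSE)

# Factor each column separately, then take the integer codes
by_column <- df
by_column[] <- lapply(df, function(x) as.integer(factor(x)))

# Same per-column codes, returned as an integer matrix
codes <- data.matrix(df)
```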


answered Sep 24 '22 16:09

Rich Scriven

Mechanically, this is very similar to the 'dim<-' answer. A little more transparent, but probably less efficient (maybe?).

matrix(as.numeric(factor(demodata2)), ncol = ncol(demodata2))

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2   NA   NA    2    2
[2,]    2    2    2   NA    2    2
[3,]    2   NA   NA   NA   NA   NA
[4,]    3    3    3    2    2    3
[5,]   NA    1    1    3    2    1
[6,]    1    2    2    1   NA   NA
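The 'dim<-' answer referred to above is not reproduced on this page. A minimal sketch of that idea, assuming it reshapes the factor codes by setting their dimensions directly, can also restore the column names that the matrix() call above drops. Because a single factor is built over the whole matrix, all columns share the levels A, B, C, which is why U comes out as 2 rather than the per-column 1 asked for in the question:

```r
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q", "R", "S", "T", "U", "W")

# One factor over all 36 cells (flattened column by column),
# so every column shares the levels A, B, C
codes <- as.integer(factor(as.vector(demodata2)))

# Reshape the codes back into the original 6 x 6 layout ...
dim(codes) <- dim(demodata2)
# ... and restore the column names
dimnames(codes) <- dimnames(demodata2)
codes
```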

answered Sep 25 '22 16:09

Gregor Thomas
