I apologize if there is an answer out there already for this... I looked but could not find one.
I am trying to convert a matrix of factors into a matrix of numbers that corresponds to each of the factor values for the column. Simple, right? Yet I have run into a variety of very odd problems when I try to do this.
Let me explain. Here is a sample dataset:
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A","B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA), nrow=6, ncol=6)
democolnames <- c("Q","R","S","T","U","W")
colnames(demodata2) <- democolnames
Yielding:
     Q   R   S   T   U   W
[1,] "A" "B" NA  NA  "B" "B"
[2,] "B" "B" "B" NA  "B" "B"
[3,] "B" NA  NA  NA  NA  NA
[4,] "C" "C" "C" "B" "B" "C"
[5,] NA  "A" "A" "C" "B" "A"
[6,] "A" "B" "B" "A" NA  NA
Ok. So what I want is this:
     Q    R    S    T    U    W
1    1    2 <NA> <NA>    1    2
2    2    2    2 <NA>    1    2
3    2 <NA> <NA> <NA> <NA> <NA>
4    3    3    3    2    1    3
5 <NA>    1    1    3    1    1
6    1    2    2    1 <NA> <NA>
No problem. Let's try as.numeric(demodata2)
> as.numeric(demodata2)
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[30] NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion
Less than satisfying. Let's try only one column...
> as.numeric(demodata2[,3])
[1] NA NA NA NA NA NA
Warning message:
NAs introduced by coercion
* edit *
These are actually supposed to be factors, not characters (thanks @Carl Witthoft and @smci)... so let's make this into a dataframe...
> demodata2 <- as.data.frame(demodata2, stringsAsFactors = TRUE)  # explicit for R >= 4.0.0, where the default is FALSE
> as.numeric(demodata2)
Error: (list) object cannot be coerced to type 'double'
Nope. But wait... here's where it gets interesting...
> as.numeric(demodata2$S)
[1] NA 2 NA 3 1 2
Well, that is right. Let's validate I can do this calling columns by number:
> as.numeric(demodata2[,3])
[1] NA 2 NA 3 1 2
OK. So I can do this column by column, assembling my new matrix by iterating ncol times... but is there a better way?
And why does it barf when it is in matrix form, as opposed to data frame? <- edit actually, this is now pretty obvious... in the matrix form, these are characters, not factors. My bad. Question still stands about the dataframe, though...
Thanks! (and pointing me to an existing answer is totally fine)
To recover the numeric values that a factor's labels represent, first convert the factor to a character vector and then to a numeric vector. Calling as.numeric() directly on a factor returns the internal integer codes of its levels, not the values written in the labels.
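A minimal sketch of that difference, using a made-up factor `f` whose labels are digits (it is not part of the question's data):

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10", "30"))

# as.numeric() on a factor returns the internal integer codes of the levels
codes <- as.numeric(f)

# Going through character first recovers the values written in the labels
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

In the question above the labels are letters, so the integer codes themselves are the desired result and the detour through character is not needed.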
To convert a character vector to a numeric vector, use as.numeric(). Do this before passing the vector to statistical functions. Note that before R 4.0.0, data.frame() and read.table() converted character vectors to factors by default (stringsAsFactors = TRUE); since R 4.0.0 the default is FALSE.
How do I rename factor levels in R? The simplest way to rename several factor levels at once is the levels() replacement function. The new labels are assigned in the order of the existing levels. For example, to recode the factor levels "A", "B", and "C" you can use: levels(your_df$Category1) <- c("Factor 1", "Factor 2", "Factor 3").
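A short sketch of that renaming on a toy data frame. The names `your_df` and `Category1` follow the snippet above; the data itself is invented for illustration:

```r
# Hypothetical data frame with one factor column (levels sort to A, B, C)
your_df <- data.frame(Category1 = factor(c("A", "B", "C", "A")))

# New labels are matched to the existing levels by position
levels(your_df$Category1) <- c("Factor 1", "Factor 2", "Factor 3")

renamed <- as.character(your_df$Category1)
print(renamed)
```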
To convert the columns of an R data frame from integer to numeric, use lapply(). For example, if a data frame df contains only integer columns, lapply(df, as.numeric) converts every column to numeric. lapply() returns a list, so assign the result back with df[] <- lapply(df, as.numeric) to keep the data frame structure.
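As a rough illustration, assuming a small all-integer data frame `df` invented for this example:

```r
# Hypothetical all-integer data frame
df <- data.frame(a = 1:3, b = 4:6)

# Assigning into df[] keeps the data frame structure;
# lapply() on its own would return a plain list
df[] <- lapply(df, as.numeric)

col_types <- sapply(df, class)
print(col_types)
```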
It seems like your U column should be 2, corresponding to "B", not 1. Please clarify that.
You could try match():
matrix(match(demodata2, LETTERS), nrow(demodata2), dimnames=dimnames(demodata2))
#       Q  R  S  T  U  W
# [1,]  1  2 NA NA  2  2
# [2,]  2  2  2 NA  2  2
# [3,]  2 NA NA NA NA NA
# [4,]  3  3  3  2  2  3
# [5,] NA  1  1  3  2  1
# [6,]  1  2  2  1 NA NA
You could also get this result with
m <- match(demodata2, LETTERS)
attributes(m) <- attributes(demodata2)
And then look at m
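Put together as a self-contained sketch, rebuilding the original character matrix from the question so it can be run on its own:

```r
# Character matrix from the question
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q", "R", "S", "T", "U", "W")

# match() returns a plain integer vector: the position of each letter in LETTERS
m <- match(demodata2, LETTERS)

# Copying the attributes restores dim and dimnames in one step
attributes(m) <- attributes(demodata2)
print(m)
```

Because the codes come from positions in LETTERS, "B" is always 2 regardless of which letters happen to appear in a given column.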
Update for the revised data set:
For your updated data, try
demodata2[] <- lapply(demodata2, as.numeric)
demodata2
#    Q  R  S  T  U  W
# 1  1  2 NA NA  1  2
# 2  2  2  2 NA  1  2
# 3  2 NA NA NA NA NA
# 4  3  3  3  2  1  3
# 5 NA  1  1  3  1  1
# 6  1  2  2  1 NA NA
Now you have the 1's in the U column because each column is factored individually, and hence B is the first (and only) level in that column.
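If you would rather have the same code for the same letter in every column, one option is to re-factor each column against a shared set of levels. This is a sketch, assuming the full level set is known to be A, B, C; `per_column` and `shared` are names made up for the comparison:

```r
# Data frame of factors built from the question's matrix
# (stringsAsFactors = TRUE is explicit so this also works in R >= 4.0.0)
demodata2 <- as.data.frame(
  matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
           NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
           "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
         nrow = 6, dimnames = list(NULL, c("Q", "R", "S", "T", "U", "W"))),
  stringsAsFactors = TRUE)

# Codes taken from each column's own levels: U only contains "B", so B -> 1
per_column <- demodata2
per_column[] <- lapply(demodata2, as.numeric)

# Codes taken from one shared level set (assumed here): B -> 2 everywhere
shared <- demodata2
shared[] <- lapply(demodata2, function(x) {
  as.integer(factor(as.character(x), levels = c("A", "B", "C")))
})

print(per_column$U)
print(shared$U)
```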
Mechanically, this is very similar to the 'dim<-' answer. A little more transparent, but probably less efficient (maybe?).
matrix(as.numeric(factor(demodata2)), ncol = ncol(demodata2))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2   NA   NA    2    2
[2,]    2    2    2   NA    2    2
[3,]    2   NA   NA   NA   NA   NA
[4,]    3    3    3    2    2    3
[5,]   NA    1    1    3    2    1
[6,]    1    2    2    1   NA   NA
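A variant of the same idea that keeps the column names, sketched against the original character matrix from the question (the name `codes` is made up here):

```r
# Character matrix from the question
demodata2 <- matrix(c("A","B","B","C",NA,"A","B","B",NA,"C","A","B",
                      NA,"B",NA,"C","A","B",NA,NA,NA,"B","C","A",
                      "B","B",NA,"B","B",NA,"B","B",NA,"C","A",NA),
                    nrow = 6, ncol = 6)
colnames(demodata2) <- c("Q", "R", "S", "T", "U", "W")

# factor() drops the dimensions, so pass dim and dimnames back to matrix()
codes <- matrix(as.integer(factor(demodata2)),
                nrow = nrow(demodata2),
                dimnames = dimnames(demodata2))
print(codes)
```

The levels here are the sorted unique values across the whole matrix, so the codes are consistent between columns, but they depend on which values are actually present in the data.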