 

Principal Components Analysis: Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

Tags: r

I'm trying to run a principal components analysis (PCA), but I get this error:

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

I know all the columns have to be numeric, but how should I handle character columns in the data set? For example:

data(birth.death.rates.1966)
data2 <- birth.death.rates.1966
princ <- prcomp(data2)
[Image: example rows of data2]

Should I add a new column that maps each country name to a numeric code? If so, how do I do this in R?
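A minimal sketch of where the error comes from, using a small made-up data frame (the name toy and its values are hypothetical). prcomp() converts its input with as.matrix(), and a single character column turns the whole matrix into a character matrix, which colMeans() then rejects. Checking each column with is.numeric() shows which columns are the problem.

```r
# Hypothetical toy data: one character column plus two numeric columns
toy <- data.frame(country = c("A", "B", "C"),
                  x = c(1.2, 3.4, 2.2),
                  y = c(0.5, 0.1, 0.9))

# as.matrix() on a data frame with a character column gives a character matrix
m <- as.matrix(toy)
print(typeof(m))

# prcomp() fails on it with the error from the question; capture the message
res <- tryCatch(prcomp(toy), error = function(e) conditionMessage(e))
print(res)

# List the columns that are not numeric
non_numeric <- names(toy)[!vapply(toy, is.numeric, logical(1))]
print(non_numeric)
```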

asked May 25 '17 at 04:05 by Rubens Rodrigues


2 Answers

You can convert a character vector to numbers by going via a factor: each unique value gets its own integer code. In this example there are four values, so the codes are 1 to 4, assigned in alphabetical order (the default ordering of factor levels):

> d = data.frame(country=c("foo","bar","baz","qux"),x=runif(4),y=runif(4))
> d
  country          x         y
1     foo 0.84435112 0.7022875
2     bar 0.01343424 0.5019794
3     baz 0.09815888 0.5832612
4     qux 0.18397525 0.8049514
> d$country = as.numeric(as.factor(d$country))
> d
  country          x         y
1       3 0.84435112 0.7022875
2       1 0.01343424 0.5019794
3       2 0.09815888 0.5832612
4       4 0.18397525 0.8049514

You can then run prcomp:

> prcomp(d)
Standard deviations:
[1] 1.308665216 0.339983614 0.009141194

Rotation:
               PC1          PC2          PC3
country -0.9858920  0.132948161 -0.101694168
x       -0.1331795 -0.991081523 -0.004541179
y       -0.1013910  0.009066471  0.994805345

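Whether this makes sense for your application is up to you: the codes are arbitrary labels, so PCA treats the alphabetical order of the countries as if it were a measured quantity (country dominates PC1 in the output above). You may just want to drop the first column with prcomp(d[,-1]) and work with the numeric data only, which seems to be what the other answer is trying to achieve.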
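When the character columns are not always first, a minimal sketch is to select the numeric columns by type rather than by position. Here d uses fixed values in place of runif() so the example is reproducible; the values themselves are made up.

```r
d <- data.frame(country = c("foo", "bar", "baz", "qux"),
                x = c(0.84, 0.01, 0.10, 0.18),
                y = c(0.70, 0.50, 0.58, 0.80))

# Keep only the numeric columns, wherever they are in the data frame
is_num <- vapply(d, is.numeric, logical(1))
num <- d[, is_num, drop = FALSE]

pca <- prcomp(num)
print(pca)
```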

answered Sep 22 '22 at 06:09 by Spacedman


The first column of the data frame is character, so you can move it into the row names:

library(tidyverse)
# Move the 'country' column into the row names and assign the result back
data2 <- data2 %>% remove_rownames() %>% column_to_rownames(var = "country")
princ <- prcomp(data2)

Alternatively, in base R:

# Set the row names first: after data2[,-1], data2[,1] would be a numeric column
rownames(data2) <- data2[,1]
data2 <- data2[,-1]
princ <- prcomp(data2)
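A side benefit of the row-name approach is that prcomp() keeps the row names, so the scores in $x stay labelled by country. A minimal sketch with made-up values:

```r
d <- data.frame(country = c("foo", "bar", "baz", "qux"),
                x = c(0.84, 0.01, 0.10, 0.18),
                y = c(0.70, 0.50, 0.58, 0.80))

# Move the names into the row names, then drop the character column
rownames(d) <- d$country
d <- d[, -1]

pca <- prcomp(d)

# The principal component scores are labelled with the country names
print(pca$x)
```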
answered Sep 19 '22 at 06:09 by parth