I'm trying to execute a Principal Components Analysis, but I'm getting the error: Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
I know all the columns have to be numeric, but how do I handle character columns in the data set? E.g.:
data(birth.death.rates.1966)
data2 <- birth.death.rates.1966
princ <- prcomp(data2)
Should I add a new column that maps each country name to a numeric code? If so, how do I do this in R?
You can convert a character vector to numeric values by going via factor. Each unique value then gets a unique integer code. In this example there are four values, so the codes are 1 to 4, assigned in the sorted (alphabetical) order of the factor levels:
> d = data.frame(country=c("foo","bar","baz","qux"),x=runif(4),y=runif(4))
> d
  country          x         y
1     foo 0.84435112 0.7022875
2     bar 0.01343424 0.5019794
3     baz 0.09815888 0.5832612
4     qux 0.18397525 0.8049514
> d$country = as.numeric(as.factor(d$country))
> d
  country          x         y
1       3 0.84435112 0.7022875
2       1 0.01343424 0.5019794
3       2 0.09815888 0.5832612
4       4 0.18397525 0.8049514
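If you want to see which code went with which value, you can inspect the factor levels before throwing them away. A minimal sketch, using a stand-alone `country` vector with the same values as above (the names `f`, `codes` and `mapping` are just mine for illustration):

```r
# Hypothetical character column, same values as in the example above
country <- c("foo", "bar", "baz", "qux")

f <- factor(country)      # levels are the sorted unique values
codes <- as.integer(f)    # code = position of each value in levels(f)

# Lookup table from integer code back to the original label
mapping <- data.frame(code = seq_along(levels(f)), label = levels(f))
print(mapping)

# The original strings can be recovered from the codes
recovered <- levels(f)[codes]
```

Keeping `mapping` around lets you translate the codes back later, since the integers themselves carry no meaning beyond the sort order.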
You can then run prcomp:
> prcomp(d)
Standard deviations:
[1] 1.308665216 0.339983614 0.009141194
Rotation:
               PC1          PC2          PC3
country -0.9858920  0.132948161 -0.101694168
x       -0.1331795 -0.991081523 -0.004541179
y       -0.1013910  0.009066471  0.994805345
Whether this makes sense for your application is up to you. Maybe you just want to drop the first column, prcomp(d[,-1]), and work with the numeric data, which seems to be what the other "answers" are trying to achieve.
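If you do go the drop-the-column route, you don't have to hard-code the column position. Here is a rough sketch on a toy data frame (the values are made up, and `num_cols` is just a name I picked) that keeps whatever columns are numeric:

```r
# Toy data frame with one character column and two numeric ones
d <- data.frame(country = c("foo", "bar", "baz", "qux"),
                x = c(0.84, 0.01, 0.10, 0.18),
                y = c(0.70, 0.50, 0.58, 0.80))

# Logical vector: TRUE for each numeric column, wherever it sits
num_cols <- vapply(d, is.numeric, logical(1))

# Run the PCA on the numeric columns only
pca <- prcomp(d[, num_cols, drop = FALSE])
print(summary(pca))
```

The `drop = FALSE` is there so the subset stays a data frame even if only one numeric column is left.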
The first column of the data frame is character, so you can move it into the row names:
library(tidyverse)
data2 <- data2 %>% remove_rownames %>% column_to_rownames(var="country")
princ <- prcomp(data2)
Alternatively, in base R (set the row names first, then drop the column):
rownames(data2) <- data2[,1]
data2 <- data2[,-1]
princ <- prcomp(data2)
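One nice side effect of using row names is that they carry through to the scores. A small self-contained sketch, assuming a toy data frame in place of data2 (the values and the column name `country` are made up here):

```r
# Toy stand-in for data2: first column holds the labels
toy <- data.frame(country = c("foo", "bar", "baz", "qux"),
                  x = c(0.84, 0.01, 0.10, 0.18),
                  y = c(0.70, 0.50, 0.58, 0.80),
                  stringsAsFactors = FALSE)

rownames(toy) <- toy$country   # set the row names first...
toy <- toy[, -1]               # ...then drop the character column

toy_pca <- prcomp(toy)

# The scores matrix keeps the row names, so each row is labelled
print(toy_pca$x)
```

So the country information is not lost, it just is not treated as a variable in the analysis.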