Here is my problem:
I have a table with categories and I want to rank them:
category
dog
cat
fish
dog
dog
What I want is to add a column and to rank them:
category rank
dog 1
cat 2
fish 3
dog 1
dog 1
Thanks!
There are two steps for converting data to numeric codes via a factor. Step 1: Convert the data vector into a factor. The factor() function is used to create and modify factors in R. Step 2: Convert the factor into a numeric vector using as.numeric(), which returns the integer codes of the factor levels.
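The two steps above can be sketched as follows. The vector `animals` is hypothetical example data, and the caveat at the end (going through as.character() when the labels are themselves numbers) is an addition not covered in the text above.

```r
# Hypothetical example data (not taken from the original question)
animals <- c("dog", "cat", "fish", "dog", "dog")

# Step 1: convert the character vector into a factor;
# by default the levels are sorted: "cat", "dog", "fish"
animals_factor <- factor(animals)

# Step 2: convert the factor into a numeric vector of level codes
animal_codes <- as.numeric(animals_factor)
print(animal_codes)
# [1] 2 1 3 2 2

# Caveat: if the factor labels are numbers, go through as.character()
# to recover the values rather than the level codes
numeric_labels <- factor(c("10", "20", "10"))
recovered_values <- as.numeric(as.character(numeric_labels))
print(recovered_values)
# [1] 10 20 10
```

Note that the codes follow the level order, not the order in which the values appear in the data.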
To convert the columns of an R data frame from integer to numeric, we can use the lapply() function. For example, if a data frame df contains only integer columns, lapply(df, as.numeric) converts every column to the numeric data type. Because lapply() returns a list, assign the result back with df[] <- lapply(df, as.numeric) to keep the data frame structure.
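A minimal sketch of this conversion, assuming a small hypothetical data frame `df` with two integer columns:

```r
# Hypothetical data frame whose columns are all integer
df <- data.frame(a = 1:3, b = c(10L, 20L, 30L))

# lapply() on its own returns a plain list of converted columns
converted_list <- lapply(df, as.numeric)

# Assigning into df[] keeps the data frame structure and column names
df[] <- lapply(df, as.numeric)

# Check the resulting column classes
print(sapply(df, class))
```

After the assignment, both columns should report the class "numeric" while `df` remains a data frame.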
A simple way to convert a categorical variable to a continuous one is to replace each raw category with the average response value of that category. A cutoff sets the minimum number of observations per category: all categories with fewer observations than the cutoff are grouped into a separate category.
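As a rough illustration of this idea in base R, the sketch below uses made-up data. The names `category`, `response`, `cutoff`, and the pooled label "other" are all assumptions chosen for the example, not part of any particular package.

```r
# Hypothetical data: a categorical predictor and a numeric response
category <- c("dog", "dog", "dog", "cat", "cat", "fish")
response <- c(10, 12, 14, 3, 5, 8)

# Assumed cutoff: categories with fewer observations are pooled together
cutoff <- 2

# Number of observations in the category of each row
n_obs <- as.vector(table(category)[category])

# Pool rare categories under the (hypothetical) label "other"
pooled <- ifelse(n_obs < cutoff, "other", category)

# Replace each pooled category with the mean response of that category
encoded <- ave(response, pooled, FUN = mean)

print(data.frame(category, pooled, response, encoded))
```

Here "fish" occurs only once, so it falls below the cutoff and is encoded using the mean of the pooled group instead of its own category.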
Just for the sake of completeness and because the solution I posted in a comment is an inefficient (and pretty ugly) fix, I'll post an answer too.
It turned out that OP's starting setting was something like the following:
x = c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish")
x = factor(x)
In the end, a manually specified numerical categorization of x was wanted. As an example, let's suppose that the following matching is wanted:
cat -> 1, dog -> 2, fish -> 3, catfish -> 4
So, some alternatives:
sapply(as.character(x), switch, "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
       USE.NAMES = FALSE)
#[1] 1 2 3 2 2 1 3 4
#note that match's internal 'do_match' calls 'match_transform', which coerces
#`factor` to `character`, so there is no need for 'as.character(x)'
#(http://svn.r-project.org/R/trunk/src/main/unique.c)
match(x, c("cat", "dog", "fish", "catfish"))
#[1] 1 2 3 2 2 1 3 4
local({ #just to not change 'x'
levels(x) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
as.numeric(x)
})
#[1] 1 2 3 2 2 1 3 4
library(fastmatch)
fmatch(x, c("cat", "dog", "fish", "catfish")) #a faster alternative to 'match'
#[1] 1 2 3 2 2 1 3 4
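One more alternative in the same spirit, sketched here as an addition rather than part of the original answer, is a named vector used as a lookup table. The name `codes` is chosen for illustration.

```r
x <- factor(c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish"))

# Named vector as a lookup table: names are categories, values are codes
codes <- c(cat = 1, dog = 2, fish = 3, catfish = 4)

# Index by the character labels, not by the factor itself: indexing with a
# factor would use its internal integer codes and pick the wrong elements
result <- unname(codes[as.character(x)])
print(result)
# [1] 1 2 3 2 2 1 3 4
```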
And a benchmark on a larger vector:
X = rep(as.character(x), 1e5)
X = factor(X)
f1 = function() sapply(as.character(X), switch,
                       "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4, USE.NAMES = FALSE)
f2 = function() match(X, c("cat", "dog", "fish", "catfish"))
f3 = function() {levels(X) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4) ;
as.numeric(X)}
library(fastmatch)
f4 = function() fmatch(X, c("cat", "dog", "fish", "catfish"))
library(microbenchmark)
microbenchmark(f1(), f2(), f3(), f4(), times = 10)
#Unit: milliseconds
# expr         min          lq      median         uq       max neval
# f1() 1745.111666 1816.675337 1961.809102 2107.98236 2896.0291    10
# f2()   22.043657   22.786647   23.987263   31.45057  111.9600    10
# f3()   32.704779   32.919150   38.865853   47.67281  134.2988    10
# f4()    8.814958    8.823309    9.856188   19.66435  104.2827    10
sum(f1() != f2())
#[1] 0
sum(f2() != f3())
#[1] 0
sum(f3() != f4())
#[1] 0
I assume that when you write "ranks" you really mean ranks. I further assume that you want to rank the categories by their number of occurrences.
cats <- factor(c("dog", "cat", "fish", "dog", "dog"))
#see help("rank") for other possibilities to break ties
ranks <- rank(-table(cats), ties.method="first")
DF <- data.frame(category=cats, rank=ranks[as.character(cats)])
print(DF)
#   category rank
# 1      dog    1
# 2      cat    2
# 3     fish    3
# 4      dog    1
# 5      dog    1
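If the intended numbering is instead the order in which each category first appears (an alternative reading of the question, which would also produce the desired output for this data), a minimal sketch could look like this. The names `appearance_rank` and `DF2` are chosen for illustration.

```r
cats <- factor(c("dog", "cat", "fish", "dog", "dog"))

# Work on the character labels so the result does not depend on level order
labels <- as.character(cats)

# unique() keeps first-appearance order; match() returns each label's position
appearance_rank <- match(labels, unique(labels))

DF2 <- data.frame(category = cats, rank = appearance_rank)
print(DF2)
```

For this input the two readings agree, but they differ as soon as a less frequent category appears before a more frequent one.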