Here is my problem:
I have a table with categories and I want to rank them:
category
dog
cat
fish
dog
dog
What I want is to add a column and to rank them:
category    rank
dog         1
cat         2
fish        3
dog         1
dog         1
Thanks!
Converting categorical data to numeric via a factor takes two steps. Step 1: convert the data vector into a factor; factor() is used to create and modify factors in R. Step 2: convert the factor into a numeric vector using as.numeric(), which returns the integer level codes (by default the levels are the sorted unique values).
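A minimal sketch of these two steps, using the category values from the question. The variable names are illustrative only, and the second half shows a common pitfall when the factor labels are themselves numbers.

```r
# Example data taken from the question above
category <- c("dog", "cat", "fish", "dog", "dog")

# Step 1: convert the character vector to a factor
# (default levels are the sorted unique values: "cat", "dog", "fish")
f <- factor(category)

# Step 2: convert the factor to its numeric level codes
codes <- as.numeric(f)
print(codes)
# [1] 2 1 3 2 2

# If the labels are numbers stored as text, convert through the labels,
# otherwise you get the level codes rather than the values
g <- factor(c("10", "20", "10"))
values <- as.numeric(as.character(g))
print(values)
# [1] 10 20 10
```

Note that the codes follow the level order, not the order of appearance, so "dog" gets 2 here rather than 1.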
To convert the columns of an R data frame from integer to numeric, use the lapply() function. For example, if the data frame df contains only integer columns, lapply(df, as.numeric) converts every column to numeric. Because lapply() returns a list, assign the result back with df[] <- lapply(df, as.numeric) to keep the data frame structure.
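As a small illustration, assuming a made-up data frame df with two integer columns:

```r
# Hypothetical data frame whose columns are all integer
df <- data.frame(a = 1:3, b = 4:6)

# lapply() returns a plain list; assigning into df[] replaces every
# column while keeping the data.frame structure
df[] <- lapply(df, as.numeric)

# Check the resulting column classes
print(sapply(df, class))
```

Both columns should now report class "numeric" and df is still a data frame.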
One simple way to convert a categorical variable to a continuous one is to replace each raw category with the average response value for that category. A cutoff parameter sets the minimum number of observations per category: all categories with fewer observations than the cutoff are grouped into a separate category.
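The text does not say which function the cutoff parameter belongs to, so the following is only a base-R sketch of the idea under stated assumptions: the data, the cutoff value of 2, and the pooled label "other" are all made up for illustration.

```r
# Hypothetical data: a categorical predictor and a numeric response
category <- c("dog", "cat", "fish", "dog", "dog", "cat")
response <- c(10, 4, 7, 12, 14, 6)

# Assumed cutoff: categories with fewer observations are pooled together
cutoff <- 2
counts <- table(category)
rare <- names(counts)[counts < cutoff]
pooled <- ifelse(category %in% rare, "other", category)

# Replace each category with the mean response of its (pooled) group
encoded <- ave(response, pooled, FUN = mean)
print(data.frame(category, pooled, response, encoded))
```

Here "fish" occurs only once, so it falls below the assumed cutoff and is encoded with the mean of the pooled group.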
Just for the sake of completeness and because the solution I posted in a comment is an inefficient (and pretty ugly) fix, I'll post an answer too.
It turned out that OP's starting setting was something like the following:
x = c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish")
x = factor(x)
In the end, a manually specified numeric coding of x was wanted. As an example, let's suppose that the following mapping is wanted:
cat -> 1, dog -> 2, fish -> 3, catfish -> 4
So, some alternatives:
sapply(as.character(x), switch, "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
       USE.NAMES = FALSE)
#[1] 1 2 3 2 2 1 3 4
# note that match's internal 'do_match' calls 'match_transform', which coerces
# a `factor` to `character`, so there is no need for 'as.character(x)'
# (http://svn.r-project.org/R/trunk/src/main/unique.c)
match(x, c("cat", "dog", "fish", "catfish"))
#[1] 1 2 3 2 2 1 3 4
local({    # local() is used just to not change the global 'x'
    levels(x) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
    as.numeric(x)
})
#[1] 1 2 3 2 2 1 3 4
library(fastmatch)
fmatch(x, c("cat", "dog", "fish", "catfish"))  #a faster alternative to 'match'
#[1] 1 2 3 2 2 1 3 4
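Two further variants, offered only as a sketch and not part of the original answer: rebuilding the factor with explicit levels, and a named lookup vector for codes that are not simply 1..n. The names wanted and lookup, and the codes 10 to 40, are illustrative assumptions.

```r
x <- factor(c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish"))

# Desired order: the position in this vector becomes the numeric code
wanted <- c("cat", "dog", "fish", "catfish")

# Rebuild the factor with explicit levels, then take the integer codes
codes <- as.integer(factor(x, levels = wanted))
print(codes)
# [1] 1 2 3 2 2 1 3 4

# Lookup-table variant for arbitrary codes (values here are made up)
lookup <- c(cat = 10, dog = 20, fish = 30, catfish = 40)
arbitrary <- unname(lookup[as.character(x)])
print(arbitrary)
# [1] 10 20 30 20 20 10 30 40
```

The lookup variant indexes by as.character(x) on purpose: indexing with the factor itself would use its integer codes instead of its labels.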
And a benchmark on a larger vector:
X = rep(as.character(x), 1e5)
X = factor(X)
f1 = function() sapply(as.character(X), switch,
                       "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
                       USE.NAMES = FALSE)
f2 = function() match(X, c("cat", "dog", "fish", "catfish"))
f3 = function() {
    levels(X) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
    as.numeric(X)
}
library(fastmatch)
f4 = function() fmatch(X, c("cat", "dog", "fish", "catfish"))
library(microbenchmark)
microbenchmark(f1(), f2(), f3(), f4(), times = 10)
#Unit: milliseconds
# expr         min          lq      median         uq       max neval
# f1() 1745.111666 1816.675337 1961.809102 2107.98236 2896.0291    10
# f2()   22.043657   22.786647   23.987263   31.45057  111.9600    10
# f3()   32.704779   32.919150   38.865853   47.67281  134.2988    10
# f4()    8.814958    8.823309    9.856188   19.66435  104.2827    10
sum(f1() != f2())
#[1] 0
sum(f2() != f3())
#[1] 0
sum(f3() != f4())
#[1] 0
I assume that when you write "ranks" you mean ranks. I further assume you want to rank by number of occurrences.
cats <- factor(c("dog", "cat", "fish", "dog", "dog"))
#see help("rank") for other possibilities to break ties
ranks <- rank(-table(cats), ties.method="first")
DF <- data.frame(category=cats, rank=ranks[as.character(cats)])
print(DF)
#   category rank
# 1      dog    1
# 2      cat    2
# 3     fish    3
# 4      dog    1
# 5      dog    1
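The expected output in the question is also consistent with numbering categories by order of first appearance. If that is what is wanted (an assumption, since the question does not say), a minimal sketch is:

```r
cats <- c("dog", "cat", "fish", "dog", "dog")

# unique() keeps first-appearance order, so match() yields 1 for the
# first category seen, 2 for the second, and so on
DF <- data.frame(category = cats, rank = match(cats, unique(cats)))
print(DF)
#   category rank
# 1      dog    1
# 2      cat    2
# 3     fish    3
# 4      dog    1
# 5      dog    1
```

On this data both interpretations give the same result; they differ as soon as a later category occurs more often than an earlier one.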