How to convert from category to numeric in R

Tags:

r

Here is my problem:

I have a table with categories and I want to rank them:

category
dog
cat
fish
dog
dog

What I want is to add a column and to rank them:

category   rank
dog        1
cat        2
fish       3
dog        1
dog        1

Sorry for the terrible table (help in writing normal tables on Stack Overflow would be great, too).
Any ideas about how to add the rank column?

Thanks!


asked Dec 26 '13 09:12

Oshrat

2 Answers

Just for the sake of completeness and because the solution I posted in a comment is an inefficient (and pretty ugly) fix, I'll post an answer too.

It turned out that OP's starting point was something like the following:

x = c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish")
x = factor(x)

In the end, what was wanted was a manually specified numeric coding of x. As an example, let's suppose that the following mapping is wanted:

cat -> 1, dog -> 2, fish -> 3, catfish -> 4

So, some alternatives:

sapply(as.character(x), switch, "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
       USE.NAMES = FALSE)
#[1] 1 2 3 2 2 1 3 4

#note that match's internal 'do_match' calls 'match_transform', which coerces
#`factor` to `character`, so there is no need for 'as.character(x)'
#(http://svn.r-project.org/R/trunk/src/main/unique.c)
match(x, c("cat", "dog", "fish", "catfish"))
#[1] 1 2 3 2 2 1 3 4

local({    #just to not change 'x'
    levels(x) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
    as.numeric(x)
})
#[1] 1 2 3 2 2 1 3 4

library(fastmatch)
fmatch(x, c("cat", "dog", "fish", "catfish"))  #a faster alternative to 'match'
#[1] 1 2 3 2 2 1 3 4
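
The same mapping can also be sketched with base R only, either by declaring the level order explicitly or by indexing a named lookup vector. This is just an illustrative sketch under the mapping assumed above; the names `lev`, `lookup`, `codes_factor` and `codes_lookup` are made up for the example.

```r
x <- factor(c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish"))

# desired order of the categories (the mapping assumed in the text above)
lev <- c("cat", "dog", "fish", "catfish")

# 1) rebuild the factor with an explicit level order and take the integer codes
codes_factor <- as.integer(factor(x, levels = lev))

# 2) index a named lookup vector by the character values
lookup <- c(cat = 1, dog = 2, fish = 3, catfish = 4)
codes_lookup <- unname(lookup[as.character(x)])

codes_factor
#[1] 1 2 3 2 2 1 3 4
```

The lookup-vector variant is probably the easier one to adapt when the target numbers are not simply 1, 2, 3, ...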

And a benchmark on a larger vector:

X = rep(as.character(x), 1e5)
X = factor(X)
f1 = function() sapply(as.character(X), switch,
                       "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
                       USE.NAMES = FALSE)
f2 = function() match(X, c("cat", "dog", "fish", "catfish"))
f3 = function() {
    levels(X) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
    as.numeric(X)
}
library(fastmatch)
f4 = function() fmatch(X, c("cat", "dog", "fish", "catfish"))

library(microbenchmark)
microbenchmark(f1(), f2(), f3(), f4(), times = 10)
#Unit: milliseconds
# expr         min          lq      median         uq       max neval
# f1() 1745.111666 1816.675337 1961.809102 2107.98236 2896.0291    10
# f2()   22.043657   22.786647   23.987263   31.45057  111.9600    10
# f3()   32.704779   32.919150   38.865853   47.67281  134.2988    10
# f4()    8.814958    8.823309    9.856188   19.66435  104.2827    10
sum(f1() != f2())
#[1] 0
sum(f2() != f3())
#[1] 0
sum(f3() != f4())
#[1] 0

answered Sep 27 '22 16:09

alexis_laz

I assume that if you write "ranks" you mean ranks. I further assume you want to rank according to the number of occurrences.

cats <- factor(c("dog", "cat", "fish", "dog", "dog"))

#see help("rank") for other possibilities to break ties
ranks <- rank(-table(cats), ties.method="first")

DF <- data.frame(category=cats, rank=ranks[as.character(cats)])

print(DF)
#   category rank
# 1      dog    1
# 2      cat    2
# 3     fish    3
# 4      dog    1
# 5      dog    1
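
The desired output in the question is also consistent with a different reading: numbering the categories in order of first appearance rather than by frequency. If that is what is meant (an assumption, the question does not say), a minimal sketch could look like this; `category`, `first_seen` and `DF2` are names made up for the example.

```r
category <- c("dog", "cat", "fish", "dog", "dog")

# position of each value among the distinct values, in order of first appearance
first_seen <- match(category, unique(category))

DF2 <- data.frame(category = category, rank = first_seen)

first_seen
#[1] 1 2 3 1 1
```

On this particular input both readings give the same numbers; they differ as soon as a less frequent category appears before a more frequent one.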

answered Sep 27 '22 18:09

Roland
