
how to combine two levels in one categorical variable in R [duplicate]

Tags:

r

I am learning R and am having trouble finding the right command.

I have the following categorical data:

levels(job)
[1] "admin."        "blue-collar"   "entrepreneur"  "housemaid"    
[5] "management"    "retired"       "self-employed" "services"     
[9] "student"       "technician"    "unemployed"    "unknown"

Now I want to simplify these levels to something like:

levels(job) 
[1] "class1"  "class2" "class3" "unknown"

where class1 includes "admin.", "entrepreneur", and "self-employed"; class2 includes "blue-collar", "management", and "technician"; class3 includes "housemaid", "student", "retired", and "services"; unknown includes "unknown" and "unemployed".

For this purpose, which command can I use? Thanks! Yan

Yanyan asked Nov 30 '22 17:11


2 Answers

You may also create a 'key/value' index vector and use that to replace the elements in 'job':

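# Named lookup vector: names are the old levels, values are the new groups.
# The positional indices assume levels(job) is in the alphabetical order shown in the question.
indx <- setNames(rep(c(paste0('type', 1:3), 'unknown'), c(3, 3, 4, 2)),
      c(levels(job)[c(1, 3, 7)], levels(job)[c(2, 5, 10)],
      levels(job)[c(4, 6, 8, 9)], levels(job)[c(11, 12)]))

# Look up the new group for each observation and rebuild the factor
factor(unname(indx[as.character(job)]))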

data

v1 <- c('admin.', 'blue-collar', 'entrepreneur', 'housemaid',
'management', 'retired', 'self-employed', 'services', 'student', 
'technician', 'unemployed', 'unknown')
set.seed(24)
job <- factor(sample(v1, 50, replace=TRUE))
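
The positional indices in the lookup above depend on the order of levels(job). Here is a minimal sketch of the same key/value idea with the old level names written out, so it does not rely on that order. The names lookup and job2 are only illustrative, and the grouping is the one described in the question:

```r
# Example data, same setup as in the answer
v1 <- c("admin.", "blue-collar", "entrepreneur", "housemaid",
        "management", "retired", "self-employed", "services", "student",
        "technician", "unemployed", "unknown")
set.seed(24)
job <- factor(sample(v1, 50, replace = TRUE))

# Names are the old levels, values are the new groups (illustrative object name)
lookup <- c(
  "admin."      = "type1", "entrepreneur" = "type1", "self-employed" = "type1",
  "blue-collar" = "type2", "management"   = "type2", "technician"    = "type2",
  "housemaid"   = "type3", "student"      = "type3", "retired"       = "type3",
  "services"    = "type3",
  "unemployed"  = "unknown", "unknown"    = "unknown"
)

# Index the lookup by each observation's old level, then rebuild the factor
job2 <- factor(unname(lookup[as.character(job)]))

# Cross-tabulate old against new levels to check the mapping by eye
table(job, job2)
```

Each row of the cross-tabulation should have counts in a single column only, which is a quick way to confirm that every old level went to exactly one new group.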
akrun answered Dec 05 '22 11:12


You can assign to levels directly (z here stands for your factor, job in the question):

levels(z)[levels(z) %in% c("unemployed", "unknown")] <- "unknown"

This is covered in the help file -- type ?levels.


Stealing from @akrun's answer, you could do this most cleanly with a hash/list:

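ha <- list(
  class1  = c("admin.", "entrepreneur", "self-employed"),
  class2  = c("blue-collar", "management", "technician"),
  class3  = c("housemaid", "student", "retired", "services"),
  unknown = c("unknown", "unemployed")
)

# Rename every level listed in a group to that group's name
for (i in seq_along(ha)) levels(z)[levels(z) %in% ha[[i]]] <- names(ha)[i]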
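
As far as I can tell from ?levels, the replacement function for factors also accepts a named list directly, so the loop above can be collapsed into a single assignment. A small sketch, assuming the grouping from the question and a made-up factor z that contains every job level exactly twice:

```r
# Made-up example factor: each of the 12 job levels appears twice
v1 <- c("admin.", "blue-collar", "entrepreneur", "housemaid",
        "management", "retired", "self-employed", "services", "student",
        "technician", "unemployed", "unknown")
z <- factor(rep(v1, times = 2))

# Names of the list are the new levels, elements are the old levels to merge
levels(z) <- list(
  class1  = c("admin.", "entrepreneur", "self-employed"),
  class2  = c("blue-collar", "management", "technician"),
  class3  = c("housemaid", "student", "retired", "services"),
  unknown = c("unknown", "unemployed")
)

levels(z)
table(z)
```

Every old level is listed here on purpose. I would check ?levels before relying on what happens to levels that are left out of the list.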
Frank answered Dec 05 '22 11:12