I am learning R and am having trouble finding the right command.
I have a categorical variable (a factor) with these levels:
levels(job)
[1] "admin." "blue-collar" "entrepreneur" "housemaid"
[5] "management" "retired" "self-employed" "services"
[9] "student" "technician" "unemployed" "unknown"
Now I want to collapse these levels into fewer groups, so that I get
levels(job)
[1] "type1" "type2" "type3" "unknown"
where type1 includes "admin.", "entrepreneur", and "self-employed";
type2 includes "blue-collar","management", and "technician";
type3 includes "housemaid", "student", "retired", and "services";
unknown includes "unknown" and "unemployed".
For this purpose, which command can I use? Thanks! Yan
You can also create a key/value index vector (names are the old levels, values are the new groups) and use it to replace the elements of 'job':
indx <- setNames(rep(c(paste0('type', 1:3), 'unknown'), c(3, 3, 4, 2)),
                 c(levels(job)[c(1, 3, 7)], levels(job)[c(2, 5, 10)],
                   levels(job)[c(4, 6, 8, 9)], levels(job)[c(11, 12)]))
factor(unname(indx[as.character(job)]))
Sample data used above:
v1 <- c('admin.', 'blue-collar', 'entrepreneur', 'housemaid',
        'management', 'retired', 'self-employed', 'services', 'student',
        'technician', 'unemployed', 'unknown')
set.seed(24)
job <- factor(sample(v1, 50, replace = TRUE))
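The index vector can also be written out by name rather than by level position, so it does not depend on the sort order of levels(job). This is only a sketch of the same lookup idea, assuming the grouping given in the question; `lookup` and `job2` are illustrative names.

```r
# Sample data, same as in the answer above
v1 <- c("admin.", "blue-collar", "entrepreneur", "housemaid",
        "management", "retired", "self-employed", "services", "student",
        "technician", "unemployed", "unknown")
set.seed(24)
job <- factor(sample(v1, 50, replace = TRUE))

# Named character vector: names are the old levels, values the new groups
lookup <- c("admin."        = "type1",
            "entrepreneur"  = "type1",
            "self-employed" = "type1",
            "blue-collar"   = "type2",
            "management"    = "type2",
            "technician"    = "type2",
            "housemaid"     = "type3",
            "student"       = "type3",
            "retired"       = "type3",
            "services"      = "type3",
            "unknown"       = "unknown",
            "unemployed"    = "unknown")

# Look up each observation by its old label, then rebuild the factor
job2 <- factor(unname(lookup[as.character(job)]))

# Cross-tabulate old against new labels to check the mapping
table(job, job2)
```

Each row of the resulting table should have counts in exactly one column; a value that is missing from `lookup` would show up as NA in `job2`.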
You can assign to levels directly (here z is your factor, job in the question):
levels(z)[levels(z) %in% c("unemployed", "unknown")] <- "unknown"
This is covered in the help file; type ?levels.
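The same help page also describes assigning a named list to levels(), which does the whole regrouping in one step. A small sketch, assuming the grouping from the question and a made-up example factor `z`:

```r
# Made-up example factor; in the question this would be job
z <- factor(c("admin.", "student", "unknown",
              "technician", "unemployed", "retired"))

# Each list name is a new level; its elements are the old levels it absorbs
levels(z) <- list(
  type1   = c("admin.", "entrepreneur", "self-employed"),
  type2   = c("blue-collar", "management", "technician"),
  type3   = c("housemaid", "student", "retired", "services"),
  unknown = c("unknown", "unemployed")
)

levels(z)
as.character(z)
```

As far as I can tell, old levels that are not mentioned anywhere in the list end up as NA with this form, so it is worth checking the result with table(z, useNA = "ifany").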
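Stealing from @akrun's answer, you could do this most cleanly with a hash/list whose names are the new levels:
ha <- list(
  type1   = c("admin.", "entrepreneur", "self-employed"),
  type2   = c("blue-collar", "management", "technician"),
  type3   = c("housemaid", "student", "retired", "services"),
  unknown = c("unknown", "unemployed")
)
# seq_along() is safe even if ha is empty, unlike 1:length(ha)
for (i in seq_along(ha)) levels(z)[levels(z) %in% ha[[i]]] <- names(ha)[i]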
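If you do this often, the loop can be wrapped in a small function that also reports any levels the list forgot to cover. This is just a sketch; `collapse_levels` is a hypothetical helper name, not something from base R, and the example factor is made up.

```r
# Hypothetical helper: collapse the levels of factor f according to a
# named list of groups (names = new levels, elements = old levels)
collapse_levels <- function(f, groups) {
  stopifnot(is.factor(f), is.list(groups), !is.null(names(groups)))

  # Warn about levels that no group mentions; they are left unchanged
  uncovered <- setdiff(levels(f), unlist(groups))
  if (length(uncovered) > 0) {
    warning("levels left unchanged: ", paste(uncovered, collapse = ", "))
  }

  # Same loop as above: rename every matching level to the group name
  for (i in seq_along(groups)) {
    levels(f)[levels(f) %in% groups[[i]]] <- names(groups)[i]
  }
  f
}

# Made-up example: two levels are deliberately not covered
z <- factor(c("admin.", "management", "unemployed", "student"))
ha <- list(type1   = c("admin.", "entrepreneur", "self-employed"),
           unknown = c("unknown", "unemployed"))

z2 <- collapse_levels(z, ha)  # warns about management and student
levels(z2)
```

Unlike assigning a named list to levels(), this version keeps uncovered levels as they are instead of turning them into NA, which may or may not be what you want.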