I am learning R and am having trouble finding the right command.
I have the following categorical data:
levels(job)
[1] "admin." "blue-collar" "entrepreneur" "housemaid"
[5] "management" "retired" "self-employed" "services"
[9] "student" "technician" "unemployed" "unknown"
Now I want to simplify these levels, for example to:
levels(job)
[1] "class1" "class2" "class3" "unknown"
where type1 includes "admin.", "entrepreneur", and "self-employed";
type2 includes "blue-collar", "management", and "technician";
type3 includes "housemaid", "student", "retired", and "services";
unknown includes "unknown" and "unemployed".
For this purpose, which command can I use? Thanks! Yan
You may also create a 'key/value' index vector and use it to replace the elements in 'job':
indx <- setNames(rep(c(paste0('type',1:3), 'unknown'), c(3,3,4,2)),
c(levels(job)[c(1,3,7)], levels(job)[c(2,5,10)],
levels(job)[c(4,6,8,9)], levels(job)[c(11,12)]))
factor(unname(indx[as.character(job)]))
Data:
v1 <- c('admin.', 'blue-collar', 'entrepreneur', 'housemaid',
'management', 'retired', 'self-employed', 'services', 'student',
'technician', 'unemployed', 'unknown')
set.seed(24)
job <- factor(sample(v1, 50, replace=TRUE))
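The index positions used in the lookup above depend on the alphabetical order of levels(job). As a rough sketch of the same key/value idea, the keys can be spelled out by name instead, assuming the grouping described in the question (lookup and job2 are illustrative names, not from the original answer):

```r
# Sample data, built the same way as above
v1 <- c('admin.', 'blue-collar', 'entrepreneur', 'housemaid',
        'management', 'retired', 'self-employed', 'services', 'student',
        'technician', 'unemployed', 'unknown')
set.seed(24)
job <- factor(sample(v1, 50, replace = TRUE))

# Named character vector: names are the old levels, values are the new groups
lookup <- c('admin.' = 'type1', 'entrepreneur' = 'type1', 'self-employed' = 'type1',
            'blue-collar' = 'type2', 'management' = 'type2', 'technician' = 'type2',
            'housemaid' = 'type3', 'student' = 'type3',
            'retired' = 'type3', 'services' = 'type3',
            'unknown' = 'unknown', 'unemployed' = 'unknown')

# Index by the character form of the factor, then rebuild a factor
job2 <- factor(unname(lookup[as.character(job)]))

# Cross-tabulate old against new levels to check the grouping
table(job, job2)
```

Because the lookup is keyed by name, it should keep working if levels are added or reordered, as long as every level of job has an entry.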
You can assign to levels:
levels(z)[levels(z)%in%c("unemployed","unknown","self-employed")] <- "unknown"
This is covered in the help file; type ?levels.
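As a small worked sketch, here is the same assignment applied to the grouping from the question. The factor z below is made-up illustration data. Assigning one name to several levels merges them into a single level:

```r
# Hypothetical small factor for illustration
z <- factor(c('admin.', 'student', 'unemployed', 'unknown', 'technician', 'retired'))

# Each assignment renames the matching levels; duplicates are merged
levels(z)[levels(z) %in% c('admin.', 'entrepreneur', 'self-employed')] <- 'type1'
levels(z)[levels(z) %in% c('blue-collar', 'management', 'technician')] <- 'type2'
levels(z)[levels(z) %in% c('housemaid', 'student', 'retired', 'services')] <- 'type3'
levels(z)[levels(z) %in% c('unknown', 'unemployed')] <- 'unknown'

# Inspect the recoded values and the reduced set of levels
print(as.character(z))
print(levels(z))
```

Levels listed in a group but absent from z (such as "entrepreneur" here) are simply not matched, so the same lines can be reused on the full data.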
Stealing from @akrun's answer, you could do this most cleanly with a hash/list:
ha <- list(
unknown = c("unemployed","unknown","self-employed"),
class1 = c("admin.","management")
)
for (i in seq_along(ha)) levels(z)[levels(z) %in% ha[[i]]] <- names(ha)[i]
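The help page ?levels also describes assigning a named list directly, which does the same job as the loop without writing it out. A minimal sketch, using a made-up factor z for illustration. One difference to be aware of: with the list form, any level not mentioned in the list becomes NA, whereas the loop leaves such levels unchanged.

```r
# Hypothetical small factor for illustration
z <- factor(c('admin.', 'management', 'unemployed', 'unknown',
              'self-employed', 'student'))

# Named list: each name is a new level, each element the old levels it absorbs
levels(z) <- list(
  unknown = c('unemployed', 'unknown', 'self-employed'),
  class1  = c('admin.', 'management')
)

# 'student' is not covered by the list, so it is now NA
print(z)
```

If every original level should survive, list all of them, or use the loop version instead.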