So I have a large dataset that looks like this:
V1 V2 V3 V4
1 Sleep Domestic Eat Child Care
2 Sleep Domestic Eat Paid
3 Sleep Domestic Eat Child Care
4 Sleep Eat Paid <NA>
What I would like to do is to reorder
the columns based on a "template"
["Sleep", "Eat", "Domestic", "Paid", "Child care"]
To get (ouput)
V1 V2 V3 V4 V5
Sleep Eat Domestic NA Child Care
Sleep Eat Domestic Paid NA
Sleep Eat Domestic NA Child Care
Sleep Eat NA Paid NA
So in columns 1 Sleep
, columns 2 Eat
, ...
I have no idea where to start with this. Any idea ?
data
x = structure(list(V1 = c("Sleep", "Sleep", "Sleep", "Sleep"), V2 = c("Domestic",
"Domestic", "Domestic", "Eat"), V3 = c("Eat", "Eat", "Eat", "Paid"
), V4 = c("Child Care", "Paid", "Child Care", NA)), .Names = c("V1",
"V2", "V3", "V4"), row.names = c(NA, 4L), class = "data.frame")
template = c('Sleep', 'Eat', 'Domestic', 'Paid', 'Child care')
Check the rowSums
for each template
value and then piece it together again:
template <- c("Sleep", "Eat", "Domestic", "Paid", "Child Care")
# i've fixed this template so the case matches the values for 'Child Care'
data.frame(lapply(
setNames(template, seq_along(template)),
function(v) c(NA,v)[(rowSums(x==v,na.rm=TRUE)>0)+1]
))
# X1 X2 X3 X4 X5
#1 Sleep Eat Domestic <NA> Child Care
#2 Sleep Eat Domestic Paid <NA>
#3 Sleep Eat Domestic <NA> Child Care
#4 Sleep Eat <NA> Paid <NA>
Or an alternative using pmax
:
data.frame(
lapply(
setNames(template, seq_along(template)),
function(v) do.call(pmax, c(replace(x, x != v,NA),na.rm=TRUE))
)
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With