I have a large data frame with unknown column names and numeric values 1, 2, 3, or 4. I want to replace every 4 with its column name and every 1, 2, or 3 with an empty string.
Of course I can write a loop of some kind, like this:
df <- data.frame(id = 1:8, unknownvarname1 = c(1:4, 1:4), unknownvarname2 = c(4:1, 4:1))
for (i in 2:length(df)) {
  df[, i] <- as.character(df[, i])
  df[, i] <- mgsub::mgsub(df[, i], c(1, 2, 3, 4), c("", "", "", names(df)[i]))
}
This would be the result:
  id unknownvarname1 unknownvarname2
1  1                 unknownvarname2
2  2
3  3
4  4 unknownvarname1
5  5                 unknownvarname2
6  6
7  7
8  8 unknownvarname1
For a data frame this size that's no problem at all. But when I run this loop on large data frames with up to 30k rows and up to 40 unknown variables, it takes ages to complete.
Does anyone know of a faster way to do this? I tried functions like mutate() from the dplyr package, but I could not make it work.
Many thanks in advance!
One way using base R:
# Replace all values in 1:3 with an empty string
df[-1][sapply(df[-1], `%in%`, 1:3)] <- ""
# Get the row/column indices where the value is 4
mat <- which(df == 4, arr.ind = TRUE)
# Exclude matches in the first (id) column; drop = FALSE keeps mat a matrix even if only one row remains
mat <- mat[mat[, 2] != 1, , drop = FALSE]
# Replace the remaining entries with their corresponding column names
df[mat] <- names(df)[mat[, 2]]
df
#  id unknownvarname1 unknownvarname2
#1  1                 unknownvarname2
#2  2
#3  3
#4  4 unknownvarname1
#5  5                 unknownvarname2
#6  6
#7  7
#8  8 unknownvarname1
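Another option, sketched here only as a possible alternative to the matrix-indexing approach above, is to recode each column in one vectorised step with Map() and ifelse(). It assumes that the first column (id) is the only one to leave untouched and that every other column contains only the values 1 to 4. The helper name cols is made up for illustration.

```r
df <- data.frame(id = 1:8,
                 unknownvarname1 = c(1:4, 1:4),
                 unknownvarname2 = c(4:1, 4:1))

# Names of the columns to recode: everything except the first (id) column
cols <- names(df)[-1]

# For each column, put its own name where the value is 4 and "" elsewhere.
# Map() passes each column together with its name to the function.
df[cols] <- Map(function(x, nm) ifelse(x == 4, nm, ""), df[cols], cols)

df
```

Because each column is handled by a single vectorised ifelse() call, there is no per-element string substitution. Note that NA values would stay NA with this sketch. If you prefer dplyr (version 1.0.0 or later), the same idea can be written with mutate(), across() and cur_column().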