Replacing strings with lookup table dplyr

Tags: string, r, dplyr

I am trying to create a lookup table in R so that my data matches the format used by the company I work for.

It concerns different education categories that I want to merge using dplyr.

library(dplyr)

# Create data
education <- c("Mechanichal Engineering","Electric Engineering","Political Science","Economics")

data <- data.frame(X1=replicate(1,sample(education,1000,rep=TRUE)))

tbl_df(data)

# Create lookup table
lut <- c("Mechanichal Engineering" = "Engineering",
         "Electric Engineering" = "Engineering",
         "Political Science" = "Social Science",
         "Economics" = "Social Science")

# Assign lookup table
data$X1 <- lut[data$X1]

But in my output the old values are replaced with the wrong ones, i.e. not the ones I defined in the lookup table. Instead, it looks as if the lookup table is assigned randomly.

FilipW asked Jul 08 '15 14:07


2 Answers

I've just been trying to figure this out myself. I wasn't quite happy with most of the solutions I found, so here's what I ended up with. I added an "other" category to show that it works even if there are values not defined in the lookup table.

library(dplyr)

# Create data
education <- c("Mechanichal Engineering",
               "Electric Engineering",
               "Political Science",
               "Economics",
               "Other")

data <- data.frame(X1 = replicate(1, sample(education, 20, rep=TRUE)))

# Create lookup table
lut <- c("Mechanichal Engineering" = "Engineering",
         "Electric Engineering" = "Engineering",
         "Political Science" = "Social Science",
         "Economics" = "Social Science")

data %>%
    mutate(X2 = recode(X1, !!!lut))
#>                         X1             X2
#> 1     Electric Engineering    Engineering
#> 2                    Other          Other
#> 3                    Other          Other
#> 4                    Other          Other
#> 5                    Other          Other
#> 6        Political Science Social Science
#> 7                    Other          Other
#> 8                Economics Social Science
#> 9        Political Science Social Science
#> 10    Electric Engineering    Engineering
#> 11               Economics Social Science
#> 12               Economics Social Science
#> 13 Mechanichal Engineering    Engineering
#> 14               Economics Social Science
#> 15       Political Science Social Science
#> 16                   Other          Other
#> 17                   Other          Other
#> 18                   Other          Other
#> 19 Mechanichal Engineering    Engineering
#> 20       Political Science Social Science
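
As a side note on why the original `lut[data$X1]` gave mismatched values: a likely cause, assuming `X1` was stored as a factor (the default for `data.frame()` before R 4.0), is that indexing a vector with a factor uses the factor's integer codes rather than its labels. The sketch below reproduces that in base R on the four categories from the question; the names `x`, `by_code` and `by_name` are only for illustration.

```r
# Assumption: X1 is a factor, as data.frame() created by default before R 4.0.
education <- c("Mechanichal Engineering", "Electric Engineering",
               "Political Science", "Economics")

lut <- c("Mechanichal Engineering" = "Engineering",
         "Electric Engineering" = "Engineering",
         "Political Science" = "Social Science",
         "Economics" = "Social Science")

x <- factor(education)

# Indexing with a factor uses its integer codes (levels are sorted
# alphabetically), so elements of lut are picked by position, not by name.
by_code <- unname(lut[x])

# Converting to character first makes the lookup go by name.
by_name <- unname(lut[as.character(x)])

print(data.frame(X1 = education, by_code, by_name))
```

If that is indeed the cause, `data$X1 <- unname(lut[as.character(data$X1)])` should be enough to fix the original code, although values missing from the lookup table would then become `NA` instead of being kept.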
Oliver answered Nov 01 '22 05:11


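library(reshape2)  # provides melt()

education <- c("Mechanichal Engineering","Electric Engineering","Political Science","Economics")

# Lookup table as a named list
lut <- list("Mechanichal Engineering" = "Engineering",
            "Electric Engineering" = "Engineering",
            "Political Science" = "Social Science",
            "Economics" = "Social Science")

# melt() turns the list into a data frame with columns `value` and `L1` (the list names)
lut2 <- melt(lut)

data1 <- data.frame(X1=replicate(1,sample(education,1000,rep=TRUE)))

# Look up each X1 value by name in L1 and take the matching `value`
data1$new <- lut2[match(data1$X1,lut2$L1),'value']
head(data1)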


=======================  ==============
X1                       new           
=======================  ==============
Political Science        Social Science
Political Science        Social Science
Mechanichal Engineering  Engineering   
Mechanichal Engineering  Engineering   
Political Science        Social Science
Political Science        Social Science
=======================  ==============
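
The same match-against-a-table idea can probably also be written as a join, which keeps everything in dplyr. This is only a sketch under a few assumptions: the lookup table is stored as a two-column data frame (called `lut_df` here, a name made up for this example), the columns are character rather than factor, and unmatched values should keep their original label.

```r
library(dplyr)

# Hypothetical lookup table as a data frame: one row per original category
lut_df <- data.frame(
  X1  = c("Mechanichal Engineering", "Electric Engineering",
          "Political Science", "Economics"),
  new = c("Engineering", "Engineering", "Social Science", "Social Science"),
  stringsAsFactors = FALSE
)

# Small example input, including a value that is not in the lookup table
data1 <- data.frame(
  X1 = c("Economics", "Other", "Electric Engineering"),
  stringsAsFactors = FALSE
)

result <- data1 %>%
  left_join(lut_df, by = "X1") %>%   # unmatched rows get NA in `new`
  mutate(new = coalesce(new, X1))    # fall back to the original value

print(result)
```

A join like this may be more convenient when the lookup table already lives in a file or database table, since it avoids building a named vector or list first.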
Carl Boneri answered Nov 01 '22 03:11