
Dplyr pipe (%>%) within mutate()?

Tags: r, dplyr

Piping in dplyr is handy, and I sometimes want to clean up one column by applying several functions to it in sequence. Is there a way to use the pipe inside the mutate() call? I run into this most with regular expressions, but it comes up in other contexts too. In the example below I can clearly see each manipulation applied to the column "Clean", and I am curious whether there is a way to do something that mimics %>% within mutate().

library(dplyr)
phone <- data.frame(Numbers = c("1234567890", "555-3456789", "222-222-2222",
                                "5131831249", "123.321.1234", "(333)444-5555",
                                "+1 123-223-3234", "555-666-7777 x100"),
                    stringsAsFactors = FALSE)

phone2 <- phone %>%
          mutate(Clean = gsub("[A-Za-z].*", "", Numbers), #remove extensions
                 Clean = gsub("[^0-9]", "", Clean),       #remove parentheses, dashes, etc
                 Clean = substr(Clean, nchar(Clean)-9, nchar(Clean)), #grab the right 10 characters
                 Clean = gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3", Clean)) #format

phone2

I know there might be a better gsub() call, but for the purposes of this question I want to know whether these gsub() steps can be piped together, so that I neither have to keep writing Clean = gsub(...) nor nest the calls inside each other.

It would be fine with me if you answer this question using a simpler example.

yake84 asked Nov 28 '22 00:11


2 Answers

Don't fall into the trap of endless pipes. For readability and efficiency, do the correct thing: write a function.

phone %>% mutate(Clean = cleanPhone(Numbers))
#             Numbers         Clean
# 1        1234567890 (123)456-7890
# 2       555-3456789 (555)345-6789
# 3      222-222-2222 (222)222-2222
# 4        5131831249 (513)183-1249
# 5      123.321.1234 (123)321-1234
# 6     (333)444-5555 (333)444-5555
# 7   +1 123-223-3234 (123)223-3234
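# 8 555-666-7777 x100 (555)666-7777

Custom function (define it before running the call above):

cleanPhone <- function(x) {
  x1 <- gsub("[A-Za-z].*", "", x)            # remove extensions
  x2 <- gsub("[^0-9]", "", x1)               # remove parentheses, dashes, etc
  x3 <- substr(x2, nchar(x2)-9, nchar(x2))   # grab the right 10 characters
  gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3", x3)  # format
}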
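
One practical benefit of a named helper is that it can be checked on a plain character vector, outside mutate() and without a data frame. The sketch below is not the answerer's code: clean_phone_sketch is a hypothetical name, and it includes the extension-removal step from the question so that a trailing "x100" is dropped before the last ten digits are taken. It uses base R only.

```r
# Hypothetical helper (name is an assumption, not from the original answer).
clean_phone_sketch <- function(x) {
  x <- gsub("[A-Za-z].*", "", x)            # drop extensions such as " x100"
  x <- gsub("[^0-9]", "", x)                # keep digits only
  x <- substr(x, nchar(x) - 9, nchar(x))    # keep the right-most 10 digits
  gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3", x)  # format as (xxx)xxx-xxxx
}

inputs <- c("1234567890", "+1 123-223-3234", "555-666-7777 x100")
result <- clean_phone_sketch(inputs)
print(result)
# [1] "(123)456-7890" "(123)223-3234" "(555)666-7777"
```

Since the helper is vectorised, the same call works unchanged as the right-hand side of Clean = ... inside mutate().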
Pierre L answered Dec 04 '22 13:12


I guess you need to chain the steps with %>% inside mutate(), using . as a placeholder for the piped value:

phone %>% 
     mutate(Clean = gsub("[A-Za-z].*", "", Numbers) %>%
                    gsub("[^0-9]", "", .) %>%
                    substr(., nchar(.)-9, nchar(.)) %>% 
                    gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3", .))
#            Numbers         Clean
#1        1234567890 (123)456-7890
#2       555-3456789 (555)345-6789
#3      222-222-2222 (222)222-2222
#4        5131831249 (513)183-1249
#5      123.321.1234 (123)321-1234
#6     (333)444-5555 (333)444-5555
#7   +1 123-223-3234 (123)223-3234
#8 555-666-7777 x100 (555)666-7777
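
Why the dots are needed: by default %>% passes the left-hand side as the first argument of the next call, but gsub() takes its input as the third argument (x). A minimal sketch of that behaviour, assuming the magrittr package is installed (dplyr re-exports the same %>% operator); the variable names here are illustrative only:

```r
library(magrittr)  # provides %>%

x <- c("(333)444-5555", "+1 123-223-3234")

# The dot marks where the piped value goes; here it fills gsub()'s third argument.
digits <- x %>% gsub("[^0-9]", "", .)
print(identical(digits, gsub("[^0-9]", "", x)))
# [1] TRUE

# The dot can be used several times in one call. Because it appears as a
# direct argument of substr(), the value is not also inserted in first position.
last10 <- digits %>% substr(., nchar(.) - 9, nchar(.))
print(last10)
# [1] "3334445555" "1232233234"
```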
akrun answered Dec 04 '22 11:12