Regular expression matching inside dplyr

Question

When answering this question, I wrote the following code:

df <- data.frame(Call_Num = c("HV5822.H4 C47 Circulating Collection, 3rd Floor", "QE511.4 .G53 1982 Circulating Collection, 3rd Floor", "TL515 .M63 Circulating Collection, 3rd Floor", "D753 .F4 Circulating Collection, 3rd Floor", "DB89.F7 D4 Circulating Collection, 3rd Floor"))

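library(stringr)

# str_match() returns a matrix: column 1 is the full match, columns 2 and 3 are the capture groups
matches <- str_match(df$Call_Num, "([A-Z]+)(\\d+)\\s*\\.")
df2 <- data.frame(df, letter = matches[, 2], number = matches[, 3])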

Now my question is: Is there a simple way to combine the last two lines into one dplyr call, presumably using mutate()? Alternatively, I'd be interested in a solution with do() as well. For the mutate() approach, since we're extracting two groups, I'll accept a solution that calls str_match() twice with different regular expressions, one for each desired group.

Edit: To clarify, the main challenge I see here is that str_match() returns a matrix, and I'm wondering how to handle that in mutate() or do(). I'm not interested in solutions to the original problem that use other methods of extracting the information. There are plenty of such solutions given already here.

Sam Firke · Accepted Answer

You could do this with extract() from the tidyr package:

library(tidyr)
extract(df, Call_Num, into = c("letter", "number"), regex = "([A-Z]+)(\\d+)\\s*\\.", remove = FALSE)

                                             Call_Num letter number
1     HV5822.H4 C47 Circulating Collection, 3rd Floor     HV   5822
2 QE511.4 .G53 1982 Circulating Collection, 3rd Floor     QE    511
3        TL515 .M63 Circulating Collection, 3rd Floor     TL    515
4          D753 .F4 Circulating Collection, 3rd Floor      D    753
5        DB89.F7 D4 Circulating Collection, 3rd Floor     DB     89

It's not dplyr, but as stated on the CRAN page linked above, tidyr "is designed specifically for data tidying (not general reshaping or aggregating) and works well with dplyr data pipelines."
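
For comparison, here is a minimal sketch of both routes in a pipeline. It is not part of the original answer: the shortened two-row data frame, the `pattern` variable, and the result names `via_mutate` and `via_extract` are made up for illustration, and it assumes dplyr, stringr, and tidyr are installed. The mutate() version calls str_match() once per group and picks out a single matrix column, so each new column receives an ordinary character vector.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Shortened, illustrative version of the question's data
df <- data.frame(
  Call_Num = c("HV5822.H4 C47 Circulating Collection, 3rd Floor",
               "D753 .F4 Circulating Collection, 3rd Floor"),
  stringsAsFactors = FALSE
)

# Backslashes are doubled because this is an R string literal
pattern <- "([A-Z]+)(\\d+)\\s*\\."

# Route 1: mutate() with str_match(); indexing one matrix column
# gives a plain vector (column 1 is the full match, 2 and 3 the groups)
via_mutate <- df %>%
  mutate(letter = str_match(Call_Num, pattern)[, 2],
         number = str_match(Call_Num, pattern)[, 3])

# Route 2: tidyr::extract() inside the same kind of pipeline
via_extract <- df %>%
  tidyr::extract(Call_Num, into = c("letter", "number"),
                 regex = pattern, remove = FALSE)
```

Both routes should produce the same letter and number columns. The mutate() version evaluates the regular expression twice, which is the trade-off the question explicitly allows.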

Tags: regex, r, dplyr, stringr

Claus Wilke

1 Answers

Sam Firke
