When answering this question, I wrote the following code:
df <- data.frame(Call_Num = c("HV5822.H4 C47 Circulating Collection, 3rd Floor", "QE511.4 .G53 1982 Circulating Collection, 3rd Floor", "TL515 .M63 Circulating Collection, 3rd Floor", "D753 .F4 Circulating Collection, 3rd Floor", "DB89.F7 D4 Circulating Collection, 3rd Floor"))
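library(stringr)
# str_match() returns a character matrix: column 1 is the full match, columns 2 and 3 are the two capture groups
matches <- str_match(df$Call_Num, "([A-Z]+)(\\d+)\\s*\\.")
df2 <- data.frame(df, letter = matches[, 2], number = matches[, 3])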
Now my question is: Is there a simple way to combine the last two lines into one dplyr call, presumably using mutate()? Alternatively, I'd be interested in a solution with do() as well. For the mutate() approach, since we're extracting two groups, I'll accept a solution that calls str_match() twice with different regular expressions, one for each desired group.
Edit: To clarify, the main challenge I see here is that str_match() returns a matrix, and I'm wondering how to handle that in mutate() or do(). I'm not interested in solutions to the original problem using other methods of extracting the information. There are plenty of such solutions given already here.
You could do this with extract() from the tidyr package:
library(tidyr)
extract(df, Call_Num, into = c("letter", "number"), regex = "([A-Z]+)(\\d+)\\s*\\.", remove = FALSE)
                                             Call_Num letter number
1     HV5822.H4 C47 Circulating Collection, 3rd Floor     HV   5822
2 QE511.4 .G53 1982 Circulating Collection, 3rd Floor     QE    511
3        TL515 .M63 Circulating Collection, 3rd Floor     TL    515
4          D753 .F4 Circulating Collection, 3rd Floor      D    753
5        DB89.F7 D4 Circulating Collection, 3rd Floor     DB     89
It's not dplyr, but as stated on the CRAN page linked above, tidyr "is designed specifically for data tidying (not general reshaping or aggregating) and works well with dplyr data pipelines."
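If you do want to stay inside mutate(), one option is to index the matrix that str_match() returns directly within the call, once per capture group. This is only a minimal sketch, not part of the tidyr approach above: it calls str_match() twice with the same regex rather than two different ones, uses a shortened two-row version of the data, and the name `pattern` is introduced here purely for illustration.

```r
library(dplyr)
library(stringr)

# Shortened example data (two of the call numbers from the question)
df <- data.frame(
  Call_Num = c("HV5822.H4 C47 Circulating Collection, 3rd Floor",
               "D753 .F4 Circulating Collection, 3rd Floor"),
  stringsAsFactors = FALSE
)

# `pattern` is a hypothetical helper name holding the regex from the question
pattern <- "([A-Z]+)(\\d+)\\s*\\."

df2 <- df %>%
  mutate(
    # Column 2 of the match matrix is the first capture group (the letters)
    letter = str_match(Call_Num, pattern)[, 2],
    # Column 3 is the second capture group (the digits, still as character)
    number = str_match(Call_Num, pattern)[, 3]
  )
```

Each `[, k]` subset turns the matrix into an ordinary character vector, which is what mutate() expects for a new column. The cost is that the regex is evaluated twice.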