I am trying to use dplyr in R to extract the substring that follows a particular string in a dataframe column, after filtering the rows by certain values of the name variable, as in the example below. I want to store the result in a new variable called income_rent.
I am new to regular expressions. My attempt to do this is:
income_cashrent <- v18 %>%
filter(str_detect(name, "B25122")) %>%
mutate(income_rent = str_extract(label, "[^--!!]*$"))
However, I get the result:
Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)
The first four values of label are:
Estimate!!Total
Estimate!!Total!!Household income in the past 12 months (in 2018 inflation-adjusted dollars) --!!Less than $10,000
Estimate!!Total!!Household income in the past 12 months (in 2018 inflation-adjusted dollars) --!!Less than $10,000!!With cash rent
Estimate!!Total!!Household income in the past 12 months (in 2018 inflation-adjusted dollars) --!!Less than $10,000!!With cash rent!!Less than $100
The desired result would be:
[not sure how to indicate an empty result here]
Less than $10,000
Less than $10,000!!With cash rent
Less than $10,000!!With cash rent!!Less than $100
So far I have been unable to debug this, even after consulting other regex examples on Stack Overflow. Any guidance would be most welcome. Thanks in advance!
We can use str_extract to extract the characters after the pattern `--!!` using a regex lookbehind:
library(stringr)
library(dplyr)
v18 %>%
mutate(income_rent = str_extract(label, "(?<=--!!).*"))
# label
#1 Estimate!!Total
#2 Estimate!!Total!!Household income in the past 12 months (in 2018 inflation-adjusted dollars) --!!Less than $10,000
#3 Estimate!!Total!!Household income in the past 12 months (in 2018 inflation-adjusted dollars) --!!Less than $10,000!!With cash rent
#4 Estimate!!Total!!Household income in the past 12 months (in 2018 inflation-adjusted dollars) --!!Less than $10,000!!With cash rent!!Less than $100
# income_rent
#1 <NA>
#2 Less than $10,000
#3 Less than $10,000!!With cash rent
#4 Less than $10,000!!With cash rent!!Less than $100
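As for why the original pattern failed: stringr uses the ICU regex engine, and inside a bracket expression ICU most likely reads `--` as its set-difference operator, so `[^--!!]` is rejected as a syntax error. Even with the hyphen escaped, a negated class excludes single characters, not the sequence `--!!`, so it would return only the last piece of the label. Below is a minimal sketch of the difference. The shortened labels are hypothetical stand-ins for the real ones.

```r
library(stringr)

# Hypothetical, shortened labels mirroring the structure in the question
label <- c(
  "Estimate!!Total",
  "Estimate!!Total!!Household income (dollars) --!!Less than $10,000",
  "Estimate!!Total!!Household income (dollars) --!!Less than $10,000!!With cash rent"
)

# Lookbehind: the match starts right after the literal "--!!";
# rows without "--!!" yield NA
income_rent <- str_extract(label, "(?<=--!!).*")

# A valid negated class (hyphen escaped) excludes the single characters
# "-" and "!", so it keeps only the text after the last "-" or "!"
last_piece <- str_extract(label, "[^\\-!]*$")

print(data.frame(income_rent, last_piece))
```

The lookbehind version keeps the full remainder, including any later `!!` separators, which is what the desired output asks for.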
Another option is str_match, which returns the capture group in the second column of its result:
v18$income_rent <- str_match(v18$label, ".*--!!(.*)")[,2]
v18 <- structure(list(label = c("Estimate!!Total", "Estimate!!Total!!Household income in the past 12 months (in 2018 inflation-adjusted dollars) --!!Less than $10,000",
"Estimate!!Total!!Household income in the past 12 months (in 2018 inflation-adjusted dollars) --!!Less than $10,000!!With cash rent",
"Estimate!!Total!!Household income in the past 12 months (in 2018 inflation-adjusted dollars) --!!Less than $10,000!!With cash rent!!Less than $100"
)), class = "data.frame", row.names = c(NA, -4L))
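One caveat: the two patterns agree on this data, but they would differ if `--!!` appeared more than once in a label. The lookbehind matches from the first occurrence, while the greedy `.*` in the str_match pattern backtracks to the last one. Here is a small sketch using a made-up label (not from the dataset) to show this. The lazy variant `.*?` is an assumption about what you might want if the first occurrence matters.

```r
library(stringr)

# Hypothetical label in which the delimiter "--!!" occurs twice
x <- "Estimate!!Total!!Income --!!Less than $10,000!!Rent --!!Less than $100"

# Lookbehind: everything after the FIRST "--!!"
after_first <- str_extract(x, "(?<=--!!).*")

# Greedy prefix: the capture group holds everything after the LAST "--!!"
after_last <- str_match(x, ".*--!!(.*)")[, 2]

# Lazy prefix: the capture group again starts after the FIRST "--!!"
after_first_lazy <- str_match(x, ".*?--!!(.*)")[, 2]

print(c(after_first, after_last, after_first_lazy))
```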