I've been struggling for hours to get this match and replace with gsub in R to work, still with no success.
I'm trying to match the pattern "Reason:" in a string and extract everything AFTER this pattern, up to the first occurrence of a dot (.).
For instance:
Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.
would return "Not interested"
To extract words from a string, we can use the word function of the stringr package. For example, if x is a string that contains 100 words, the first 20 words can be extracted with word(x, start = 1, end = 20, sep = fixed(" ")).
The substring() function in R is widely used to extract characters from a string or to replace part of it.
To extract all the characters that come before a given character, you can use substr() or substring() once you know that character's position, for example from regexpr().
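As a rough illustration of that last point, here is a minimal base R sketch that takes everything before the first occurrence of a character. The variable names and the sample string are made up for illustration; regexpr() supplies the position that substr() needs.

```r
# Hypothetical sample string (not from the original post)
x <- "Not interested. ChannelID: CARE."

# regexpr() returns the position of the first literal dot, or -1 if there is none
pos <- regexpr(".", x, fixed = TRUE)

# Keep the characters before that position; fall back to NA when no dot is found
before_dot <- if (pos > 0) substr(x, 1, pos - 1) else NA_character_
before_dot
# [1] "Not interested"
```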
Here's a solution:
s <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."
sub(".*Reason: (.*?)\\..*", "\\1", s)
# [1] "Not interested"
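One thing worth knowing about this approach: when the pattern does not match, sub returns the input unchanged rather than NA. A small sketch of that behaviour (the no_match string is made up for illustration):

```r
pattern <- ".*Reason: (.*?)\\..*"

# Hypothetical input without a "Reason:" field
no_match <- "no match example"

# sub() leaves non-matching strings untouched
result <- sub(pattern, "\\1", no_match)
result
# [1] "no match example"
```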
Update (to address comments):
If you also have strings that do not match the pattern, I recommend using regexpr instead of sub:
s2 <- c("no match example",
"Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.")
match <- regexpr("(?<=Reason: ).*?(?=\\.)", s2, perl = TRUE)
ifelse(match == -1, NA, regmatches(s2, match))
# [1] NA               "Not interested"
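Note that regmatches returns only the matched pieces, so with several input strings the ifelse call relies on recycling to line the results up. Below is a sketch of a variant that keeps every result at the position of its input, assuming you want NA for non-matches. The helper name extract_reason and the sample vector are made up for illustration.

```r
# Hypothetical helper: returns the text after "Reason: " up to the next dot,
# or NA for strings without a match
extract_reason <- function(x) {
  m <- regexpr("(?<=Reason: ).*?(?=\\.)", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  # regmatches() yields one element per matching input, in order
  out[m != -1] <- regmatches(x, m)
  out
}

# Made-up inputs: match, no match, match
x <- c("Reason: Too expensive. ChannelID: WEB.",
       "no match example",
       "Reason: Not interested. ChannelID: CARE.")
extract_reason(x)
# [1] "Too expensive"  NA               "Not interested"
```

Assigning into a pre-filled NA vector avoids depending on how ifelse recycles a shorter vector of matches.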
For your second example, you can use the following regex:
s3 <- "Delete Payment Arrangement of type Proof of Payment for BAN : 907295267 on date 02/01/2014, from reason PAERR."
# a)
sub(".*type (.*?) for.*", "\\1", s3)
# [1] "Proof of Payment"
# b)
match <- regexpr("(?<=type ).*?(?= for)", s3, perl = TRUE)
ifelse(match == -1, NA, regmatches(s3, match))
# [1] "Proof of Payment"
Lots of different ways (as you can see from the submissions). I personally like to use stringr functions.
library(stringr)
rec <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."
str_match(rec, "Reason: ([a-zA-Z0-9 ]+)\\.")[, 2]
## [1] "Not interested"
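For comparison, here is a base R sketch of the same capture-group idea, in case adding a package is not an option. regexec together with regmatches returns the full match followed by each captured group. The character class [^.]+ (anything but a dot) is a substitution for the alphanumeric class above, not part of the original answer.

```r
rec <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."

# regexec() records the positions of the whole match and of each group
m <- regexec("Reason: ([^.]+)\\.", rec)

# For each input string, regmatches() gives c(full match, group 1, ...)
parts <- regmatches(rec, m)[[1]]
reason <- parts[2]
reason
# [1] "Not interested"
```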