R: extract substring from end of pattern until first occurrence of character

Tags: regex, r, gsub

I have been struggling for hours to get this match-and-replace to work with gsub in R, with no success. I'm trying to match the pattern "Reason:" in a string and extract everything after it, up to the first occurrence of a dot (.). For instance:

Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.

would return "Not interested"

user3424107 asked Mar 15 '14 19:03




2 Answers

Here's a solution:

s <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."

sub(".*Reason: (.*?)\\..*", "\\1", s)
# [1] "Not interested"
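
The lazy quantifier `.*?` is what stops the capture at the first dot. A minimal sketch of an equivalent pattern, assuming the reason text itself never contains a dot, uses the negated character class `[^.]*` instead. The variable names `reason` and `unmatched` are only for illustration:

```r
s <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."

# [^.]* matches a run of characters that are not a dot, so the capture
# group ends at the first "." after "Reason: " without a lazy quantifier.
reason <- sub(".*Reason: ([^.]*)\\..*", "\\1", s)
reason
# [1] "Not interested"

# sub() returns its input unchanged when the pattern does not match.
unmatched <- sub(".*Reason: ([^.]*)\\..*", "\\1", "no match example")
unmatched
# [1] "no match example"
```

The second call shows why `sub` alone is awkward for mixed input: a string without "Reason:" comes back whole rather than as `NA`.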

Update (to address comments):

If you also have strings that do not match the pattern, I recommend using regexpr instead of sub:

s2 <- c("no match example",
        "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.")

match <- regexpr("(?<=Reason: ).*?(?=\\.)", s2, perl = TRUE)
ifelse(match == -1, NA, regmatches(s2, match))
# [1] NA               "Not interested"
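
Note that `regmatches` returns only the matched substrings, so the `ifelse` call relies on recycling a shorter vector and can misalign values when more than one element matches. A sketch that assigns by index instead is shown here; the third input string and the names `m` and `res` are made up for illustration:

```r
s2 <- c("no match example",
        "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.",
        "Reason: Too expensive. ChannelID: WEB.")  # hypothetical extra record

m <- regexpr("(?<=Reason: ).*?(?=\\.)", s2, perl = TRUE)

# Start with NA everywhere, then fill in only the positions that matched.
res <- rep(NA_character_, length(s2))
res[m != -1] <- regmatches(s2, m)
res
# [1] NA               "Not interested" "Too expensive"
```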

For your second example, you can use the following regex:

s3 <- "Delete Payment Arrangement of type Proof of Payment for BAN : 907295267 on date 02/01/2014, from reason PAERR."

# a)
sub(".*type (.*?) for.*", "\\1", s3)
# [1] "Proof of Payment"

# b)
match <- regexpr("(?<=type ).*?(?= for)", s3, perl = TRUE)
ifelse(match == -1, NA, regmatches(s3, match))
# [1] "Proof of Payment"
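
Both examples follow the same shape: a left marker, a lazy match, and a right marker. As a rough sketch, they could be wrapped in a small helper. `extract_between` is a hypothetical name, not a base R function, and it assumes `left` and `right` are regex fragments, with `left` of fixed length because it goes inside a lookbehind:

```r
# Hypothetical helper: return the text between two regex fragments, or NA.
extract_between <- function(x, left, right) {
  pattern <- paste0("(?<=", left, ").*?(?=", right, ")")
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)  # fill only the elements that matched
  out
}

s3 <- "Delete Payment Arrangement of type Proof of Payment for BAN : 907295267 on date 02/01/2014, from reason PAERR."

extract_between(s3, "type ", " for")
# [1] "Proof of Payment"
extract_between(c("no match example", "Reason: Not interested. ChannelID: CARE."),
                "Reason: ", "\\.")
# [1] NA               "Not interested"
```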

Sven Hohenstein answered Nov 30 '22 05:11


Lots of different ways (as you can see from the submissions). I personally like to use stringr functions.

library(stringr)

rec <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."
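# A plain space in the character class avoids the invalid "\ " string escape;
# column 2 of the result matrix holds the first capture group.
str_match(rec, "Reason: ([a-zA-Z0-9 ]+)\\.")[, 2]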
## [1] "Not interested"
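
`str_match` is vectorised and returns a row of `NA` for elements that do not match, so it also covers the non-matching case. A short sketch, assuming stringr is installed; the `recs` vector and the `[^.]+` pattern are illustrative choices rather than part of the answer:

```r
library(stringr)

recs <- c("no match example",
          "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.")

# Column 1 is the full match, column 2 the first capture group.
reasons <- str_match(recs, "Reason: ([^.]+)\\.")[, 2]
reasons
## [1] NA               "Not interested"
```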

hrbrmstr answered Nov 30 '22 05:11