How can I use str_match to extract the remaining string after the last occurrence of a substring?
For example, for the string "apples and oranges and bananas with cream", I'd like to extract the remainder of this string after the last occurrence of " and " to return "bananas with cream".
I have tried many alternatives to this command but it either keeps returning the remainder of the string after the first "and" or an empty string.
library(stringr)
str_match("apples and oranges and bananas with cream", "(?<= and ).*(?! and )")
# [,1]
#[1,] "oranges and bananas with cream"
I've searched StackOverflow for solutions and found some for JavaScript, Python and base R, but have found none for the stringr package.
Thanks.
(I don't know about str_match. Base R regex should suffice, though.) Since regex quantifiers are "greedy", i.e. .+ consumes as much of the string as it can and backtracks only as far as needed for the rest of the pattern to match, the match ends at the last "and ". So it's just:
sub("^.+and ", "", "apples and oranges and bananas with cream")
#[1] "bananas with cream"
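To make the greedy behaviour concrete, here is a small sketch contrasting the greedy `.+` with its lazy counterpart `.+?`. The variable names and the lazy variant are illustrative additions, not part of the answer above; `perl = TRUE` is used so that the lazy quantifier is handled by PCRE.

```r
s <- "apples and oranges and bananas with cream"

# Greedy: .+ grabs as much as possible, so the removed prefix ends at the LAST "and "
greedy <- sub("^.+and ", "", s)

# Lazy: .+? grabs as little as possible, so the removed prefix ends at the FIRST "and "
lazy <- sub("^.+?and ", "", s, perl = TRUE)

print(greedy)  # [1] "bananas with cream"
print(lazy)    # [1] "oranges and bananas with cream"
```

The lazy result is the same string the question's look-behind attempt returned, which suggests that attempt was effectively anchoring on the first " and " rather than the last.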
I'm pretty sure there would be an equivalent in the "lubridate" corner of the hadleyverse.
Then failure with:
library(lubridate)
Attaching package: ‘lubridate’
The following object is masked from ‘package:plyr’:
here
The following objects are masked from ‘package:data.table’:
hour, isoweek, mday, minute, month, quarter, second, wday, week, yday, year
The following object is masked from ‘package:base’:
date
> str_replace("apples and oranges and bananas with cream", "^.+and ", "")
Error in str_replace("apples and oranges and bananas with cream", "^.+and ", :
could not find function "str_replace"
So it's not in pkg:lubridate but rather in stringr (which, as I understand it, is a very light wrapper around the stringi package):
library(stringr)
str_replace("apples and oranges and bananas with cream", "^.+and ", "")
[1] "bananas with cream"
I do wish that people who ask questions about non-base package functions would include a library call to give respondents a clue as to their working environment.
Another simple approach is to use a variation of the (*SKIP) "what to avoid" scheme using capture groups, i.e. What_I_want_to_avoid|(What_I_want_to_match):
library(stringr)
s <- "apples and oranges and bananas with cream"
str_match(s, "^.+and (.*)")[,2]
The key idea here is to completely disregard the overall match returned by the regex engine: that's the trash bin. Instead, we only need to check capture group 1 through [,2], which, when set, contains what we are looking for. See also:
http://www.rexegg.com/regex-best-trick.html#pseudoregex
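As a rough illustration of the "trash bin" idea, the sketch below keeps the whole matrix returned by str_match and looks at both columns. The second input string is a hypothetical example added here to show what happens when there is no "and " at all.

```r
library(stringr)

x <- c("apples and oranges and bananas with cream",
       "no conjunction here")  # hypothetical input with no "and "

m <- str_match(x, "^.+and (.*)")

whole_match <- m[, 1]  # column 1: the overall match (the "trash bin")
after_last  <- m[, 2]  # column 2: capture group 1, the part we want

print(after_last)
# [1] "bananas with cream" NA
```

Because str_match returns NA for inputs that do not match, a non-matching string shows up as NA rather than being silently passed through, which is a difference from the sub()-based approach.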
We can do a similar thing using the base R gsub functions, e.g.
gsub("^.+and (.*)", "\\1", s, perl = TRUE)
PS: Unfortunately, we cannot use the What_I_want_to_avoid(*SKIP)(*FAIL)|What_I_want_to_match pattern with stringi/stringr functions, since the ICU regex library they rely on does not include the (*SKIP)(*FAIL) verbs (these are available only in PCRE).
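For comparison, the verbs can be used from base R, where `perl = TRUE` routes the pattern through PCRE. The following is only a sketch under that assumption; the concrete pattern is one illustrative way to fill in the template and is not taken from the answer itself.

```r
s <- "apples and oranges and bananas with cream"

# Left branch: consume everything up to the last "and ", then throw it away
# with (*SKIP)(*FAIL). Right branch: match whatever remains after that point.
pat <- "^.+and (*SKIP)(*FAIL)|.+"

res <- regmatches(s, regexpr(pat, s, perl = TRUE))
print(res)
# [1] "bananas with cream"
```

Here the overall match itself is the wanted text, so no capture group or column indexing is needed.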