Use stringr in R to find the remaining string after last substring [duplicate]

Tags: regex, r, stringr

How can I use str_match to extract the remainder of a string after the last occurrence of a substring?

For example, for the string "apples and oranges and bananas with cream", I'd like to extract the remainder of the string after the last occurrence of " and ", i.e. "bananas with cream".

I have tried many variations of the command below, but they either return the remainder of the string after the first " and " or an empty string.

library(stringr)

str_match("apples and oranges and bananas with cream", "(?<= and ).*(?! and )")
#      [,1]
# [1,] "oranges and bananas with cream"
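The attempt above goes wrong because `.*` is greedy and runs to the end of the string, so `(?! and )` is only checked at the very end, where it always succeeds; the match therefore starts right after the first " and ". One way to keep the lookaround idea is to put the negative lookahead at the start of the match and let it scan ahead with `.*`. This is a minimal sketch, assuming stringr's ICU engine (which supports fixed-length lookbehind); `remainder` is just an illustrative variable name:

```r
library(stringr)

x <- "apples and oranges and bananas with cream"

# (?<= and )  : the match must start right after " and "
# (?!.* and ) : ...and no further " and " may appear after that start position
remainder <- str_extract(x, "(?<= and )(?!.* and ).*")
remainder
#[1] "bananas with cream"
```

The lookahead rejects every start position that still has " and " somewhere to its right, so only the position after the last one survives.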

I've searched Stack Overflow for solutions and found some for JavaScript, Python and base R, but none for the stringr package.

Thanks.


asked May 05 '18 01:05

James N

2 Answers

(Don't know about str_match. Base R regex should suffice, though.) Since regex quantifiers are "greedy", i.e. ".+" first consumes as much of the string as it can and then backtracks only as far as needed for "and " to match, the pattern ends at the last "and ", so it's just:

sub("^.+and ", "", "apples and oranges and bananas with cream")
#[1] "bananas with cream"
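To make the greediness point concrete, here is a small comparison (an illustration, not part of the original answer) of the greedy pattern against its lazy counterpart `.+?`, which stops at the first "and " instead. The lazy version is run with `perl = TRUE` to be safe about quantifier support:

```r
x <- "apples and oranges and bananas with cream"

# Greedy: .+ grabs as much as possible, then backs off to the LAST "and "
sub("^.+and ", "", x)
#[1] "bananas with cream"

# Lazy: .+? grabs as little as possible, so it stops at the FIRST "and "
sub("^.+?and ", "", x, perl = TRUE)
#[1] "oranges and bananas with cream"
```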

I'm pretty sure there would be an equivalent in the "lubridate" corner of the hadleyverse.

But that failed:

> library(lubridate)

Attaching package: ‘lubridate’

The following object is masked from ‘package:plyr’:

    here

The following objects are masked from ‘package:data.table’:

    hour, isoweek, mday, minute, month, quarter, second, wday, week, yday, year

The following object is masked from ‘package:base’:

    date

> str_replace("apples and oranges and bananas with cream", "^.+and ", "")
Error in str_replace("apples and oranges and bananas with cream", "^.+and ",  : 
  could not find function "str_replace"

So str_replace is not in pkg:lubridate but rather in stringr (which, as I understand it, is a very light wrapper around the stringi package):

> library(stringr)
> str_replace("apples and oranges and bananas with cream", "^.+and ", "")
[1] "bananas with cream"
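Since stringr sits on top of stringi, the regex can also be skipped entirely by locating the last fixed (literal) occurrence of the separator. This is a sketch assuming the stringi package (installed as a stringr dependency); `after_last` is a hypothetical helper name, not a stringr or stringi function:

```r
library(stringi)

# Hypothetical helper: text after the last literal (non-regex) occurrence of sep.
# Strings that do not contain sep are returned unchanged.
after_last <- function(x, sep) {
  loc <- stri_locate_last_fixed(x, sep)  # integer matrix, one row per string
  end <- loc[, 2]                        # end position of the last match (NA if none)
  ifelse(is.na(end), x, substring(x, end + 1L))
}

after_last(c("apples and oranges and bananas with cream", "just cream"), " and ")
#[1] "bananas with cream" "just cream"
```

Treating the separator as a fixed string avoids having to escape regex metacharacters if the separator ever contains any.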

I do wish that people who ask questions about non-base package functions would include a library call to give respondents a clue about their working environment.


answered Oct 28 '22 08:10

IRTFM

Another simple approach is to use a variation of the "(*SKIP) what's to avoid" scheme with capture groups, i.e. What_I_want_to_avoid|(What_I_want_to_match):

library(stringr)
s  <- "apples and oranges and bananas with cream"
str_match(s, "^.+and (.*)")[,2]

The key idea here is to completely disregard the overall match returned by the regex engine: that's the trash bin. Instead, we only need to check capture group 1, which str_match returns in column [,2] (column [,1] holds the overall match); when set, it contains what we are looking for. See also: http://www.rexegg.com/regex-best-trick.html#pseudoregex
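To see the matrix layout described above, here is a short sketch applying the same pattern to a vector that also contains a string without "and " (the second input is made up for illustration):

```r
library(stringr)

s <- c("apples and oranges and bananas with cream", "just cream")
m <- str_match(s, "^.+and (.*)")
# Column 1 is the whole match (the "trash bin"), column 2 is capture group 1;
# rows with no match are NA in every column
m[, 2]
#[1] "bananas with cream" NA
```

If the original string should be kept when there is no match, something like `ifelse(is.na(m[, 2]), s, m[, 2])` fills those rows back in.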

We can do the same thing with base R's sub/gsub functions, e.g.

gsub("^.+and (.*)", "\\1", s, perl = TRUE)
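One caveat that applies to the patterns in both answers: "and " is matched without a leading space or word boundary, so a word ending in "and" can be mistaken for the separator. A sketch of the problem and one possible fix with `\\b` (the input string is invented for illustration):

```r
library(stringr)

x <- "salt and sand in the bag"

# "sand " ends in "and ", so the greedy pattern overshoots to it
str_replace(x, "^.+and ", "")
#[1] "in the bag"

# \\b requires a word boundary before "and", which "sand" does not have
str_replace(x, "^.+\\band ", "")
#[1] "sand in the bag"
```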

PS: Unfortunately, we cannot use the What_I_want_to_avoid(*SKIP)(*FAIL)|What_I_want_to_match pattern with stringi/stringr functions, since the ICU regex library they rely on does not support the (*SKIP)(*FAIL) verbs (these are only available in PCRE).
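The limitation is specific to ICU: base R's regex functions with perl = TRUE go through PCRE, where the verbs are available. A sketch of the verb-based form in base R, for comparison (the exact pattern is an illustration, not taken from the answer):

```r
x <- c("apples and oranges and bananas with cream", "just cream")

# Left branch: consume everything up to the last " and ", then (*SKIP)(*FAIL)
# throws that text away and resumes matching after it.
# Right branch: take whatever is left.
pat <- "^.* and (*SKIP)(*FAIL)|.+"
regmatches(x, regexpr(pat, x, perl = TRUE))
#[1] "bananas with cream" "just cream"
```

When a string has no " and ", the left branch fails before reaching (*SKIP), so `.+` matches the whole string.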

answered Oct 28 '22 09:10

wp78de
