
How to extract a substring by inverse pattern with R?

Tags:

string

regex

r

I am trying to extract a substring by pattern using R's gsub() function.

# Example: extracting "7 years" substring.
string <- "Psychologist - 7 years on the website, online"
gsub(pattern="[0-9]+\\s+\\w+", replacement="", string)
# [1] "Psychologist -  on the website, online"

As you can see, it's easy to exclude the needed substring using gsub(), but I need to invert the result and get "7 years" only. I am thinking about using "^", something like this:

gsub(pattern="[^[0-9]+\\s+\\w+]", replacement="", string)

Could anyone please help me with the correct regexp pattern?

asked Oct 26 '17 10:10

Michael

2 Answers

You may use

sub(pattern=".*?([0-9]+\\s+\\w+).*", replacement="\\1", string)

See this R demo.

Details

.*? - any 0+ chars, as few as possible
([0-9]+\\s+\\w+) - Capturing group 1:
- [0-9]+ - one or more digits
- \\s+ - 1 or more whitespaces
- \\w+ - 1 or more word chars
.* - the rest of the string (any 0+ chars, as many as possible)

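The \1 in the replacement (written "\\1" in an R string literal) is a backreference to Group 1. Because the pattern matches the whole string, the whole string is replaced with the contents of Group 1.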
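
As a complement to the capture-group approach, here is a minimal sketch that extracts the first match directly with base R's regexpr() and regmatches(), so no surrounding text has to be matched and discarded. This is an alternative technique, not the code from this answer; the variable names first_match and no_match are only illustrative.

```r
# Sketch: extract the first match directly instead of replacing around it.
string <- "Psychologist - 7 years on the website, online"
pattern <- "[0-9]+\\s+\\w+"

# regexpr() locates the first match; regmatches() returns the matched text.
first_match <- regmatches(string, regexpr(pattern, string))
print(first_match)
# [1] "7 years"

# When nothing matches, regmatches() returns character(0), not the input.
other <- "Psychologist - online"
no_match <- regmatches(other, regexpr(pattern, other))
print(length(no_match))
# [1] 0
```

The empty result on a non-match differs from sub(), which returns its input unchanged when the pattern does not match, so pick whichever behaviour suits the downstream code.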

answered Sep 28 '22 10:09

Wiktor Stribiżew

You could use the opposite of \d, which is \D in R:

string <- "Psychologist - 7 years on the website, online"
sub(pattern = "\\D*(\\d+\\s+\\w+).*", replacement = "\\1", string)
# [1] "7 years"

\D* matches as many non-digit characters as possible. The digits, whitespace, and word that follow are captured in a group, and that group then replaces the complete string.

See a demo on regex101.com.
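
One thing to keep in mind with this approach: sub() returns its input unchanged when the pattern does not match. The sketch below shows that on a small vector and guards against it with grepl(). The sample vector profiles and the helper extract_duration() are hypothetical names made up for illustration.

```r
pattern <- "\\D*(\\d+\\s+\\w+).*"

# Hypothetical sample data; the last element contains no digits.
profiles <- c(
  "Psychologist - 7 years on the website, online",
  "Coach - 12 months, offline",
  "Therapist - online"
)

# sub() is vectorized, but a non-matching element comes back unchanged.
raw <- sub(pattern, "\\1", profiles)
print(raw[3])
# [1] "Therapist - online"

# Hypothetical helper: return NA where the pattern does not match.
extract_duration <- function(x, pattern) {
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

durations <- extract_duration(profiles, pattern)
print(durations)
# elements: "7 years", "12 months", NA
```

Returning NA makes a missing value easy to tell apart from a real extraction, which the unchanged input string does not.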

answered Sep 28 '22 09:09

Jan
