Using shorthand character classes inside character classes in R regex

Tags: regex, r, gsub

I have defined

vec <- "5f 110y, Fast"

and

gsub("[\\s0-9a-z]+,", "", vec)

gives "5f Fast"

I would have expected it to give "Fast" since everything before the comma should get matched by the regex.

Can anyone explain to me why this is not the case?


asked Jul 19 '18 11:07

ThanksABundle

2 Answers

You should keep in mind that, in TRE regex patterns (the engine base R functions such as gsub use by default, i.e. without perl=TRUE), you cannot use regex escapes like \s, \d, \w inside bracket expressions.

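So the regex in your case, "[\\s0-9a-z]+,", matches one or more backslashes, s characters, digits or lowercase ASCII letters, followed by a single comma. Because the space after 5f is not in that set, the only substring that can match is 110y, (including the comma), so 5f and the spaces around the match are kept.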
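To see this directly, you can ask the default engine what it actually matches. A minimal sketch reusing the question's vec; regexpr() and regmatches() are base R, and the comments describe the TRE behaviour explained above:

```r
vec <- "5f 110y, Fast"

# Default engine (TRE, no perl = TRUE): inside [...], "\\s" is not whitespace,
# so the space after "5f" breaks any run of class characters.
m <- regexpr("[\\s0-9a-z]+,", vec)
print(regmatches(vec, m))               # "110y," is the only match

# Removing that match keeps the spaces on both sides of it
print(gsub("[\\s0-9a-z]+,", "", vec))   # "5f  Fast" (two spaces)

# The bracket expression does not match a plain space at all
print(grepl("[\\s]", " "))              # FALSE
```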

You may use POSIX character classes instead, such as [:space:] (any whitespace) or [:blank:] (horizontal whitespace, i.e. space and tab). Note that the class name itself must sit inside a bracket expression:

> gsub("[[:space:]0-9a-z]+,", "", vec)
[1] " Fast"
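The difference between the two POSIX classes only shows up with vertical whitespace. A small sketch using a made-up string that contains a tab and a newline:

```r
x <- "a\tb\nc"   # hypothetical input: a tab and a newline

# [:space:] covers all whitespace, including the newline
print(gsub("[[:space:]]", "_", x))   # "a_b_c"

# [:blank:] covers only horizontal whitespace (space and tab)
print(gsub("[[:blank:]]", "_", x))   # "a_b\nc"
```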

Or use a PCRE regex with \s and the perl=TRUE argument:

> gsub("[\\s0-9a-z]+,", "", vec, perl=TRUE)
[1] " Fast"
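Both fixes should agree whenever the tokens are separated by ASCII whitespace. A quick check, assuming a second, hypothetical input that uses a tab instead of a space:

```r
vec2 <- c("5f 110y, Fast", "5f\t110y, Slow")   # second element is made up

tre  <- gsub("[[:space:]0-9a-z]+,", "", vec2)            # POSIX class, default engine
pcre <- gsub("[\\s0-9a-z]+,", "", vec2, perl = TRUE)     # \s inside [...] with PCRE

print(pcre)                  # " Fast" " Slow"
print(identical(tre, pcre))  # TRUE
```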

To make \s match all Unicode whitespace, add the (*UCP) PCRE verb at the start of the pattern: gsub("(*UCP)[\\s0-9a-z]+,", "", vec, perl=TRUE).
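A minimal sketch of the (*UCP) variant, assuming a UTF-8 string in which the gap between 5f and 110y is an ideographic space (U+3000), a non-ASCII Unicode whitespace character. The input is invented for illustration:

```r
vec_u <- "5f\u{3000}110y, Fast"   # hypothetical input with U+3000 IDEOGRAPHIC SPACE

# With (*UCP), PCRE uses Unicode properties, so \s also matches U+3000
res <- gsub("(*UCP)[\\s0-9a-z]+,", "", vec_u, perl = TRUE)
print(res)   # " Fast"

# Without (*UCP), \s is ASCII-only under PCRE's defaults, so this call is
# expected to stop at U+3000 and remove only "110y,"
print(gsub("[\\s0-9a-z]+,", "", vec_u, perl = TRUE))
```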


answered Sep 18 '22 14:09

Wiktor Stribiżew

Could you please try the following and let me know if it helps?

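vec <- c("5f 110y, Fast")
# Remove everything up to and including the last comma
gsub(".*,", "", vec)

OR

# Remove two alphanumeric words separated by a space, plus the comma after them
gsub("[[:alnum:]]+ [[:alnum:]]+,", "", vec)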
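Both patterns depend on the shape of the input. A sketch with two extra, made-up strings: the greedy .* runs to the last comma, and the [[:alnum:]] pattern assumes exactly two words before the comma. The sub("^[^,]*,", ...) line is an added alternative, not part of the answer, that stops at the first comma instead:

```r
vec3 <- c("5f 110y, Fast", "6f 200y, Good, Firm", "5f 110y 3z, Fast")  # last two are made up

# Greedy: removes everything up to the LAST comma
print(gsub(".*,", "", vec3))
# returns c(" Fast", " Firm", " Fast")

# Anchored negated class: removes everything up to the FIRST comma only
print(sub("^[^,]*,", "", vec3))
# returns c(" Fast", " Good, Firm", " Fast")

# Exactly two words before a comma: with three words, "5f" is left behind
print(gsub("[[:alnum:]]+ [[:alnum:]]+,", "", vec3))
# returns c(" Fast", " Good, Firm", "5f  Fast")
```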

answered Sep 20 '22 14:09

RavinderSingh13
