
Regular Expression in R with a negative lookbehind

Q: What is Lookbehind in regex?

In regular expressions, a lookbehind matches an element only if another specific element comes right before it. A positive lookbehind has the syntax (?<=Y)X: the pattern matches X only if Y precedes it. A negative lookbehind, (?<!Y)X, matches X only if Y does not precede it.
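In R, lookbehind is a PCRE feature, so grep/grepl need perl = TRUE for it; the default regex engine does not support lookaround. A minimal sketch contrasting a positive and a negative lookbehind (the vector x is made-up illustration data):

```r
# Illustrative data (hypothetical, not from the question)
x <- c("TROPICAL STORM", "SEVERE STORM", "STORM")

# Positive lookbehind: match STORM only when "TROPICAL " comes right before it
pos <- grepl("(?<=TROPICAL )STORM", x, perl = TRUE)

# Negative lookbehind: match STORM only when "TROPICAL " does NOT come right before it
neg <- grepl("(?<!TROPICAL )STORM", x, perl = TRUE)

print(pos)  # [1]  TRUE FALSE FALSE
print(neg)  # [1] FALSE  TRUE  TRUE
```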

Q: Does sed support lookbehind?

No. A lookbehind test that works with grep (it correctly returns bar) does not work in sed, which never produces the expected footest output: sed does not support lookaround assertions.

Tags: regex, r, negative-lookbehind

So I have the following data, let's say called "my_data":

Storm.Type
TYPHOON
SEVERE STORM
TROPICAL STORM
SNOWSTORM AND HIGH WINDS

What I want is to classify whether or not each element in my_data$Storm.Type is a storm, BUT I don't want to include tropical storms as storms (I'm going to classify them separately), such that I would have

Storm.Type                    Is.Storm
TYPHOON                       0
SEVERE STORM                  1
TROPICAL STORM                0
SNOWSTORM AND HIGH WINDS      1

I have written the following code:

my_data$Is.Storm  <-  my_data[grep("(?<!TROPICAL) (?i)STORM"), "Storm.Type"]

But this only returns "SEVERE STORM" as a storm and leaves out "SNOWSTORM AND HIGH WINDS". Thank you!


asked Nov 22 '13 20:11

Jonathan Charlton

1 Answer

The problem is that you're looking for the string " STORM" with a preceding space, so "SNOWSTORM" does not qualify.
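To see the diagnosis concretely, here is a small side-by-side comparison of the original pattern and a version with the space moved inside the lookbehind (THUNDERSTORM is an extra test case). With the space outside the lookbehind, a match needs a literal " STORM", which words like SNOWSTORM never contain:

```r
ss <- c("TYPHOON", "SEVERE STORM", "TROPICAL STORM",
        "SNOWSTORM AND HIGH WINDS", "THUNDERSTORM")

# Original pattern: the space is outside the lookbehind, so " STORM" (with a space) is required
old <- grepl("(?<!TROPICAL) (?i)STORM", ss, perl = TRUE)

# Space moved inside the lookbehind: only "STORM" itself must appear
new <- grepl("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)

print(data.frame(Storm.Type = ss, old = old, new = new))
```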

As a fix, consider moving the space into your negative lookbehind assertion, like so:

ss <- c("TYPHOON","SEVERE STORM","TROPICAL STORM","SNOWSTORM AND HIGH WINDS",
        "THUNDERSTORM")
grep("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] 2 4 5
grepl("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] FALSE  TRUE FALSE  TRUE  TRUE
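The two calls return different things: grep gives the positions of the matches (or, with value = TRUE, the matching strings), while grepl gives one TRUE/FALSE per element. The question's code subset my_data with grep's result, which yields only the matching Storm.Type values rather than a flag for every row. A quick sketch of the difference:

```r
ss <- c("TYPHOON", "SEVERE STORM", "TROPICAL STORM",
        "SNOWSTORM AND HIGH WINDS", "THUNDERSTORM")
pat <- "(?<!TROPICAL )(?i)STORM"

idx  <- grep(pat, ss, perl = TRUE)                # positions of the matching elements
vals <- grep(pat, ss, perl = TRUE, value = TRUE)  # the matching strings themselves
flag <- grepl(pat, ss, perl = TRUE)               # one logical per element, same length as ss

print(idx)   # [1] 2 4 5
print(vals)
print(flag)  # [1] FALSE  TRUE FALSE  TRUE  TRUE
```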

I didn't know that the inline modifiers (?i) and (?-i) switch case-insensitive matching on and off partway through a regex. Cool find. Another way to do it is the ignore.case argument of grepl:

grepl("(?<!tropical )storm", ss, perl = TRUE, ignore.case = TRUE)
# [1] FALSE  TRUE FALSE  TRUE  TRUE
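One subtlety worth checking: where (?i) sits matters. Placed after the lookbehind, it makes only STORM case-insensitive, so the lookbehind still requires an upper-case "TROPICAL ", and mixed-case input such as "Tropical Storm" would be counted as a storm. Putting (?i) at the start of the pattern, or using ignore.case = TRUE, makes the lookbehind case-insensitive too. The inputs below are hypothetical mixed-case values, not from the question's data:

```r
x <- c("Tropical Storm", "Snowstorm")

a <- grepl("(?<!TROPICAL )(?i)STORM", x, perl = TRUE)  # (?i) after the lookbehind: lookbehind stays case-sensitive
b <- grepl("(?i)(?<!TROPICAL )STORM", x, perl = TRUE)  # (?i) first: whole pattern is case-insensitive
d <- grepl("(?<!tropical )storm", x, perl = TRUE, ignore.case = TRUE)

print(a)  # [1] TRUE TRUE
print(b)  # [1] FALSE  TRUE
print(d)  # [1] FALSE  TRUE
```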

Then define your column:

my_data$Is.Storm  <-  grepl("(?<!tropical )storm", my_data$Storm.Type,
                            perl = TRUE, ignore.case = TRUE)
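The question's expected output uses 0/1 rather than TRUE/FALSE. A minimal sketch, assuming my_data is a plain data frame built from the four values in the question, that wraps grepl in as.integer to get that encoding; the Is.Tropical column is a hypothetical extra for the separate tropical-storm classification the question mentions:

```r
# Rebuild the question's example data (assumed to be a simple data frame)
my_data <- data.frame(
  Storm.Type = c("TYPHOON", "SEVERE STORM", "TROPICAL STORM",
                 "SNOWSTORM AND HIGH WINDS"),
  stringsAsFactors = FALSE
)

# 1 if "storm" appears without "tropical " right before it, else 0
my_data$Is.Storm <- as.integer(grepl("(?<!tropical )storm", my_data$Storm.Type,
                                     perl = TRUE, ignore.case = TRUE))

# Hypothetical extra column: 1 for tropical storms, classified separately
my_data$Is.Tropical <- as.integer(grepl("tropical storm", my_data$Storm.Type,
                                        ignore.case = TRUE))

print(my_data)
```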


answered Sep 19 '22 12:09

Blue Magister
