Regular Expression in R with a negative lookbehind

So I have the following data, let's say called "my_data":

Storm.Type
TYPHOON
SEVERE STORM
TROPICAL STORM
SNOWSTORM AND HIGH WINDS

What I want is to classify whether each element of my_data$Storm.Type is a storm, but without counting tropical storms as storms (I'm going to classify them separately), so that I end up with:

Storm.Type                    Is.Storm
TYPHOON                       0
SEVERE STORM                  1
TROPICAL STORM                0
SNOWSTORM AND HIGH WINDS      1

I have written the following code:

my_data$Is.Storm  <-  my_data[grep("(?<!TROPICAL) (?i)STORM"), "Storm.Type"]

But this only flags "SEVERE STORM" as a storm and leaves out "SNOWSTORM AND HIGH WINDS". Thank you!

Jonathan Charlton asked Nov 22 '13 20:11


1 Answer

The problem is that you're looking for the string " STORM" with a preceding space, so "SNOWSTORM" does not qualify.

As a fix, consider moving the space into your negative lookbehind assertion, like so:

ss <- c("TYPHOON","SEVERE STORM","TROPICAL STORM","SNOWSTORM AND HIGH WINDS",
        "THUNDERSTORM")
grep("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] 2 4 5
grepl("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] FALSE  TRUE FALSE  TRUE  TRUE
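
To see the difference side by side, here is a small sketch that runs the pattern from the question (space outside the lookbehind) and the corrected one over the same vector. The variable names `original` and `fixed` are made up for illustration.

```r
ss <- c("TYPHOON", "SEVERE STORM", "TROPICAL STORM",
        "SNOWSTORM AND HIGH WINDS", "THUNDERSTORM")

# Space outside the lookbehind: the match itself has to start with a space,
# so the "STORM" inside "SNOWSTORM" or "THUNDERSTORM" can never match.
original <- grepl("(?<!TROPICAL) (?i)STORM", ss, perl = TRUE)
# [1] FALSE  TRUE FALSE FALSE FALSE

# Space inside the lookbehind: only a "STORM" that is directly preceded by
# "TROPICAL " is rejected; everything else containing "STORM" matches.
fixed <- grepl("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] FALSE  TRUE FALSE  TRUE  TRUE

data.frame(ss, original, fixed)
```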

I didn't know that (?i) and (?-i) set whether you ignore case or not in regex. Cool find. Another way to do it is the ignore.case flag:

grepl("(?<!tropical )storm", ss, perl = TRUE, ignore.case = TRUE)
# [1] FALSE  TRUE FALSE  TRUE  TRUE
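
One difference between the two spellings, as far as I understand PCRE's inline options: (?i) only takes effect from the point where it appears, so in the first pattern the lookbehind itself stays case-sensitive. With ignore.case = TRUE the whole pattern is case-insensitive. A sketch with a lower-case input (the value of `x` is made up for illustration):

```r
x <- "tropical storm"

# (?i) comes after the lookbehind, so "TROPICAL " is compared case-sensitively
# and does not block the lower-case "tropical ".
inline_flag <- grepl("(?<!TROPICAL )(?i)STORM", x, perl = TRUE)
# [1] TRUE

# ignore.case = TRUE applies to the lookbehind as well.
arg_flag <- grepl("(?<!tropical )storm", x, perl = TRUE, ignore.case = TRUE)
# [1] FALSE
```

If your data might contain mixed case, the ignore.case version (or putting (?i) at the very start of the pattern) is probably the safer choice.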

Then define your column:

my_data$Is.Storm  <-  grepl("(?<!tropical )storm", my_data$Storm.Type,
                            perl = TRUE, ignore.case = TRUE)
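
Note that grepl() returns TRUE/FALSE rather than the 0/1 shown in the question. If you want the integer form, wrapping the result in as.integer() should do it. The sketch below also adds a separate flag for tropical storms, since the question mentions classifying them separately; the column name Is.Tropical.Storm is my own invention, not something from the question.

```r
my_data <- data.frame(
  Storm.Type = c("TYPHOON", "SEVERE STORM", "TROPICAL STORM",
                 "SNOWSTORM AND HIGH WINDS"),
  stringsAsFactors = FALSE
)

# Storms, excluding tropical storms, as 0/1 instead of FALSE/TRUE.
my_data$Is.Storm <- as.integer(
  grepl("(?<!tropical )storm", my_data$Storm.Type,
        perl = TRUE, ignore.case = TRUE)
)

# Hypothetical separate flag for tropical storms.
my_data$Is.Tropical.Storm <- as.integer(
  grepl("tropical storm", my_data$Storm.Type, ignore.case = TRUE)
)

my_data
```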

Blue Magister answered Sep 19 '22 12:09