So I have the following data, let's say called "my_data":
Storm.Type
TYPHOON
SEVERE STORM
TROPICAL STORM
SNOWSTORM AND HIGH WINDS
What I want is to classify whether or not each element in my_data$Storm.Type is a storm, BUT I don't want to include tropical storms as storms (I'm going to classify them separately), such that I would have
Storm.Type Is.Storm
TYPHOON 0
SEVERE STORM 1
TROPICAL STORM 0
SNOWSTORM AND HIGH WINDS 1
I have written the following code:
my_data$Is.Storm <- my_data[grep("(?<!TROPICAL) (?i)STORM"), "Storm.Type"]
But this only returns the "SEVERE STORM" as a storm (but leaves out SNOWSTORM AND HIGH WINDS). Thank you!
With a negative lookbehind, the regex engine first finds a candidate match for the main pattern, then looks back at the text immediately before it and tries to match the lookbehind pattern there. If that backward match succeeds, the candidate is rejected; otherwise it is accepted.
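To make that concrete, here is a minimal sketch in R. The sample vector `x` is made up for illustration, and the sketch assumes `perl = TRUE`, since lookbehind is only available through R's PCRE engine.

```r
# Hypothetical sample strings, not taken from the original data
x <- c("TROPICAL STORM", "SEVERE STORM")

# Find "STORM" only where it is not immediately preceded by "TROPICAL "
m <- regexpr("(?<!TROPICAL )STORM", x, perl = TRUE)

# Start position of each match; -1 means no candidate survived the lookbehind
starts <- as.integer(m)

# The lookbehind is zero-width, so the matched text is only "STORM"
matched <- regmatches(x, m)

print(starts)
print(matched)
```

The text checked by the lookbehind is never part of the match itself, which is why `regmatches()` returns just the main pattern.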
The positive lookbehind (?<=...) and negative lookbehind (?<!...) zero-width assertions in JavaScript regular expressions can be used to ensure that a pattern is, or is not, preceded by another pattern.
In regular expressions, a lookbehind matches an element only if another specific element comes immediately before it. A positive lookbehind has the syntax (?<=Y)X: the pattern matches X only if Y comes directly before it.
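The same `(?<=Y)X` syntax is also accepted by R when `perl = TRUE` is set. As a small sketch (the vector `x` and the name `is_tropical` are hypothetical), this is one way the tropical storms from the question could be flagged separately:

```r
# Hypothetical sample strings for illustration
x <- c("TROPICAL STORM", "SEVERE STORM", "SNOWSTORM")

# (?<=Y)X with Y = "TROPICAL " and X = "STORM":
# matches "STORM" only when "TROPICAL " comes immediately before it
is_tropical <- grepl("(?<=TROPICAL )STORM", x, perl = TRUE)

print(is_tropical)
```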
Lookaround support also depends on the tool: a lookbehind pattern that works in grep can fail in sed, because sed does not support lookaround assertions.
The problem is that you're looking for the string " STORM" with a preceding space, so "SNOWSTORM" does not qualify.
As a fix, consider moving the space into your negative lookbehind assertion, like so:
ss <- c("TYPHOON","SEVERE STORM","TROPICAL STORM","SNOWSTORM AND HIGH WINDS",
"THUNDERSTORM")
grep("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] 2 4 5
grepl("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] FALSE TRUE FALSE TRUE TRUE
I didn't know that (?i) and (?-i) set whether you ignore case or not in regex. Cool find. Another way to do it is the ignore.case flag:
grepl("(?<!tropical )storm", ss, perl = TRUE, ignore.case = TRUE)
# [1] FALSE TRUE FALSE TRUE TRUE
Then define your column:
my_data$Is.Storm <- grepl("(?<!tropical )storm", my_data$Storm.Type,
perl = TRUE, ignore.case = TRUE)
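Note that `grepl()` returns TRUE/FALSE rather than the 1/0 shown in the question. If the numeric form is wanted, one option is to wrap the call in `as.integer()`. A minimal sketch, rebuilding the question's data as a data frame (the exact structure of `my_data` is an assumption):

```r
# Rebuild the example data from the question (structure assumed)
my_data <- data.frame(
  Storm.Type = c("TYPHOON", "SEVERE STORM", "TROPICAL STORM",
                 "SNOWSTORM AND HIGH WINDS"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE/FALSE; as.integer() maps these to 1/0
my_data$Is.Storm <- as.integer(
  grepl("(?<!tropical )storm", my_data$Storm.Type,
        perl = TRUE, ignore.case = TRUE)
)

print(my_data)
```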