string = c("Hello-", "HelloA", "Helloa")
grep("Hello$[A-z]", string)
I wish to find the indices of the strings in which the character immediately after the word "Hello" is a letter (upper- or lowercase). The code above doesn't work, but I would like grep() to return indices 2 and 3, since those strings have a letter after "Hello".
End of string or line: $. The $ anchor specifies that the preceding pattern must occur at the end of the input string, or before a \n at the end of the input string. If you use $ with the RegexOptions.Multiline option (a .NET setting), the match can also occur at the end of a line.
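R has no RegexOptions, but the same behaviour can be observed with the PCRE engine (perl = TRUE). This is a minimal sketch, assuming PCRE's defaults: $ matches at the very end or just before a single final newline, and the inline (?m) flag plays the role of a multiline option. The input strings are illustrative only.

```r
# $ matches at the very end of the string ...
grepl("abc$", "abc", perl = TRUE)           # TRUE
# ... and also just before a single trailing newline
grepl("abc$", "abc\n", perl = TRUE)         # TRUE
# Without multiline mode, $ does not match before an inner newline
grepl("abc$", "abc\nxyz", perl = TRUE)      # FALSE
# The inline (?m) flag enables multiline mode, so $ also matches at each line end
grepl("(?m)abc$", "abc\nxyz", perl = TRUE)  # TRUE
```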
The caret ^ and dollar $ characters have special meaning in a regexp. They are called "anchors". The caret ^ matches at the beginning of the text, and the dollar $ matches at the end. The pattern ^Mary means: "string start and then Mary".
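A short sketch of both anchors with R's default regex engine; the vector x is just an illustrative input.

```r
x <- c("Mary had a little lamb", "Hello Mary")
# ^Mary: "Mary" must appear at the start of the string
grepl("^Mary", x)  # TRUE FALSE
# Mary$: "Mary" must appear at the end of the string
grepl("Mary$", x)  # FALSE TRUE
```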
\d (digit) matches any single digit (same as [0-9]). The uppercase counterpart \D (non-digit) matches any single character that is not a digit (same as [^0-9]). \s (space) matches any single whitespace character (same as [ \t\n\r\f]: blank, tab, newline, carriage return and form feed).
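In R these shorthands are written inside ordinary string literals, so the backslash has to be doubled ("\\d" rather than "\d"). A minimal sketch with illustrative inputs, relying on R's default engine, which documents \d, \D, \s and \S as supported extensions:

```r
x <- c("abc1", "abc", "a b")
# \d: does the string contain at least one digit?
grepl("\\d", x)                # TRUE FALSE FALSE
# \D: strip every non-digit character
gsub("\\D", "", "a1b2c3")      # "123"
# \s: replace each whitespace character (space, tab, newline) with "_"
gsub("\\s", "_", "a b\tc\nd")  # "a_b_c_d"
```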
The word boundary \b matches positions where one side is a word character (usually a letter, digit or underscore, although the exact set varies across engines) and the other side is not a word character (for instance, the beginning of the string or a space character).
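A sketch of \b with the PCRE engine (perl = TRUE); the inputs are illustrative. "cat" matches only where it stands as a whole word:

```r
x <- c("cat", "concat", "the cat sat", "cats")
# \bcat\b: "cat" must be bounded by non-word characters or the string edges
grepl("\\bcat\\b", x, perl = TRUE)  # TRUE FALSE TRUE FALSE
```

In "concat" and "cats" the letters next to "cat" are word characters on both sides of the would-be boundary, so \b fails there.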
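Use a positive lookahead. Your pattern fails because $ anchors the match at the end of the string, so no character can ever follow it: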
> string = c("Hello-", "HelloA", "Helloa")
> grep('Hello(?=[A-Za-z])', string, perl=T)
[1] 2 3
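(?=[A-Za-z]) is a positive lookahead: it asserts that the character following the string Hello must be a letter, without consuming that character. Lookahead is a PCRE feature that R's default regex engine does not support, which is why perl=T is needed.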
OR
> grep('Hello[A-Za-z]', string)
[1] 2 3
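Both patterns select the same indices, but they differ in what they actually match. A minimal sketch using regmatches() and regexpr() to show the matched text (non-matching elements are dropped by regmatches()):

```r
string <- c("Hello-", "HelloA", "Helloa")
# Lookahead: the letter is checked but is not part of the match
with_lookahead <- regmatches(string, regexpr("Hello(?=[A-Za-z])", string, perl = TRUE))
with_lookahead  # "Hello" "Hello"
# Character class: the letter is consumed and becomes part of the match
with_class <- regmatches(string, regexpr("Hello[A-Za-z]", string))
with_class      # "HelloA" "Helloa"
```

This matters when the match is reused, for example with sub() or gsub(): the lookahead version replaces only "Hello", while the character-class version also replaces the letter.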
Add a $ to the regex if the letter must be the last character, i.e. if exactly one letter follows the string Hello. The $ asserts that we are at the end of the string.
> grep('Hello[A-Za-z]$', string)
[1] 2 3
> grep('Hello(?=[A-Za-z]$)', string, perl=T)
[1] 2 3
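A further sketch on why the original attempt misbehaves and a locale-aware alternative. It uses perl = TRUE for the diagnostics, assuming PCRE's code-point-based character ranges: [A-z] spans code points 65 to 122, which also covers [ \ ] ^ _ and the backtick, whereas [[:alpha:]] matches letters only (as defined by the engine and current locale).

```r
string <- c("Hello-", "HelloA", "Helloa")
# $ anchors at the end of the string, so a following [A-z] can never match
grep("Hello$[A-z]", string, perl = TRUE)           # integer(0)
# [A-z] also accepts the punctuation between Z and a, e.g. "_"
grepl("Hello[A-z]", "Hello_", perl = TRUE)         # TRUE (unwanted)
# [[:alpha:]] accepts letters only
grepl("Hello[[:alpha:]]", "Hello_", perl = TRUE)   # FALSE
grep("Hello[[:alpha:]]", string)                   # 2 3
```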