
R: regular expression to specify end of string char is a letter

Tags: regex, r

    string = c("Hello-", "HelloA", "Helloa")
    grep("Hello$[A-z]", string)

I want to find the indices of the strings in which the character immediately after "Hello" is a letter (upper or lower case). The code above doesn't work; I would like grep() to return indices 2 and 3, since those strings have a letter after "Hello".

Adrian asked Nov 08 '14 05:11



1 Answer

Use a positive lookahead:

> string = c("Hello-", "HelloA", "Helloa")
> grep('Hello(?=[A-Za-z])', string, perl=TRUE)
[1] 2 3

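The positive lookahead (?=[A-Za-z]) asserts that the character immediately following "Hello" is a letter, without making that letter part of the match. Lookaheads require perl=TRUE. Note that [A-Za-z] replaces the [A-z] from the question: in ASCII order, the range A-z also covers the punctuation characters that sit between Z and a.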

Or match the letter directly with a character class:

> grep('Hello[A-Za-z]', string)
[1] 2 3
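
The two patterns above select the same elements and differ only in what the match contains. A minimal sketch, assuming the same three strings (the names `x`, `m_look`, and `m_plain` are illustrative and not part of the original answer):

```r
x <- c("Hello-", "HelloA", "Helloa")

# Lookahead: the letter is checked but is not part of the match.
m_look <- regmatches(x, regexpr("Hello(?=[A-Za-z])", x, perl = TRUE))

# Character class: the letter is consumed and becomes part of the match.
m_plain <- regmatches(x, regexpr("Hello[A-Za-z]", x))

print(m_look)   # "Hello"  "Hello"
print(m_plain)  # "HelloA" "Helloa"
```

With grep() only the indices matter, so either form works; the distinction shows up when the matched text is extracted or replaced.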

Add a $ to the regex if exactly one letter follows "Hello". The $ asserts that the match ends at the end of the string.

> grep('Hello[A-Za-z]$', string)
[1] 2 3
> grep('Hello(?=[A-Za-z]$)', string, perl=TRUE)
[1] 2 3
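
For the example vector the anchored and unanchored patterns give the same result, because every string has at most one character after "Hello". A small sketch of where they differ, assuming a hypothetical extra string "HelloAB" that is not in the original question (the variable names are illustrative as well):

```r
# "HelloAB" is an added test string with two letters after "Hello".
y <- c("Hello-", "HelloA", "HelloAB")

# Unanchored: a letter directly after "Hello" is enough.
any_letter <- grep("Hello[A-Za-z]", y)

# Anchored: exactly one letter after "Hello", then end of string.
one_letter <- grep("Hello[A-Za-z]$", y)
one_letter_look <- grep("Hello(?=[A-Za-z]$)", y, perl = TRUE)

print(any_letter)       # 2 3
print(one_letter)       # 2
print(one_letter_look)  # 2
```

This is also why the pattern in the question, with $ placed before the character class, cannot match: it asks for a letter after the end of the string.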
Avinash Raj answered Sep 21 '22 00:09