
R Regular expression for string containing full stops

Tags: regex, r

I have a bunch of strings, some of which end with "..t." (two full stops, a t, then another full stop). I am trying to find a regular expression that matches these strings, but dealing with the full stops is giving me a headache!

I have tried

grep('^.+(..t.)$', myStrings)

but this also matches strings such as w...gate. I think I am dealing with the full stops incorrectly. Any help at all appreciated.

Note: I am using grep within R.

Joe asked Sep 12 '14 10:09



2 Answers

Since you only need to check whether the string ends with ..t., you can drop ^.+ from your pattern.

The dot . has special meaning in regular expression syntax: it matches any single character (in many regex flavors, any character except a newline). To match a literal dot, or any other character with special meaning, you need to escape it with a backslash, which is written \\ inside an R string.

> x <- c('foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.')
> grep('\\.{2}t\\.$', x)
# [1] 1 4

Or place that character inside of a character class.

> x <- c('foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.')
> grep('[.]{2}t[.]$', x)
# [1] 1 4

Note: I used the quantifier {2} in \\.{2} to match two dots instead of writing the escaped dot twice, as in \\.\\.
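To make the difference concrete, here is a minimal sketch that runs the question's unescaped pattern next to the escaped variants on the same sample vector. The variable names (unescaped, escaped, char_class, doubled) are only illustrative; grepl is used instead of grep so the results line up element by element.

```r
# Sample strings from the examples above
x <- c('foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.')

# Unescaped dots are wildcards, so '..t.' also matches the 'gate' in 'w...gate'
unescaped <- grepl('^.+(..t.)$', x)

# Three equivalent ways to require literal dots at the end of the string
escaped    <- grepl('\\.{2}t\\.$', x)    # escaped dot with a {2} quantifier
char_class <- grepl('[.]{2}t[.]$', x)    # dot inside a character class
doubled    <- grepl('\\.\\.t\\.$', x)    # escaped dot written twice

print(rbind(unescaped, escaped, char_class, doubled))
```

Only the unescaped row flags the second element; the three escaped spellings agree with each other and pick out elements 1 and 4.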

hwnd answered Oct 02 '22 20:10


OK, a bit more googling provided the answer:

 grep("^.+(\\.\\.t\\.)$", myStrings)

This works because a literal dot has to be escaped, and in an R string that escape is written as \\.
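As a rough illustration of why the backslash is doubled, the sketch below shows that "\\." in R source is a two-character string (a backslash and a dot), which is what the regex engine actually receives. The variable names are hypothetical, and the fixed = TRUE call is shown only as an alternative way to treat the dot literally, not as something the answers above used.

```r
# "\\." in R source is a two-character string: a backslash followed by a dot
pattern <- "\\."
n_chars <- nchar(pattern)   # 2
writeLines(pattern)         # prints \. which is what the regex engine sees

# The same test string, matched three ways
plain_dot   <- grepl(".", "abc")                 # wildcard dot: matches any character
escaped_dot <- grepl(pattern, "abc")             # literal dot: "abc" contains none
fixed_dot   <- grepl(".", "abc", fixed = TRUE)   # literal match, no regex interpretation
```

Here plain_dot is TRUE while escaped_dot and fixed_dot are both FALSE, since "abc" contains no actual full stop.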

Joe answered Oct 02 '22 21:10