
R regular expressions, trying to capture a group

Tags: regex, r

I've read a few of the other questions on R capture groups in regular expressions, but I'm not having much luck.

I have a string:

127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] \"OPTIONS * HTTP/1.0\" 200 - \"-\" \"Apache/2.2.14 (Ubuntu) PHP/5.3.2-1ubuntu4.24 with Suhosin-Patch mod_ssl/2.2.14 OpenSSL/0.9.8k mod_apreq2-20090110/2.7.1 mod_perl/2.0.4 Perl/v5.10.1 (internal dummy connection)\"

From which I am trying to capture a timestamp:

07/Dec/2014:06:43:43 -0800

The following function invocation returns a match:

regmatches(x,regexpr('\\[([\\w:/]+\\s[+\\-]\\d{4})\\]',x,perl=TRUE))
[1] "[07/Dec/2014:06:43:43 -0800]"

I've tried to capture just the group with str_match, using several variations of this regex:

str_match(x, "\\[([\\w:/]+\\s[+\\-]\\d{4})\\]")
     [,1] [,2]
[1,] NA   NA

None of them worked. Variations of this regex match correctly in most online regex testers, so I don't think the regex itself is the problem.

How can I get just the timestamp itself so I can pass it to strptime, without resorting to something like stripping the brackets with gsub? gsub doesn't give me the group, and neither does str_match. What am I missing? The ideal output would be

07/Dec/2014:06:43:43 -0800

which I could then use in strptime.

Arima asked Dec 28 '25 19:12


2 Answers

Use \K, which keeps the text matched so far out of the overall regex match, together with a positive lookahead. Both require perl=TRUE.

> regmatches(x,regexpr('\\[\\K[\\w:/]+\\s[+\\-]\\d{4}(?=\\])',x,perl=TRUE))
[1] "07/Dec/2014:06:43:43 -0800"

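The \\K in \\[\\K drops the already matched [ character from the reported match, and the lookahead (?=\\]) requires the closing ] without including it in the result.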
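
As a rough sketch of the full round trip, the extracted string can then be handed to strptime. The shortened log line and the format string below are assumptions for illustration; %b matches the month abbreviation in the current locale, so "Dec" is only guaranteed to parse in an English locale.

```r
# Shortened version of the log line from the question (assumed sample input)
x <- '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 - "-"'

# \K drops the leading "[" from the match; (?=\]) requires "]" without consuming it
ts <- regmatches(x, regexpr('\\[\\K[\\w:/]+\\s[+\\-]\\d{4}(?=\\])', x, perl = TRUE))

# Assumed format string: day/month-abbreviation/year:time, then a numeric UTC offset
parsed <- strptime(ts, format = "%d/%b/%Y:%H:%M:%S %z", tz = "UTC")
```

Here ts holds only the timestamp text, and parsed is a POSIXlt object (NA if the locale does not recognise the month abbreviation).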

Avinash Raj answered Dec 30 '25 08:12


(?<=\[)([\w:\/]+\s[+\-]\d{4})(?=\])

Try this. In R it needs perl=TRUE and doubled backslashes in the string literal. See the demo:

https://regex101.com/r/tX2bH4/16
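
A minimal sketch of how this pattern might be written in R, assuming the same kind of log line as in the question (shortened here). The backslashes are doubled for the R string literal, and the \/ escape is written as a plain /, which is equivalent inside a character class.

```r
# Shortened version of the log line from the question (assumed sample input)
x <- '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 - "-"'

# Lookbehind and lookahead assert the brackets without including them in the match
pattern <- '(?<=\\[)([\\w:/]+\\s[+\\-]\\d{4})(?=\\])'

# Lookarounds are only available with the PCRE engine, hence perl = TRUE
ts <- regmatches(x, regexpr(pattern, x, perl = TRUE))
```

Because the brackets sit inside lookarounds, the whole match is already the timestamp, so the capture group is not strictly needed.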

vks answered Dec 30 '25 09:12