PHP/regex to parse NGINX error log

Tags: regex, php

The error entry looks like:

2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"

I need to parse:

date/time
error type
error message
client
server
request
host

Parsing the date is easy with substr, but my regex skills are weak and I am hoping for a better solution. Simply exploding on "," won't work either, I think, because the error message itself can contain a comma.

What is the most efficient way to do this?
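To make the comma concern concrete, here is a minimal sketch. The log line is hypothetical (a made-up file name containing a comma) and is not taken from a real log; it only shows how a plain explode produces more pieces than there are fields.

```php
<?php
// Hypothetical log line: the message and the request both contain a comma.
$line = '2011/06/10 13:30:10 [error] 23263#0: *1 open() "/var/www/a,b.html" failed '
      . '(2: No such file or directory), client: 86.186.86.232, server: hotelpublisher.com, '
      . 'request: "GET /a,b.html HTTP/1.1", host: "hotelpublisher.com"';

// Naive split on every comma.
$parts = explode(',', $line);

// Five comma-separated fields were intended (message, client, server, request, host),
// but the commas inside the message and the request create extra pieces.
echo count($parts), "\n";
print_r($parts);
```

The field boundaries are really the ", name: " markers, not the commas themselves, which is why a pattern anchored on those markers tends to be more robust.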

Gajus asked Feb 24 '23 22:02


2 Answers

What about:

$str = '2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"';
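// Named groups label each field. The timestamp class is [\d/ :]; a "+" inside the brackets would only add a literal plus sign.
preg_match('~^(?P<datetime>[\d/ :]+) \[(?P<errortype>.+)\] .*?: (?P<errormessage>.+), client: (?P<client>.+), server: (?P<server>.+), request: (?P<request>.+), host: (?P<host>.+)$~', $str, $matches);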
print_r($matches);

output:

Array
(
    [0] => 2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"
    [datetime] => 2011/06/10 13:30:10
    [1] => 2011/06/10 13:30:10
    [errortype] => error
    [2] => error
    [errormessage] => *1 directory index of "/var/www/ssl/" is forbidden
    [3] => *1 directory index of "/var/www/ssl/" is forbidden
    [client] => 86.186.86.232
    [4] => 86.186.86.232
    [server] => hotelpublisher.com
    [5] => hotelpublisher.com
    [request] => "GET / HTTP/1.1"
    [6] => "GET / HTTP/1.1"
    [host] => "hotelpublisher.com"
    [7] => "hotelpublisher.com"
)
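Note that request and host keep their surrounding double quotes in the output above, and every field appears twice (once by name, once by index). A possible refinement, sketched under assumptions: the helper name parseNginxError is hypothetical, and the pattern assumes every line has exactly the client, server, request and host fields in that order, as in the sample entry.

```php
<?php
/**
 * Parse one NGINX error-log line into named fields.
 * Returns null when the line does not match the assumed layout.
 * parseNginxError is a hypothetical helper name used for illustration only.
 */
function parseNginxError(string $line): ?array
{
    $pattern = '~^(?P<datetime>[\d/ :]+) \[(?P<errortype>\w+)\] \d+#\d+: (?P<errormessage>.+)'
             . ', client: (?P<client>[^,]+), server: (?P<server>[^,]*)'
             . ', request: "(?P<request>.*)", host: "(?P<host>[^"]*)"$~';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // not an entry of the assumed shape
    }

    // Keep only the named groups; drop the duplicate numeric keys.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$sample = '2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, '
        . 'client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"';

print_r(parseNginxError($sample));
```

Placing the quotes outside the request and host groups strips them at match time, and the null return makes lines with a different layout easy to detect instead of silently yielding an empty array.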
Toto answered Mar 01 '23 22:03


This is how I did it.

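// $line holds one raw line from the NGINX error log.
$error = array();

// The timestamp is always the first 19 characters.
$error['date'] = strtotime(substr($line, 0, 19));

// Everything before ", client:" is "[level] pid#tid: message".
$line      = substr($line, 20);
$error_str = explode(': ', strstr($line, ', client:', TRUE), 2);

$error['message'] = $error_str[1];

preg_match("|\[([a-z]+)\] (\d+)#(\d+)|", $error_str[0], $matches);

$error['error_type'] = $matches[1];

// The rest is a list of "name: value" pairs separated by ", ".
$args_str = explode(', ', substr(strstr($line, ', client:'), 2));
$args     = array();

foreach ($args_str as $a)
{
    $name_value = explode(': ', $a, 2);

    // Strip the double quotes around request and host.
    $args[$name_value[0]] = trim($name_value[1], '"');
}

$error = array_merge($error, $args);

var_dump($error);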

Which will produce:

array(7) {
  ["date"]=>
  int(1307709010)
  ["message"]=>
  string(50) "*1 directory index of "/var/www/ssl/" is forbidden"
  ["error_type"]=>
  string(5) "error"
  ["client"]=>
  string(13) "86.186.86.232"
  ["server"]=>
  string(18) "hotelpublisher.com"
  ["request"]=>
  string(14) "GET / HTTP/1.1"
  ["host"]=>
  string(18) "hotelpublisher.com"
}
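One fragile spot in the code above is explode(', ', ...): a quoted value that itself contains a comma followed by a space would be split in two. The sketch below shows one way around that, matching each "name: value" pair with a pattern that treats a double-quoted value as a single unit. The input string is hypothetical; the request with ", " inside it and the extra upstream field are made up for illustration.

```php
<?php
// Hypothetical tail of a log line (everything from "client:" onwards).
$tail = 'client: 86.186.86.232, server: hotelpublisher.com, request: "GET /?q=a, b HTTP/1.1", '
      . 'upstream: "fastcgi://127.0.0.1:9000", host: "hotelpublisher.com"';

// A value is either a double-quoted string or a run of characters without a comma.
preg_match_all('~(\w+): (?:"([^"]*)"|([^,]*))(?:, |$)~', $tail, $pairs, PREG_SET_ORDER);

$args = array();
foreach ($pairs as $p) {
    // Group 2 holds the quoted form, group 3 the bare form.
    $args[$p[1]] = ($p[2] !== '') ? $p[2] : ($p[3] ?? '');
}

print_r($args);
```

Because the pairs are collected by name, fields that appear only on some lines end up as extra keys rather than breaking the parse.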

I would just like to see a few votes to learn which option is preferred in terms of performance and reliability.
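On the performance side, measuring is probably more informative than guessing. Below is a rough timing sketch, assuming PHP 7.3 or later for hrtime; the iteration count is arbitrary, and the second loop exercises only the string-function core of the approach (strstr, substr, explode), not the full parsing code. Results will vary by machine and PHP version.

```php
<?php
$line = '2011/06/10 13:30:10 [error] 23263#0: *1 directory index of "/var/www/ssl/" is forbidden, '
      . 'client: 86.186.86.232, server: hotelpublisher.com, request: "GET / HTTP/1.1", host: "hotelpublisher.com"';

$pattern = '~^(?P<datetime>[\d/ :]+) \[(?P<errortype>.+)\] .*?: (?P<errormessage>.+), client: (?P<client>.+), '
         . 'server: (?P<server>.+), request: (?P<request>.+), host: (?P<host>.+)$~';

$iterations = 100000; // arbitrary; raise or lower to taste

// Time the single-regex approach.
$start = hrtime(true);
for ($i = 0; $i < $iterations; $i++) {
    preg_match($pattern, $line, $m);
}
$regexNs = hrtime(true) - $start;

// Time the core string operations of the substr/explode approach.
$start = hrtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $head = strstr($line, ', client:', true);
    $tail = explode(', ', substr(strstr($line, ', client:'), 2));
}
$stringNs = hrtime(true) - $start;

printf("regex:  %.1f ms\nstring: %.1f ms\n", $regexNs / 1e6, $stringNs / 1e6);
```

For reliability, running both parsers over a real log file and comparing their results line by line would likely reveal more than a micro-benchmark.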

Gajus answered Mar 01 '23 23:03