
Get line number from preg_match_all()

Tags: regex, php

I'm using PHP's preg_match_all() to search a string imported using file_get_contents(). The regex returns matches but I would like to know at which line number those matches are found. What's the best technique to achieve this?

I could read the file as an array and perform the regex for each line, but the problem is that my regex matches results across carriage returns (new lines).

asked Jan 19 '11 01:01 by bart


4 Answers

Well, it's kind of late and maybe you already solved this, but I had to do it and it's fairly simple. Using the PREG_OFFSET_CAPTURE flag in preg_match will return the position of the match (a byte offset into the string). Let's call it $charpos, so:

$before = substr($content, 0, $charpos); // fetches all the text before the match (also works when $charpos is 0)

$line_number = strlen($before) - strlen(str_replace("\n", "", $before)) + 1; // newlines before the match, plus 1

Voilà!
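For completeness, here is a minimal end-to-end sketch of that idea. The sample $content and the pattern are made up for illustration, and it counts newlines with substr_count() instead of the strlen() difference:

```php
<?php
// Sample input, invented for this example.
$content = "first line\nsecond line\nthird line\n";

// The /s modifier lets "." match newlines, so the match can span lines.
if (preg_match('/second.*third/s', $content, $m, PREG_OFFSET_CAPTURE)) {
    $charpos = $m[0][1];                      // byte offset of the whole match
    $before = substr($content, 0, $charpos);  // text before the match
    $line_number = substr_count($before, "\n") + 1;
    echo $line_number . "\n";                 // prints 2
}
```

The match starts on the second line even though it ends on the third, so the reported line is where the match begins.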

answered Nov 12 '22 08:11 by Javier


You can't do this with regexes alone, at least not cleanly. What you can do is use the PREG_OFFSET_CAPTURE flag of preg_match_all and post-process the entire file.

I mean: once you have the array of matched strings and the starting offset of each one, just count how many \r\n, \n or \r occur between the beginning of the file and each match's offset. The line number of the match is the number of EOL terminators (\r\n | \n | \r) before it, counting \r\n as a single terminator, plus 1.
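The counting step could look roughly like this. lineNumberAt() is a hypothetical helper name and the sample text is invented; treat it as a sketch, not the only way to do it:

```php
<?php
// Hypothetical helper: 1-based line number of a byte offset in $text.
function lineNumberAt(string $text, int $offset): int
{
    $before = substr($text, 0, $offset);
    // \r\n is listed first so a Windows line ending counts once, not twice.
    $count = preg_match_all('/\r\n|\r|\n/', $before);
    return (int) $count + 1;
}

// Sample text mixing all three line-ending styles.
$text = "one\r\ntwo\rthree\nfour five\nsix";
preg_match_all('/four\s+five\s+six/', $text, $matches, PREG_OFFSET_CAPTURE);
foreach ($matches[0] as $match) {
    // $match[0] is the matched text, $match[1] its byte offset.
    echo 'line ' . lineNumberAt($text, $match[1]) . "\n"; // prints "line 4"
}
```

Putting \r\n before \r and \n in the alternation is what keeps a CRLF pair from being counted as two line breaks.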

answered Nov 12 '22 08:11 by Mihai Toader


$data = "Abba
Beegees
Beatles";

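// PREG_OFFSET_CAPTURE turns each match into [matched text, byte offset].
preg_match_all('/Abba|Beegees|Beatles/', $data, $matches, PREG_OFFSET_CAPTURE);
foreach (current($matches) as $match) {
    $matchValue = $match[0];
    // The offset is in bytes, so slice with substr() rather than mb_substr().
    // Count "\n" rather than PHP_EOL, which only fits data using the platform's own line ending.
    $lineNumber = substr_count(substr($data, 0, $match[1]), "\n") + 1;

    echo "`{$matchValue}` at line {$lineNumber}\n";
}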

Output

`Abba` at line 1
`Beegees` at line 2
`Beatles` at line 3

(Each match re-scans the string from the beginning, so check your performance requirements if the input is large or has many matches.)
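If that cost matters, one option is to count newlines incrementally, since preg_match_all reports matches in increasing offset order. This is only a sketch: matchesWithLines() is a hypothetical helper name, and it assumes "\n" (alone or as part of \r\n) ends every line.

```php
<?php
// Hypothetical helper: returns [matched text, line number] pairs.
function matchesWithLines(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
    $result = [];
    $line = 1;
    $scanned = 0; // byte offset up to which newlines are already counted
    foreach ($matches[0] as $match) {
        $offset = $match[1];
        if ($offset > $scanned) {
            // Only look at the text between the previous match and this one.
            $line += substr_count($text, "\n", $scanned, $offset - $scanned);
            $scanned = $offset;
        }
        $result[] = [$match[0], $line];
    }
    return $result;
}

foreach (matchesWithLines('/Abba|Beegees|Beatles/', "Abba\nBeegees\nBeatles") as $pair) {
    echo "`{$pair[0]}` at line {$pair[1]}\n";
}
```

Each byte of the input is scanned for newlines at most once, however many matches there are.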

answered Nov 12 '22 06:11 by B Brendler


Using preg_match_all with the PREG_OFFSET_CAPTURE flag is necessary to solve this problem. The code comments explain what kind of array preg_match_all returns and how the line numbers can be calculated:

// Given string to do a match with
$string = "\n\nabc\nwhatever\n\ndef";

// Match "abc" and "def" in a string
if(preg_match_all("#(abc).*(def)#si", $string, $matches, PREG_OFFSET_CAPTURE)) {
  // Now $matches[0][0][0] contains the complete matching string
  // $matches[1][0][0] contains the results for the first substring (abc)
  // $matches[2][0][0] contains the results for the second substring (def)
  // $matches[0][0][1] contains the string position of the complete matching string
  // $matches[1][0][1] contains the string position of the first substring (abc)
  // $matches[2][0][1] contains the string position of the second substring (def)

  // First (abc) match line number
  // Cut off the original string at the matching position, then count
  // number of line breaks (\n) for that subset of a string
  $line = substr_count(substr($string, 0, $matches[1][0][1]), "\n") + 1;
  echo $line . "\n";

  // Second (def) match line number
  // Cut off the original string at the matching position, then count
  // number of line breaks (\n) for that subset of a string
  $line = substr_count(substr($string, 0, $matches[2][0][1]), "\n") + 1;
  echo $line . "\n";
}

This will print 3 for the first substring and 6 for the second substring. You can change \n to \r\n or \r if your input uses different newlines.

answered Nov 12 '22 06:11 by iquito