I'm using PHP's preg_match_all() to search a string imported using file_get_contents(). The regex returns matches but I would like to know at which line number those matches are found. What's the best technique to achieve this?
I could read the file as an array and perform the regex for each line, but the problem is that my regex matches results across carriage returns (new lines).
Well, it's fairly late and maybe you've already solved this, but I had to do the same thing and it's fairly simple.
Using the PREG_OFFSET_CAPTURE flag in preg_match will return the byte offset of the match. Let's call it $charpos, so:
$before = substr($content, 0, $charpos); // all the text before the match (also works when $charpos is 0)
$line_number = substr_count($before, "\n") + 1; // newlines before the match, plus 1
Voilà!
You can't do this with regexes alone, at least not cleanly. What you can do is use the PREG_OFFSET_CAPTURE flag of preg_match_all and post-process the entire file.
That is, once you have the array of matched strings and the starting offset of each, count how many \r\n, \n, or \r sequences lie between the beginning of the file and each match's offset. The line number of the match is the number of distinct EOL terminators (\r\n | \n | \r) plus 1.
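The counting described above can be sketched roughly as follows. The helper name lineNumberAt and the sample data are assumptions made for illustration; the only library calls used are preg_match_all and substr.

```php
<?php
// Hypothetical helper (the name is an assumption): 1-based line number
// of the byte offset $offset inside $text.
function lineNumberAt(string $text, int $offset): int
{
    // PREG_OFFSET_CAPTURE reports byte offsets, so cut with substr().
    $before = substr($text, 0, $offset);
    // "\r\n" is listed first so a Windows line ending counts once, not twice.
    return preg_match_all('/\r\n|\n|\r/', $before) + 1;
}

// Sample data that deliberately mixes all three line endings.
$data = "Abba\r\nBeegees\nBeatles\rQueen";
preg_match_all('/Beegees|Beatles|Queen/', $data, $matches, PREG_OFFSET_CAPTURE);
foreach ($matches[0] as [$value, $offset]) {
    echo $value, ' at line ', lineNumberAt($data, $offset), "\n";
}
```

With this sample data the loop reports lines 2, 3 and 4. If the file only ever uses one kind of line ending, substr_count with that single terminator is simpler and likely faster.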
$data = "Abba
Beegees
Beatles";
preg_match_all('/Abba|Beegees|Beatles/', $data, $matches, PREG_OFFSET_CAPTURE);
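foreach (current($matches) as $match) {
    $matchValue = $match[0];
    // Offsets are in bytes, so use substr() rather than mb_substr();
    // "\n" matches the line breaks in $data on LF and CRLF files alike.
    $lineNumber = substr_count(substr($data, 0, $match[1]), "\n") + 1;
    echo "`{$matchValue}` at line {$lineNumber}\n";
}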
Output
`Abba` at line 1
`Beegees` at line 2
`Beatles` at line 3
(This re-scans the string from the start for every match, so check your performance requirements on large inputs.)
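If performance does matter, one possible approach is to count newlines incrementally, since preg_match_all returns the full matches in ascending offset order. This is only a sketch: the variable names are made up for illustration, it assumes "\n" line endings, and it assumes PHP 7.1 or later (where substr_count accepts a length of 0).

```php
<?php
$data = "Abba\nBeegees\nBeatles";
preg_match_all('/Abba|Beegees|Beatles/', $data, $matches, PREG_OFFSET_CAPTURE);

$line = 1;     // line number at byte offset $scanned
$scanned = 0;  // byte offset up to which newlines have been counted
$lines = [];   // collected [match, line] pairs
foreach ($matches[0] as [$value, $offset]) {
    // Count only the newlines between the previous match and this one.
    $line += substr_count($data, "\n", $scanned, $offset - $scanned);
    $scanned = $offset;
    $lines[] = [$value, $line];
    echo "`{$value}` at line {$line}\n";
}
```

Each byte of the input is scanned for newlines at most once, instead of once per match.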
Using preg_match_all with the PREG_OFFSET_CAPTURE flag is necessary to solve this problem. The code comments explain what kind of array preg_match_all returns and how the line numbers can be calculated:
// Given string to do a match with
$string = "\n\nabc\nwhatever\n\ndef";
// Match "abc" and "def" in a string
if (preg_match_all("#(abc).*(def)#si", $string, $matches, PREG_OFFSET_CAPTURE)) {
    // Now $matches[0][0][0] contains the complete matching string
    // $matches[1][0][0] contains the results for the first substring (abc)
    // $matches[2][0][0] contains the results for the second substring (def)
    // $matches[0][0][1] contains the string position of the complete matching string
    // $matches[1][0][1] contains the string position of the first substring (abc)
    // $matches[2][0][1] contains the string position of the second substring (def)

    // First (abc) match line number
    // Cut off the original string at the matching position, then count
    // number of line breaks (\n) for that subset of a string
    $line = substr_count(substr($string, 0, $matches[1][0][1]), "\n") + 1;
    echo $line . "\n";

    // Second (def) match line number
    // Cut off the original string at the matching position, then count
    // number of line breaks (\n) for that subset of a string
    $line = substr_count(substr($string, 0, $matches[2][0][1]), "\n") + 1;
    echo $line . "\n";
}
This will print 3 for the first substring and 6 for the second substring. You can change \n to \r\n or \r if your input uses different newlines.