I'm comfortable with basic regular expressions, but I get a bit lost with positive/negative lookaheads and lookbehinds.
I'm trying to pull the id number from this:
[keyword stuff=otherstuff id=123 morestuff=stuff]
There could be any number of "stuff" attributes before or after the id. I've been using The Regex Coach to help debug what I've tried, but I'm not making any more progress...
So far I have this:
\[keyword (?:id=([0-9]+))?[^\]]*\]
Which takes care of any extra attributes after the id, but I can't figure out how to ignore everything between keyword and id.
I know I can't use [^id]*, since that character class excludes the individual characters i and d, not the word id.
I believe I need a negative lookahead like (?!id)*,
but since a lookahead is zero-width, it doesn't move forward from there.
This doesn't work either:
\[keyword[A-z0-9 =]*(?!id)(?:id=([0-9]+))?[^\]]*\]
I've been looking all over for examples, but haven't found any. Or perhaps I have, but they went so far over my head I didn't even realize what they were.
Help! Thanks.
EDIT: It also has to match [keyword stuff=otherstuff], where id= doesn't appear at all, so the id group must be optional (zero or one occurrence). There are also other tags such as [otherkeywords id=32] that I do not want to match. The pattern needs to find multiple [keyword id=3] tags throughout the document using preg_match_all.
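The two lookahead problems described in the question can be seen in a short PHP sketch (the sample tag and variable names are illustrative only). [^id]* stops at the first i or d, and the d in "keyword" comes first. A lookahead can be made to "move forward" by repeating it together with one consumed character, (?:(?!\bid=)[^\]])*, a construction often called a tempered greedy token. Note that this is a different technique from the lookaround-free patterns in the answers below.

```php
<?php
// Illustrative sample tag (not taken from a real document).
$tag = '[keyword stuff=otherstuff id=123 morestuff=stuff]';

// [^id]* is a character class: it stops at the first 'i' OR 'd',
// and the 'd' in "keyword" is reached first.
preg_match('/^\[[^id]*/', $tag, $m);
echo $m[0], "\n"; // [keywor

// Tempered greedy token: before consuming each non-']' character,
// the negative lookahead checks that "id=" (at a word boundary) does not start there.
$pattern = '/\[keyword((?:(?!\bid=)[^\]])*)(?:\bid=([0-9]+))?[^\]]*\]/';
preg_match($pattern, $tag, $m);
echo '"' . $m[1] . '"', "\n"; // " stuff=otherstuff "
echo $m[2], "\n";             // 123
```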
No lookahead/behind required:
/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/
I added the trailing [^\]]*?\] to make sure the match ends at a real closing bracket; it may be unnecessary.
Edit: added the \b before id, as otherwise it could match [keyword you-dont-want-this-guid=123123-132123-123 id=123] and capture 123123 from guid=.
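To see why the \b matters, here is a small PHP sketch running the pattern with and without it on the guid example above (the variable names are arbitrary):

```php
<?php
// The guid example from the answer.
$tag = '[keyword you-dont-want-this-guid=123123-132123-123 id=123]';

// Same pattern, with and without the word boundary before "id".
$withoutB = '/\[keyword(?:[^\]]*?id=([0-9]+))?[^\]]*?\]/';
$withB    = '/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/';

preg_match($withoutB, $tag, $m1);
preg_match($withB, $tag, $m2);

echo $m1[1], "\n"; // 123123 (taken from "guid=")
echo $m2[1], "\n"; // 123
```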
$ php -r 'preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff morestuff=stuff]",$matches);var_dump($matches);'
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(42) "[keyword stuff=otherstuff morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(0) ""
  }
}
$ php -r 'var_dump(preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff id=123 morestuff=stuff]",$matches),$matches);'
int(1)
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(49) "[keyword stuff=otherstuff id=123 morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(3) "123"
  }
}
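As a sketch of the full set of requirements from the question's EDIT, here is the same pattern run with preg_match_all over one hypothetical document containing a tag with an id, a tag without one, and an [otherkeywords] tag that should be skipped (the sample text is made up):

```php
<?php
// Hypothetical sample document covering the cases from the question's EDIT.
$doc = 'a [keyword stuff=otherstuff id=123 morestuff=stuff] b '
     . '[keyword stuff=otherstuff] c [otherkeywords id=32] d [keyword id=3]';

$pattern = '/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/';
preg_match_all($pattern, $doc, $matches);

// $matches[1] holds the captured id, or "" when the tag had no id=.
// [otherkeywords id=32] is not matched because "[" must be followed by "keyword".
var_dump($matches[1]); // expect "123", "" and "3"
```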
You do not need lookahead or lookbehind.
Since the question is tagged PHP, use preg_match_all() and store the matches in $matches.
Here's how:
<?php
// Store the string. I single quote, in case there are backslashes I
// didn't see.
$string = 'blah blah[keyword stuff=otherstuff id=123 morestuff=stuff]
blah blah[otherkeyword stuff=otherstuff id=555 morestuff=stuff]
blah blah[keyword stuff=otherstuff id=444 morestuff=stuff]';
// The pattern is '[keyword' followed by any characters other than ']',
// then a space and 'id='.
// The space before id is important, so you don't catch 'guid', etc.
// If '[keyword' is always at the beginning of a line, you can anchor it
// with '^\[keyword' plus the m (multiline) modifier.
$pattern = '/\[keyword[^\]]* id=([0-9]+)/';
// Find every single $pattern in $string and store it in $matches
preg_match_all($pattern, $string, $matches);
// The only tricky part you have to know is that each entire match is stored in
// $matches[0][x], and the part of the match in the parentheses, which is what
// you want is stored in $matches[1][x]. The brackets are optional, since it's
// only one line.
foreach($matches[1] as $value)
{
echo $value . "<br/>";
}
?>
Output:
123
444
(555 is skipped, as it should be.)
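If the $matches[0][x] / $matches[1][x] layout feels awkward, preg_match_all() also accepts the PREG_SET_ORDER flag, which groups each match's pieces together. A sketch reusing the string and pattern from the answer above:

```php
<?php
$string = 'blah blah[keyword stuff=otherstuff id=123 morestuff=stuff]
blah blah[otherkeyword stuff=otherstuff id=555 morestuff=stuff]
blah blah[keyword stuff=otherstuff id=444 morestuff=stuff]';
$pattern = '/\[keyword[^\]]* id=([0-9]+)/';

// PREG_SET_ORDER: $sets[x][0] is the whole match x,
// and $sets[x][1] is the id captured in that match.
preg_match_all($pattern, $string, $sets, PREG_SET_ORDER);

$ids = [];
foreach ($sets as $set) {
    $ids[] = $set[1];
}
echo implode("\n", $ids), "\n"; // 123, then 444 on the next line
```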
PS
You can also use \b instead of a literal space if there could be a tab there instead. \b represents a word boundary, in this case the beginning of the word id.
$pattern = '/\[keyword[^\]]*\bid=([0-9]+)/';
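A quick PHP sketch of the tab case, with a made-up tag where a tab precedes id=. Keep in mind that \b also matches after characters such as - or =, so an attribute like data-id= would be picked up as well.

```php
<?php
// Sample tag where a tab (not a space) precedes id=.
$tag = "[keyword stuff=otherstuff\tid=123]";

$spacePattern    = '/\[keyword[^\]]* id=([0-9]+)/';
$boundaryPattern = '/\[keyword[^\]]*\bid=([0-9]+)/';

var_dump(preg_match($spacePattern, $tag));        // int(0): no literal space before id=
var_dump(preg_match($boundaryPattern, $tag, $m)); // int(1)
echo $m[1], "\n";                                 // 123
```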