Using a regex in grep, how would I capture the values of key and id (excluding the quotes) in:
<a href="/stats.php?key=string" id="92340" class=""
The best I got with a lookbehind and a lookahead is (?<=<a href="\/stats\.php\?key=).*(?=" class="") but that results in string" id="92340
Ideally the output pair would look like string 92340
Any help is much appreciated.
With your shown samples and attempts, please try the following GNU grep command. The -o option prints only the matched part of each line, and -P enables PCRE regex.
grep -oP '^<a href="/stats\.php\?key=[^"]*" id="\K\d+' Input_file
Explanation: a detailed breakdown of the regex used.
^ ##Match from the start of the line.
<a href="/stats\.php ##Match the literal text <a href="/stats.php; the dot is escaped so it matches a literal dot.
\?key= ##Match a literal ? followed by key=.
[^"]* ##Match everything up to, but not including, the next ".
" id=" ##Match the literal text " id=" that follows.
\K ##\K is a PCRE feature (enabled by -P): the text matched so far is still required to match, but it is dropped from the reported match, so it is not printed.
\d+ ##Match 1 or more digits.
To also get the key value (string here) along with the id, perl is a better fit since it can print the contents of capturing groups.
perl -pe 's|^<a href="/stats\.php\?key=([^"]*)" id="(\d+).*$|$1 $2|g' Input_file
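For comparison, the same two capturing groups can be tried outside the shell. The following is only a minimal Python sketch of the idea, not part of the answer above; the function name key_and_id is made up for illustration, and unlike perl -pe it returns None for non-matching lines instead of passing them through unchanged.

```python
import re

# Same pattern as the perl one-liner: group 1 is the key value, group 2 the id digits.
PATTERN = re.compile(r'^<a href="/stats\.php\?key=([^"]*)" id="(\d+)')


def key_and_id(line):
    """Return 'key id' for a matching line, or None when the line does not match."""
    m = PATTERN.match(line)
    return f"{m.group(1)} {m.group(2)}" if m else None


if __name__ == "__main__":
    # Sample line taken from the question.
    print(key_and_id('<a href="/stats.php?key=string" id="92340" class=""'))
```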
If you want 2 separate matches with grep, you can use the \G anchor and make use of \K to forget what is matched so far:
grep -oP '(?:<a href="/stats\.php\?key|\G(?!^))[^=<>]*="?\K[^<>"]+' file
Or matching key and id explicitly:
grep -oP '(?:<a href="/stats\.php\?|\G(?!^))(?:key=|"\h+id=")\K[^"]+' file
Output
string
92340
Or using GNU awk with 2 capture groups (the array as third argument to match is a gawk extension):
awk 'match($0, /<a href="\/stats\.php\?key=([^"]*)" id="([^"]*)"/, a) {print a[1], a[2]}' file
Output
string 92340
But if you are free to choose a tool, you could use a dedicated HTML / XML parser, since regexes are not aware of the document's structure.
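As one possible illustration of the parser route, here is a minimal sketch using Python's standard library html.parser and urllib.parse. The class name StatsLinkParser and the function extract_pairs are hypothetical, and the sample tag is the one from the question, closed with > and a link text so that it forms a complete element.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class StatsLinkParser(HTMLParser):
    """Collect (key, id) pairs from <a> tags that link to /stats.php."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        # Split the href into the path and the query string after '?'.
        path, _, query = href.partition("?")
        if path != "/stats.php":
            return
        # parse_qs maps each query parameter to a list of its values.
        key = parse_qs(query).get("key", [None])[0]
        if key is not None and attr.get("id") is not None:
            self.pairs.append((key, attr["id"]))


def extract_pairs(html):
    """Return a list of (key, id) tuples found in the given HTML text."""
    parser = StatsLinkParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    sample = '<a href="/stats.php?key=string" id="92340" class="">stats</a>'
    for key, id_ in extract_pairs(sample):
        print(key, id_)
```

Because the parser works on attributes rather than on raw text, this sketch does not depend on the order of href and id or on the spacing between them, which the regex solutions above do.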