
grep multiline pattern

I have a file with a list of URLs (one domain per line). I want to scan each page (not recursively) and extract two patterns that appear on different lines. After two days of trying, my head is spinning …

This is the relevant part of the HTML:

<a href="http://subdomain.domain.tld/">Home</a>
</li>
<li>
<a data-uv-trigger='true' href='mailto:[email protected]'>

I need to extract the domain (subdomain.domain.tld) and the email address ([email protected]). I can extract the two parts in two separate steps with wget and sed:

wget -O - -i urls-to-scan-manuell.txt | sed -n "s/\(.*a href=\"\)\(.*\)\(\">Home.*\)/\2/p"

wget -O - -i urls-to-scan-manuell.txt | sed -n "s/\(.*true' href='mailto\)\(.*\)\('>.*\)/\2/p"

But I would like to extract both parts at once and write them to a file on one line, separated by a space. It is the multiline handling in sed that drives me nuts.

Could you please help me? :)

Thank you in advance, Rainer.

rko asked Dec 13 '25 07:12


1 Answer

For the record: it's not recommended to parse HTML using regex.


You can pass multiple expressions to sed with -e, which helps here:

wget -O - -i urls-to-scan-manuell.txt | sed -n \
  -e "s/\(.*a href=\"\)\(.*\)\(\">Home.*\)/\2/p" \
  -e "s/\(.*true' href='mailto\)\(.*\)\('>.*\)/\2/p"

This will produce two lines, one for the domain and one for the email. If you prefer the output on one line, you can pipe it to paste - -. The default delimiter is a TAB; you can change that with the -d flag, for example:

wget -O - -i urls-to-scan-manuell.txt | sed -n \
  -e "s/\(.*a href=\"\)\(.*\)\(\">Home.*\)/\2/p" \
  -e "s/\(.*true' href='mailto:\)\(.*\)\('>.*\)/\2/p" | \
paste -d, - -

This will produce:

http://subdomain.domain.tld/,[email protected]

I took the liberty of adding a : after mailto in the pattern, because I assume that was your intention.
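
Since the question asks for a space rather than a comma, here is a minimal sketch of the same pipeline with paste -d' '. To keep it runnable without network access, it feeds the HTML snippet from the question through a here-document instead of wget. The function names extract_pair and sample_html and the address info@example.com are placeholders made up for this illustration, not part of the original post.

```shell
#!/bin/sh
# extract_pair: read HTML on stdin, print "URL EMAIL" on one line.
# Both sed expressions print only their captured group; paste then
# joins every two consecutive output lines with a single space.
extract_pair() {
  sed -n \
    -e "s/.*a href=\"\(.*\)\">Home.*/\1/p" \
    -e "s/.*true' href='mailto:\(.*\)'>.*/\1/p" |
  paste -d' ' - -
}

# sample_html: stand-in for the wget output. The markup is the snippet
# from the question; the email address is a hypothetical placeholder.
sample_html() {
  cat <<'EOF'
<a href="http://subdomain.domain.tld/">Home</a>
</li>
<li>
<a data-uv-trigger='true' href='mailto:info@example.com'>
EOF
}

result=$(sample_html | extract_pair)
printf '%s\n' "$result"
```

With real input, sample_html would be replaced by wget -O - -i urls-to-scan-manuell.txt and the output redirected to a file. Note that paste pairs lines purely by position, so if a page is missing either the link or the email, all following pairs will be shifted by one line.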

janos answered Dec 16 '25 08:12


