How can I extract meta tags from HTML in a bash/awk script?

I have a working Bash script that extracts title tags. I need help with an AWK field separator for extracting meta tags from HTML, such as this one:

<meta name="keywords" content="key1, key2, key3">

My script extracts the title, but the same approach does not work for meta name tags.

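#!/bin/bash
# Read one HTML file path per line from htmls.txt (safe with spaces in names)
while IFS= read -r file; do
   echo "$file"
   # gawk only: IGNORECASE gives case-insensitive matching; RS="^$" reads the whole file as one record
   gawk 'BEGIN{IGNORECASE=1; FS="<title>|</title>"; RS="^$"} {print $2}' "$file" |
   awk 'NF > 0'   # print only non-empty lines
done < htmls.txt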

I guess I need a regex solution. Any ideas?
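A minimal sketch of the field-separator approach the question asks about, assuming each meta tag sits on a single line, uses double-quoted attributes, and lists `name` before or after `content` in the usual `attr="value"` form. Real-world HTML breaks these assumptions often, so treat this as a quick hack rather than a parser. The function name `extract_meta` is a hypothetical helper, not an existing tool.

```shell
#!/bin/bash
# extract_meta NAME FILE  (hypothetical helper)
# Prints the content attribute of every <meta name="NAME" content="..."> in FILE.
# Assumes one tag per line and double-quoted attribute values.
extract_meta() {
  local name=$1 file=$2
  awk -v name="$name" '
    # Split on double quotes: each attribute value lands in the field
    # right after the field that ends in "attr="
    BEGIN { FS = "\"" }
    # tolower() gives case-insensitive matching in any POSIX awk (no gawk needed)
    tolower($0) ~ /<meta[ \t]/ {
      found = 0; value = ""
      for (i = 1; i < NF; i++) {
        attr = tolower($i)
        if (attr ~ /[ \t]name=$/ && tolower($(i + 1)) == tolower(name)) found = 1
        if (attr ~ /[ \t]content=$/) value = $(i + 1)
      }
      if (found) print value
    }
  ' "$file"
}
```

Usage: `extract_meta keywords page.html` prints `key1, key2, key3` for the tag shown in the question. To run it over every file listed in htmls.txt: `while IFS= read -r file; do extract_meta keywords "$file"; done < htmls.txt`.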

asked Nov 22 '25 at 14:11 by chuckfinley



1 Answer

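First install xml2, e.g. on Debian/Ubuntu:

sudo apt-get install xml2

Then fetch the page, flatten it into path=value lines with xml2, keep the lines that mention meta, and print the last /-separated field of each:

wget -q -O - http://www.latin.fm | xml2 | grep meta | awk -F/ '{print $NF}'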


Output

@property=og:title
@content=Latin FM
...
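The output above lists each attribute on its own line. A minimal sketch that pairs them into tab-separated key/value lines, assuming that within each meta tag `@name` or `@property` comes before `@content` (as in the sample output); `pair_meta` is a hypothetical helper name. Note that `awk -F/ '{print $NF}'` keeps only the text after the last slash, so content values containing "/" (URLs, for example) get truncated; stripping everything up to the first `@` with `sed 's/^[^@]*//'` keeps them whole, because the element path before the attribute marker cannot contain `@`.

```shell
#!/bin/bash
# pair_meta (hypothetical helper): reads "@attr=value" lines on stdin and
# prints "key<TAB>content" for each @name/@property followed by @content.
pair_meta() {
  awk '
    # Split at the first "=" only, since values may themselves contain "="
    {
      eq = index($0, "=")
      if (eq == 0) next
      attr = substr($0, 1, eq - 1)
      val  = substr($0, eq + 1)
    }
    # Remember the key, then emit it together with the next @content
    attr == "@name" || attr == "@property" { key = val; next }
    attr == "@content" && key != ""        { print key "\t" val; key = "" }
  '
}
```

Usage: `wget -q -O - http://www.latin.fm | xml2 | grep meta | sed 's/^[^@]*//' | pair_meta` would print lines such as `og:title<TAB>Latin FM`.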
answered Nov 25 '25 at 11:11 by Eric Fortis



