I am trying to get the language code of web pages using curl.
I wrote the command below and it works:
curl -Ls yahoo.com | grep "lang=" | head -1 | cut -d ' ' -f 3 | cut -d"\"" -f 2
but it fails when the markup is different, for example:
curl -Ls stick-it.app | grep "lang=" | head -1 | cut -d ' ' -f 3 | cut -d"\"" -f 2
That page writes its tag like this:
<html dir="rtl" lang="he-IL">
I just need to get he-IL.
If there is any other way, I would appreciate it.
Using any sed in any shell on every Unix box (this assumes the <html tag starts its own line):
$ curl -Ls yahoo.com | sed -n 's/^<html.* lang="\([^"]*\).*/\1/p'
en-US
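A slightly more tolerant variant of the sed approach, as a minimal sketch: wrapped in a hypothetical helper `get_lang` that reads HTML on stdin, it no longer requires `<html` at the very start of the line (for example after a minified `<!DOCTYPE html>` on the same line) and keeps only the first match. It still assumes the attribute uses double quotes.

```shell
# get_lang: hypothetical helper name; reads HTML on stdin and prints the
# value of lang="..." from the first <html ...> tag it finds.
get_lang() {
    sed -n 's/.*<html[^>]* lang="\([^"]*\)".*/\1/p' | head -n 1
}

# Offline check using the tag quoted in the question:
printf '%s\n' '<html dir="rtl" lang="he-IL">' | get_lang

# Real use (needs network access):
# curl -Ls stick-it.app | get_lang
```

The offline check prints `he-IL`; testing against a literal string first avoids depending on what a live page happens to serve.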
If you have GNU grep, you can use -P (Perl-compatible regex), where \K drops everything matched before it from the output:
$ curl -Ls stick-it.app | grep -oP '\slang="\K[^"]+'
he-IL
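The `\slang=` pattern above matches a lang attribute on any element, so a page containing something like `<span lang="fr">` could print several lines. A minimal sketch, assuming GNU grep built with PCRE support, that anchors the match to the `<html` tag and keeps the first hit (`get_lang_pcre` is a hypothetical name):

```shell
# get_lang_pcre: hypothetical helper; requires GNU grep with -P support.
# Only the <html ...> tag is considered, so lang attributes on other
# elements are ignored; head keeps the first match if there are several.
get_lang_pcre() {
    grep -oP '<html\b[^>]*\slang="\K[^"]+' | head -n 1
}

# Offline check using the tag quoted in the question:
printf '%s\n' '<html dir="rtl" lang="he-IL">' | get_lang_pcre

# Real use (needs network access):
# curl -Ls stick-it.app | get_lang_pcre
```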