I am trying to extract the meta description from fetched webpages, but I am running into a problem with BeautifulSoup's case sensitivity. Some of the pages have <meta name="Description" and some have <meta name="description".
My problem is very similar to that of another question on Stack Overflow.
The only difference is that I can't use lxml; I have to stick with BeautifulSoup.
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It builds a parse tree from the document so that the data is easy to navigate and extract.
The NavigableString object represents the text inside a tag. To access it, use .string on the tag. You can't edit a string in place, but you can replace it with another string, for example with replace_with().
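As a small illustration of the point about .string, here is a minimal sketch. The markup and the variable names are made up for the example, and it assumes the bs4 package with the built-in html.parser backend.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document, used only for illustration.
soup = BeautifulSoup("<title>Old title</title>", "html.parser")

# .string returns the NavigableString held by the tag.
original_text = soup.title.string
original_type = type(original_text).__name__

# The string cannot be edited in place, but it can be swapped for a new one.
soup.title.string.replace_with("New title")
updated_markup = str(soup)

if __name__ == "__main__":
    print(original_type)
    print(updated_markup)
```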
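You can give BeautifulSoup a regular expression to match attribute values against. Something like
soup.findAll('meta', attrs={'name': re.compile("^description$", re.I)})
might do the trick. The filter has to go through attrs because findAll already uses the name argument for the tag name, so passing name= as a keyword next to 'meta' raises a TypeError. Cribbed from the BeautifulSoup docs.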
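For completeness, here is a rough end-to-end sketch of that idea. The two sample pages, the helper name get_meta_description and the choice of the html.parser backend are assumptions made for the example, not something from the question.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample pages: one capitalizes the attribute value, one does not.
PAGE_UPPER = '<html><head><meta name="Description" content="First page"></head></html>'
PAGE_LOWER = '<html><head><meta name="description" content="Second page"></head></html>'

# Compiled once; re.I makes the match ignore case.
DESCRIPTION_RE = re.compile(r"^description$", re.I)


def get_meta_description(html):
    """Return the content of the first meta description tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # The attribute filter is passed through attrs, because the keyword
    # `name` is already taken by the tag-name argument.
    tag = soup.find("meta", attrs={"name": DESCRIPTION_RE})
    if tag is None:
        return None
    return tag.get("content")


if __name__ == "__main__":
    print(get_meta_description(PAGE_UPPER))
    print(get_meta_description(PAGE_LOWER))
```

Both sample pages go through the same code path, so no lowercasing of the HTML is needed beforehand. If a page has no meta description at all, the helper returns None instead of raising.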