
Is it possible for BeautifulSoup to work in a case-insensitive manner?


I am trying to extract the meta description from fetched web pages, but I am running into the case sensitivity of BeautifulSoup.

Some of the pages have <meta name="Description" and others have <meta name="description".

My problem is very similar to another question on Stack Overflow.

The only difference is that I can't use lxml; I have to stick with BeautifulSoup.

Nitin asked Apr 08 '10 18:04





1 Answer

You can give BeautifulSoup a regular expression to match attributes against. Something like

soup.findAll('meta', attrs={'name': re.compile(r'^description$', re.I)})

might do the trick. Cribbed from the BeautifulSoup docs.
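A fuller sketch of the same idea, in case it helps. It assumes the newer bs4 package (where findAll is spelled find_all / find) and Python's built-in "html.parser"; the helper name get_meta_description and the sample pages are made up for illustration. The regular expression goes through attrs rather than a name= keyword, since name is already the first positional parameter of find (the tag name).

```python
import re
from bs4 import BeautifulSoup  # assumption: the bs4 package is installed


def get_meta_description(html):
    """Return the content of the first description meta tag, or None.

    Hypothetical helper, not part of BeautifulSoup itself.
    """
    soup = BeautifulSoup(html, "html.parser")
    # re.I makes the attribute *value* match case-insensitively, so both
    # name="Description" and name="description" are found.
    tag = soup.find("meta", attrs={"name": re.compile(r"^description$", re.I)})
    return tag.get("content") if tag else None


# Made-up sample pages covering both spellings and a page without the tag.
pages = [
    '<html><head><meta name="Description" content="Upper"></head></html>',
    '<html><head><meta name="description" content="lower"></head></html>',
    '<html><head><meta name="keywords" content="a, b"></head></html>',
]
results = [get_meta_description(page) for page in pages]
print(results)
```

The last page has no description tag, so the helper returns None for it instead of raising.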

Will McCutchen answered Dec 24 '22 20:12
