I want to fetch the title of a webpage that I open using urllib2. What is the best way to parse the HTML and find what I need (for now only the <title> tag, but I might need more in the future)?
Is there a good parsing library for this purpose?
Yes, I would recommend BeautifulSoup.
If you're getting the title, it's simply:
soup = BeautifulSoup(html)
myTitle = soup.html.head.title
or, to get a list of every <title> tag in the document:
myTitle = soup('title')
Both forms are taken from the documentation.
It's very robust and will parse the HTML no matter how messy it is.
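To make the difference between the two lookups concrete, here is a minimal sketch in Python 3. It assumes the bs4 package (Beautiful Soup 4) is installed, and the sample markup is made up for illustration; it stands in for whatever the fetched page returns.

```python
from bs4 import BeautifulSoup  # assumes Beautiful Soup 4 is installed

# Hypothetical sample markup with unclosed tags, standing in for a fetched page.
html = "<html><head><title>Example Page</title></head><body><p>Unclosed <b>markup"

# Naming the parser explicitly avoids bs4's "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")

# Attribute access returns the first <title> Tag, or None if there is none.
title_tag = soup.title
title_text = title_tag.string if title_tag is not None else None

# Calling the soup like a function is shorthand for find_all(): it returns a list.
all_titles = soup("title")

print(title_text)
print(len(all_titles))
```

Checking for None before reading .string keeps the lookup from raising on pages that have no title at all.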
Try Beautiful Soup:
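import urllib2  # Python 2; in Python 3 this module is urllib.request
from bs4 import BeautifulSoup  # Beautiful Soup 4

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()
soup = BeautifulSoup(html)
title = soup.html.head.title
print title.string  # the text inside the <title> tag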
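If installing a third-party library is not an option, the title alone can also be pulled out with the standard library. The following is only a sketch in Python 3, not part of either answer: TitleParser and extract_title are hypothetical names introduced here, and html.parser is far less forgiving of broken markup than Beautiful Soup.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join later.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the stripped title text of an HTML string, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


sample = "<html><head><title> Example Domain </title></head><body></body></html>"
print(extract_title(sample))
```

In Python 3 the page itself would come from urllib.request.urlopen(url).read(), decoded to a string before being passed to extract_title. Once more than the title is needed, Beautiful Soup is likely the more comfortable choice.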