How can I retrieve the page title of a webpage (title html tag) using Python?
On web browsers, the website title appears at the top of the tab or window, and in search results website titles display as bold hyperlinked texts. A good rule of thumb is to make website titles 50 to 65 characters long and ensure they are clear, as well as descriptive without being truncated.
Here's a simplified version of @Vinko Vrsalovic's answer:
import urllib2
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen("https://www.google.com"))
print soup.title.string
NOTE:
soup.title finds the first title element anywhere in the html document
title.string assumes it has only one child node, and that child node is a string
For beautifulsoup 4.x, use different import:
from bs4 import BeautifulSoup
I'll always use lxml for such tasks. You could use beautifulsoup as well.
import lxml.html
t = lxml.html.parse(url)
print(t.find(".//title").text)
EDIT based on comment:
from urllib2 import urlopen
from lxml.html import parse
url = "https://www.google.com"
page = urlopen(url)
p = parse(page)
print(p.find(".//title").text)
No need to import other libraries. Request has this functionality in-built.
>> hearders = {'headers':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'}
>>> n = requests.get('http://www.imdb.com/title/tt0108778/', headers=hearders)
>>> al = n.text
>>> al[al.find('<title>') + 7 : al.find('</title>')]
u'Friends (TV Series 1994\u20132004) - IMDb'
The mechanize Browser object has a title() method. So the code from this post can be rewritten as:
from mechanize import Browser
br = Browser()
br.open("http://www.google.com/")
print br.title()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With