Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I retrieve the page title of a webpage using Python?

Tags:

python

html

How can I retrieve the page title of a webpage (title html tag) using Python?

like image 845
cschol Avatar asked Sep 09 '08 04:09

cschol


People also ask

How do you find the title of a web page?

On web browsers, the website title appears at the top of the tab or window, and in search results website titles display as bold hyperlinked texts. A good rule of thumb is to make website titles 50 to 65 characters long and ensure they are clear, as well as descriptive without being truncated.


4 Answers

Here's a simplified version of @Vinko Vrsalovic's answer:

import urllib2
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://www.google.com"))
print soup.title.string

NOTE:

  • soup.title finds the first title element anywhere in the html document

  • title.string assumes it has only one child node, and that child node is a string

For beautifulsoup 4.x, use different import:

from bs4 import BeautifulSoup
like image 114
jfs Avatar answered Oct 23 '22 17:10

jfs


I'll always use lxml for such tasks. You could use beautifulsoup as well.

import lxml.html
t = lxml.html.parse(url)
print(t.find(".//title").text)

EDIT based on comment:

from urllib2 import urlopen
from lxml.html import parse

url = "https://www.google.com"
page = urlopen(url)
p = parse(page)
print(p.find(".//title").text)
like image 45
Peter Hoffmann Avatar answered Oct 23 '22 16:10

Peter Hoffmann


No need to import other libraries. Request has this functionality in-built.

>> hearders = {'headers':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'}
>>> n = requests.get('http://www.imdb.com/title/tt0108778/', headers=hearders)
>>> al = n.text
>>> al[al.find('<title>') + 7 : al.find('</title>')]
u'Friends (TV Series 1994\u20132004) - IMDb' 
like image 24
Rahul Chawla Avatar answered Oct 23 '22 16:10

Rahul Chawla


The mechanize Browser object has a title() method. So the code from this post can be rewritten as:

from mechanize import Browser
br = Browser()
br.open("http://www.google.com/")
print br.title()
like image 15
codeape Avatar answered Oct 23 '22 16:10

codeape