How to get page title in requests


What would be the simplest way to get the title of a page in Requests?

    r = requests.get('http://www.imdb.com/title/tt0108778/')
    # ? r.title
    Friends (TV Series 1994–2004) - IMDb
David542 asked Nov 08 '14 at 00:11




1 Answer

You need an HTML parser to parse the HTML response and get the title tag's text:

Example using lxml.html:

    >>> import requests
    >>> from lxml.html import fromstring
    >>> r = requests.get('http://www.imdb.com/title/tt0108778/')
    >>> tree = fromstring(r.content)
    >>> tree.findtext('.//title')
    u'Friends (TV Series 1994\u20132004) - IMDb'
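If installing lxml is not an option, the same idea can be sketched with the standard library's html.parser module. This is only a minimal sketch: the names TitleParser and get_title are made up for illustration and are not part of Requests or any library, and it assumes the page has already been decoded to a string (for example via r.text).

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False   # True while we are between <title> and </title>
        self._done = False       # True once the first <title> has been closed
        self._parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def get_title(html):
    """Return the stripped text of the first <title>, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


# With Requests, this would be called as: get_title(r.text)
```

This keeps dependencies to a minimum, but a full parser such as lxml is likely to cope better with badly malformed pages.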

There are other options as well, for example the mechanize library:

    >>> import mechanize
    >>> br = mechanize.Browser()
    >>> br.open('http://www.imdb.com/title/tt0108778/')
    >>> br.title()
    'Friends (TV Series 1994\xe2\x80\x932004) - IMDb'

Which option to choose depends on what you are going to do next: parse the page to get more data, or interact with it (click buttons, submit forms, follow links, etc.).

You may also want to use an API provided by IMDb instead of parsing HTML yourself; see:

  • Does IMDB provide an API?
  • IMDbPY

Example usage of the IMDbPY package:

    >>> from imdb import IMDb
    >>> ia = IMDb()
    >>> movie = ia.get_movie('0108778')
    >>> movie['title']
    u'Friends'
    >>> movie['series years']
    u'1994-2004'
alecxe answered Oct 07 '22 at 06:10
