What would be the simplest way to get the title of a page in Requests?
r = requests.get('http://www.imdb.com/title/tt0108778/')
# ? r.title
Friends (TV Series 1994–2004) - IMDb
If you have trouble finding the “<title>” tag in the sea of HTML, use the Find function: on Windows, press Ctrl + F and type “title” to jump straight to it. That's all there is to it. Now you can easily find the title of any page on your website.
The requests library is the de facto standard for making HTTP requests in Python. It abstracts the complexities of making requests behind a beautiful, simple API so that you can focus on interacting with services and consuming data in your application.
In Python 3, we can call urlopen from urllib.request and BeautifulSoup from the bs4 library to fetch the page title. Here we use the 'lxml' parser, which is generally the fastest parser BeautifulSoup supports.
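A minimal sketch of the urlopen plus BeautifulSoup approach might look like the following. The helper names title_from_html and fetch_title are hypothetical, and the code assumes the third-party bs4 and lxml packages are installed; passing parser='html.parser' falls back to the parser in the standard library.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def title_from_html(html, parser="lxml"):
    """Return the stripped text of the <title> tag, or None if there is none.

    Hypothetical helper: `html` may be str or bytes; `parser` is any
    parser name BeautifulSoup accepts (e.g. "lxml" or "html.parser").
    """
    soup = BeautifulSoup(html, parser)
    if soup.title is None or soup.title.string is None:
        return None
    return soup.title.string.strip()


def fetch_title(url, parser="lxml"):
    """Download `url` with urlopen and extract its title (hypothetical helper)."""
    with urlopen(url) as response:
        return title_from_html(response.read(), parser)
```

Calling fetch_title('http://www.imdb.com/title/tt0108778/') would then return the page title as a string, provided the site answers the default urllib request.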
The regexp should be '<\W*title\W*(.*)</title'. Without the dot, it raises re.error: nothing to repeat at position 13.
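As a rough sketch, that pattern can be applied to a response body like this. The helper name title_from_html and the choice of flags are assumptions, not part of the original suggestion. Regular expressions are fragile on real-world HTML: the leading \W* also swallows any non-word characters at the start of the title, and the greedy (.*) runs to the last '</title' in the text.

```python
import re

# Pattern quoted in the text; the flags are an assumption added here so that
# upper-case tags and titles spanning several lines are also matched.
TITLE_RE = re.compile(r'<\W*title\W*(.*)</title', re.IGNORECASE | re.DOTALL)


def title_from_html(html):
    """Return the text captured between <title> and </title>, or None.

    Hypothetical helper: `html` is the page source as a str.
    """
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None
```

With Requests, the page source would be r.text, so the call would be title_from_html(r.text).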
You need an HTML parser to parse the HTML response and get the title tag's text.
Example using lxml.html:
>>> import requests
>>> from lxml.html import fromstring
>>> r = requests.get('http://www.imdb.com/title/tt0108778/')
>>> tree = fromstring(r.content)
>>> tree.findtext('.//title')
u'Friends (TV Series 1994\u20132004) - IMDb'
There are certainly other options, such as the mechanize library:
>>> import mechanize
>>> br = mechanize.Browser()
>>> br.open('http://www.imdb.com/title/tt0108778/')
>>> br.title()
'Friends (TV Series 1994\xe2\x80\x932004) - IMDb'
Which option to choose depends on what you are going to do next: parse the page to get more data, or perhaps interact with it (click buttons, submit forms, follow links, etc.).
Besides, you may want to use an API provided by IMDb instead of going down to HTML parsing.
Example usage of the IMDbPY package:
>>> from imdb import IMDb
>>> ia = IMDb()
>>> movie = ia.get_movie('0108778')
>>> movie['title']
u'Friends'
>>> movie['series years']
u'1994-2004'