I want to fetch the title of a webpage that I open using urllib2. What is the best way to parse the HTML and find what I need (for now only the <title> tag, but I might need more in the future)? Is there a good parsing library for this purpose?


Python fetching <title>

2 Answers

Yes, I would recommend BeautifulSoup.

If you only need the title, it's simply:

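from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 (the bs4 package)

soup = BeautifulSoup(html, 'html.parser')
myTitle = soup.html.head.title  # the <title> tag; myTitle.string is its text

or

myTitle = soup('title')  # shorthand for soup.find_all('title'); returns a list of tags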

Taken from the documentation

It's very robust and is designed to parse HTML even when the markup is messy or malformed.

152

answered Nov 15 '22 04:11

RobbR

Try Beautiful Soup:

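import urllib2  # Python 2 only; in Python 3 use urllib.request instead
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 (the bs4 package)

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()

soup = BeautifulSoup(html, 'html.parser')
title = soup.html.head.title
print title.string  # the title text; title.contents would print a list of child nodes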

answered Nov 15 '22 04:11

Dominic Rodger
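
If adding a third-party dependency is not an option, the title can also be pulled out with the standard library alone. The following is only a minimal sketch in Python 3, not the approach from the answers above: it swaps BeautifulSoup for the stdlib html.parser module and urllib2 for urllib.request, and the names TitleParser, extract_title and fetch_title are made up for illustration. It is less forgiving of broken markup than BeautifulSoup and only handles the <title> case.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._seen_title = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <title> is recorded.
        if tag == "title" and not self._seen_title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._seen_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the stripped title text of an HTML string, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def fetch_title(url):
    """Download a page and return its title (assumes UTF-8 if no charset is sent)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_title(html)
```

Calling fetch_title('http://www.example.com') would download the page and return its title text. If more than the title is needed later, BeautifulSoup as recommended above is likely the more comfortable choice.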

Tags:

python

urllib2

xintron
