Is there a clean way to get the n-th column of an html table using BeautifulSoup?

Tags: python, html-table, beautifulsoup

Say we look at the first table in a page:

table = BeautifulSoup(...).table

Then the rows can be scanned with a clean for-loop:

# find_all('tr') also reaches rows inside <tbody> and skips whitespace text nodes
for row in table.find_all('tr'):
    f(row)

But getting a single column out of the table is messy.

My question: is there an elegant way to extract a single column, either by its position, or by its 'name' (i.e. text that appears in the first row of this column)?
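For reference, here is a minimal BeautifulSoup 4 sketch of both lookups. It assumes a simple table with no colspan/rowspan cells and the column names in the first row; the sample markup and the helper names cells_of, column_by_index and column_by_name are hypothetical, and positions are 0-based.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table standing in for a real page
HTML = """
<table>
  <tr><th>Name</th><th>Language</th><th>Year</th></tr>
  <tr><td>CPython</td><td>C</td><td>1991</td></tr>
  <tr><td>PyPy</td><td>Python</td><td>2007</td></tr>
  <tr><td>Jython</td><td>Java</td><td>1997</td></tr>
</table>
"""


def cells_of(row):
    """Direct <td>/<th> children of a row (text nodes are not returned)."""
    return row.find_all(["td", "th"], recursive=False)


def column_by_index(table, n):
    """Text of the n-th cell (0-based) of every row long enough to have one.

    find_all("tr") is recursive, so rows inside an explicit <tbody> are found
    too (as are rows of any nested table). Assumes no colspan/rowspan.
    """
    column = []
    for row in table.find_all("tr"):
        cells = cells_of(row)
        if n < len(cells):
            column.append(cells[n].get_text(strip=True))
    return column


def column_by_name(table, name):
    """Body cells of the column whose first-row text equals `name`."""
    first_row = table.find("tr")
    header = [cell.get_text(strip=True) for cell in cells_of(first_row)]
    n = header.index(name)  # raises ValueError if no column has that name
    return column_by_index(table, n)[1:]  # skip the header cell itself


soup = BeautifulSoup(HTML, "html.parser")
table = soup.table  # first <table> in the document, as in the question
print(column_by_index(table, 2))          # ['Year', '1991', '2007', '1997']
print(column_by_name(table, "Language"))  # ['C', 'Python', 'Java']
```

BeautifulSoup 4.7 and later also accept CSS selectors through select(), so a positional lookup can be written as table.select('tr > td:nth-child(3)') as well.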

asked Apr 03 '11 20:04

Benjamin Nitlehoo

1 Answer

lxml is generally much faster than BeautifulSoup, so you might want to use it instead. For example, to print the third cell of every table row:

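from urllib.request import urlopen
from lxml.html import parse  # cssselect() also needs the separate cssselect package

# lxml's own URL loading does not handle https, so fetch the page with urllib
doc = parse(urlopen('https://www.python.org')).getroot()
for row in doc.cssselect('table tr'):  # 'table tr' also matches rows inside <tbody>
    for cell in row.cssselect('td:nth-child(3)'):
        print(cell.text_content())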

Or, instead of looping:

# One selector picks the third cell of every row, so no intermediate list of rows is needed
cells = [cell.text_content() for cell in doc.cssselect('table tr > td:nth-child(3)')]
print(cells)
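To pick a column by its header text rather than its position, a minimal sketch with lxml's xpath() method (which, unlike cssselect(), needs nothing beyond lxml itself) could look like this. The sample markup and the helper name lxml_column are hypothetical, and the sketch assumes the names are in the first row and that no cell uses colspan/rowspan.

```python
import lxml.html

# Hypothetical sample markup; replace with the page you actually parsed
HTML = """
<table>
  <tr><th>Name</th><th>Language</th><th>Year</th></tr>
  <tr><td>CPython</td><td>C</td><td>1991</td></tr>
  <tr><td>PyPy</td><td>Python</td><td>2007</td></tr>
</table>
"""


def lxml_column(table, name):
    """Text of the body cells in the column whose first-row cell equals `name`."""
    rows = table.xpath(".//tr")  # descendant rows, with or without <tbody>
    header = [cell.text_content().strip() for cell in rows[0].xpath("./th|./td")]
    n = header.index(name)  # 0-based position; ValueError if the name is missing
    column = []
    for row in rows[1:]:
        cells = row.xpath("./th|./td")  # direct cells in document order
        if n < len(cells):
            column.append(cells[n].text_content().strip())
    return column


doc = lxml.html.fromstring(HTML)
table = doc.xpath("//table")[0]
print(lxml_column(table, "Year"))  # ['1991', '2007']
```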

answered Oct 03 '22 07:10

Christopher O'Donnell

