BeautifulSoup in Python - getting the n-th tag of a type

Tags:

beautifulsoup

I have some html code that contains many <table>s in it.

I'm trying to get the information in the second table. Is there a way to do this without using soup.findAll('table')?

When I do use soup.findAll('table'), I get an error:

ValueError: too many values to unpack

Is there a way to get the n-th tag of a type, or some other approach that does not require going through all the tables? Or should I see if I can add titles to the tables (like <table title="things">)?

There are also headers (<h4>title</h4>) above each table, if that helps.

Thanks.

EDIT

Here's what I was thinking when I asked the question:

I was unpacking the results into two variables when there were many more than two. I thought this would just give me the first two items in the list, but of course it kept raising the error mentioned above. I didn't know the return value was a list. I thought it was some special object, and I was basing my code on my friends' code.

I thought this error meant there were too many tables on the page for BeautifulSoup to handle, so I asked for a way to do it without the method I was using. I probably should have stopped assuming things.

Now I know it returns a list, so I can use it in a for loop or get a single value from it with soup.findAll('table')[someNumber]. I also learned what unpacking is and how to use it. Thanks to everyone who helped.

Hopefully that clears things up. Now that I know what I'm doing, my question makes less sense than it did when I asked it, so I thought I'd add a note here explaining what I was thinking.

EDIT 2:

This question is now pretty old, but I still see that I was never really clear about what I was doing.

If it helps anyone: I was attempting to unpack the findAll(...) results without knowing how many there would be.

useless_table, table_i_want, another_useless_table = soup.findAll("table")

Since the page didn't always contain the number of tables I had guessed, and unpacking requires exactly one name for every value in the result, I was receiving the ValueError:

ValueError: too many values to unpack

So I was looking for a way to grab the second (or whichever index) table from the returned list without running into errors about how many tables there were.
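A minimal sketch of the difference, assuming a hypothetical page that happens to contain four tables: unpacking needs exactly as many names as there are results, while indexing (or starred unpacking, Python 3 only) tolerates extra tables. The markup and variable names here are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page with four tables (the real page's count varied)
html = "".join(f"<table><tr><td>T{i}</td></tr></table>" for i in range(1, 5))
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table")  # a list-like ResultSet, not a tuple

# Unpacking into three names fails because there are four results
try:
    useless_table, table_i_want, another_useless_table = tables
except ValueError as exc:
    print("unpacking failed:", exc)

# Indexing only requires that at least two tables exist
table_i_want = tables[1]
print(table_i_want.get_text())

# Starred unpacking (Python 3) absorbs however many tables follow
first, second, *rest = tables
print(second.get_text(), len(rest))
```

Indexing still raises IndexError if the page has fewer tables than the index, so it only removes the "exact count" requirement, not the "at least n" one.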


asked Dec 30 '12 22:12

nasonfish

2 Answers

To get the second table from soup.findAll('table'), treat the result as a list and index it:

secondtable = soup.findAll('table')[1]
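If you also want the search to stop once the n-th table is found, find_all() accepts a limit argument. The sketch below wraps that in a hypothetical helper, nth_table() (not part of BeautifulSoup), which returns None instead of raising IndexError when the page has fewer tables than expected.

```python
from bs4 import BeautifulSoup

def nth_table(soup, n):
    """Return the n-th <table> (0-based) in document order, or None.

    Hypothetical helper for illustration. limit=n + 1 makes find_all()
    stop searching once enough tables have been collected.
    """
    tables = soup.find_all("table", limit=n + 1)
    return tables[n] if len(tables) > n else None

html = "<table><tr><td>first</td></tr></table><table><tr><td>second</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

secondtable = nth_table(soup, 1)
print(secondtable.get_text())  # second
print(nth_table(soup, 5))      # None
```

Like plain indexing, this counts nested tables too, since find_all() searches recursively by default.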


answered Sep 21 '22 05:09

Martijn Pieters

Martijn Pieters' answer will indeed make it work. However, I once had nested table tags, which broke my code when I simply took the second table in the list without paying attention.

When you call find_all and take the n-th element, it is easy to get the wrong one. You had better locate the first element you want and make sure the n-th element is actually a sibling of that element rather than a child.

You can use find_next_sibling() to make your code safer, or you can find the parent first and then use find_all(recursive=False) to restrict the search to its direct children.

Just in case you need it, here is my code (using recursive=False):

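from bs4 import BeautifulSoup

text = """
<html>
    <head>
    </head>
    <body>
        <table>
            <p>Table1</p>
            <table>
                <p>Extra Table</p>
            </table>
        </table>
        <table>
            <p>Table2</p>
        </table>
    </body>
</html>
"""

# Name the parser explicitly: "html.parser" keeps the nested <table> as written,
# while other parsers (e.g. html5lib) may restructure the invalid nesting.
soup = BeautifulSoup(text, "html.parser")

# Recursive search (the default) also matches the nested table
tables = soup.find('body').find_all('table')
print(len(tables))
print(tables[1].text.strip())
# 3
# Extra Table   <- not the table you want, and there is no warning

# recursive=False only looks at the direct children of <body>
tables = soup.find('body').find_all('table', recursive=False)
print(len(tables))
print(tables[1].text.strip())
# 2
# Table2   <- your desired output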
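Since the question mentions an <h4> heading above each table, the find_next_sibling() option can be sketched as follows. The heading text "things" and the markup are hypothetical, and this assumes each table is a sibling that directly follows its heading; it will not work if the page wraps headings and tables in different containers.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each table follows its <h4> heading at the same level
html = """
<body>
  <h4>stuff</h4>
  <table><tr><td>stuff table</td></tr></table>
  <h4>things</h4>
  <table><tr><td>things table</td></tr></table>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Find the heading by its exact text, then the next <table> on the same level.
# Only siblings are searched, so tables nested inside other elements are skipped.
heading = soup.find("h4", string="things")
table = heading.find_next_sibling("table")
print(table.get_text(strip=True))  # things table
```

This selects the table by its label instead of its position, so adding or removing other tables elsewhere on the page does not change the result.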

answered Sep 20 '22 05:09

B.Mr.W.
