I have some HTML code that contains many <table>s in it.
I'm trying to get the information in the second table. Is there a way to do this without using soup.findAll('table')?
When I do use soup.findAll('table'), I get an error:
ValueError: too many values to unpack
Is there a way to get the n-th tag in some code, or another way that does not require going through all the tables? Or should I see if I can add titles to the tables (like <table title="things">)?
There are also headers (<h4>title</h4>) above each table, if that helps.
Thanks.
EDIT
Here's what I was thinking when I asked the question:
I was unpacking the objects into two values when there were many more. I thought this would just give me the first two things from the list, but of course it kept giving me the error mentioned above. I was unaware that the return value was a list; I thought it was a special object or something, and I was basing my code off of my friends'.
I was thinking this error meant there were too many tables on the page and that it couldn't handle all of them, so I was asking for a way to do it without the method I was using. I probably should have stopped assuming things.
Now I know it returns a list, and I can use this in a for loop or get a value from it with soup.findAll('table')[someNumber]. I learned what unpacking was and how to use it, as well. Thanks to everyone who helped.
Hopefully that clears things up. Now that I know what I'm doing, my question makes less sense than it did when I asked it, so I thought I'd put a note here on what I was thinking.
EDIT 2:
This question is now pretty old, but I still see that I was never really clear about what I was doing.
If it helps anyone, I was attempting to unpack the findAll(...) results without knowing how many of them there were.
useless_table, table_i_want, another_useless_table = soup.findAll("table")
Since the page didn't always contain the number of tables I had guessed, and every value in the returned list has to be unpacked, I was receiving the ValueError:
ValueError: too many values to unpack
So, I was looking for a way to grab the second (or whichever index) table in the returned list without running into errors about how many tables there were.
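For anyone who lands here with the same confusion, here is a minimal sketch of what was going wrong. It uses a plain Python list as a stand-in for the findAll result (the list contents and variable names are made up for illustration), so it shows only the unpacking behaviour, not Beautiful Soup itself.

```python
# A plain list standing in for the result of soup.findAll("table").
# The contents are hypothetical placeholders, not real tags.
tables = ["table0", "table1", "table2", "table3"]

# Unpacking into three names fails because the list holds four items.
try:
    a, b, c = tables
except ValueError as exc:
    error_message = str(exc)
    print("unpacking failed:", error_message)

# Indexing does not care how many items there are (as long as index 1 exists).
table_i_want = tables[1]

# Starred unpacking (Python 3) also works: `rest` absorbs whatever is left over.
_, starred_pick, *rest = tables

print(table_i_want, starred_pick, rest)
```

Indexing is the simpler choice when only one item is needed; starred unpacking is handy when a few leading items need names and the remainder can be ignored.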
Going down: tags are among the most important elements of any HTML document, and a tag may contain other tags or strings (the tag's children). Beautiful Soup provides different ways to navigate and iterate over a tag's children.
A tag object in Beautiful Soup corresponds to an HTML or XML tag in the actual page or document. Tags have a lot of attributes and methods; the two most important features of a tag are its name and its attributes.
To get the second table from the call soup.findAll('table'), treat the result as a list and just index it:
secondtable = soup.findAll('table')[1]
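If the page might contain fewer tables than expected, a small guard avoids an IndexError. The sketch below is illustrative only: the sample markup, the helper name nth_table, and the heading text "second" are all assumptions, not taken from the asker's page. It also shows one possible way to use the <h4> headers mentioned in the question, via find_next.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: an <h4> header above each table.
html = """
<h4>first</h4><table><tr><td>A</td></tr></table>
<h4>second</h4><table><tr><td>B</td></tr></table>
<h4>third</h4><table><tr><td>C</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")


def nth_table(soup, n):
    """Return the n-th <table> (0-based, document order), or None if absent.

    `limit` stops the search once n + 1 tables have been found, so the
    rest of the document is not scanned.
    """
    tables = soup.find_all("table", limit=n + 1)
    return tables[n] if len(tables) > n else None


second = nth_table(soup, 1)
missing = nth_table(soup, 5)   # there is no sixth table here

# Alternative: locate the header by its text, then take the next table.
heading = soup.find("h4", string="second")
by_heading = heading.find_next("table") if heading is not None else None

print(second.get_text(strip=True))      # B
print(missing)                          # None
print(by_heading.get_text(strip=True))  # B
```

Note that find_all without recursive=False also counts nested tables, so the index refers to document order rather than to top-level tables only.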
Martijn Pieters' answer will indeed make it work. However, I have had nested table tags break my code when I simply took the second table in the list without paying attention.
When you call find_all and take the n-th element, there is a chance you will get the wrong one: find_all also returns tables nested inside other tables. It is safer to locate the first element you want and make sure the n-th element is actually a sibling of that element rather than one of its children.
You can use find_next_sibling() to make your code more robust, just in case you need it. My code is listed below (it uses recursive=False).
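from bs4 import BeautifulSoup

# Sample markup: the first table contains a nested "Extra Table".
text = """
<html>
<head>
</head>
<body>
<table>
<p>Table1</p>
<table>
<p>Extra Table</p>
</table>
</table>
<table>
<p>Table2</p>
</table>
</body>
</html>
"""
# The parser is named explicitly (an assumption; the original relied on the
# default) so the tree does not depend on which parsers are installed.
soup = BeautifulSoup(text, "html.parser")

# Default search is recursive, so the nested table is counted too.
tables = soup.find('body').find_all('table')
print(len(tables))
print(tables[1].text.strip())
#3
#Extra Table # which is not the table you want, and there is no warning

# recursive=False only looks at direct children of <body>.
tables = soup.find('body').find_all('table', recursive=False)
print(len(tables))
print(tables[1].text.strip())
#2
#Table2 # your desired output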
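The find_next_sibling() approach mentioned above can be sketched as follows. This is only an illustration under assumed markup: the id attributes (t1, inner, t2) are made up so the result is easy to check, and html.parser is chosen so the tree matches the markup as written.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first table contains a nested table.
html = """
<body>
<table id="t1"><tr><td>Table1
  <table id="inner"><tr><td>Extra Table</td></tr></table>
</td></tr></table>
<table id="t2"><tr><td>Table2</td></tr></table>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the first table, then ask for the next <table> at the same level.
first = soup.find("table")
second = first.find_next_sibling("table")

# The nested table is a descendant of the first table, not a sibling,
# so it is skipped.
print(second["id"])  # t2
```

Compared with indexing the find_all result, this ties the lookup to the position of the first table in the tree, so nested tables cannot shift the index.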