Does BeautifulSoup find_all() preserve tag order?

I want to use BeautifulSoup to parse some HTML. I have a table with several rows. I'm trying to find a row that meets certain conditions (certain attribute values) and use the index of that row later in my code.

The question is: does find_all() preserve the order of my rows in the result set that it returns?

I didn't find this in the docs, and Googling only got me to this answer:

'BeautifulSoup tags don't track their order in the page, no.'

but its author does not say where that information comes from.

I'd be happy with an answer, but even more happy with a pointer to some documentation that explains this.

Edit: dstudeba pointed me in the direction of this 'workaround' using find_next_sibling().

from bs4 import BeautifulSoup

# Parse the file; the context manager closes it afterwards.
with open('./mytable.html') as f:
    soup = BeautifulSoup(f, 'html.parser')

row_attrs = {'class': 'something', 'someattr': 'somevalue'}
row = soup.find('tr', row_attrs)
myvalues = []
# Walk forward through the matching sibling rows, in document order.
while row:
    cell = row.find('td', {'someattr': 'cellspecificvalue'})
    if cell:  # skip rows that lack the target cell
        myvalues.append(cell.get_text())
    row = row.find_next_sibling('tr', row_attrs)

This gets me the cell contents I need in the order they appear in my html file.

However, I'd still like to know where in the BeautifulSoup docs I could find whether find_all() preserves order or not. This is why I'm not accepting dstudeba's answer. (My upvote doesn't show; not enough rep yet :P)

Wil Koetsier asked Nov 11 '15 16:11


1 Answer

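It is my experience that find_all() does preserve order. However, to make sure, you can use the find_all_next() method (find_next() is its single-result counterpart), which walks the elements that come after a tag in the order they appear in the document. See the Beautiful Soup documentation for find_all_next() and find_next().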
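
To check this on your own data, here is a minimal sketch that compares the order returned by find_all() with the order produced by find_all_next(), and then looks up the row indexes. The table contents and the attribute names (someattr, cellspecificvalue) are made up for illustration, and a matching result on one sample shows observed behavior, not a documented guarantee.

```python
from bs4 import BeautifulSoup

# Hypothetical table; class and attribute names are invented for this example.
html = """
<table>
  <tr class="something" someattr="somevalue"><td someattr="cellspecificvalue">first</td></tr>
  <tr class="other"><td>skipped</td></tr>
  <tr class="something" someattr="somevalue"><td someattr="cellspecificvalue">second</td></tr>
  <tr class="something" someattr="somevalue"><td someattr="cellspecificvalue">third</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
row_attrs = {"class": "something", "someattr": "somevalue"}

# Order of the matching rows as returned by find_all().
found_order = [row.td.get_text() for row in soup.find_all("tr", row_attrs)]

# Order obtained by walking forward from the <table> tag with find_all_next().
table = soup.find("table")
walked_order = [row.td.get_text() for row in table.find_all_next("tr", row_attrs)]

# Positions of the matching rows among all <tr> elements of the document.
all_rows = soup.find_all("tr")
indexes = [i for i, row in enumerate(all_rows) if row.get("someattr") == "somevalue"]

print(found_order)                  # ['first', 'second', 'third']
print(found_order == walked_order)  # True
print(indexes)                      # [0, 2, 3]
```

If the two lists ever differed for your input, the find_all_next() walk would be the one to trust for document order.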

dstudeba answered Sep 28 '22 04:09