 

BeautifulSoup extract top-level tags only [duplicate]

I'm doing some web-scraping with BeautifulSoup in Python 3.4.

I've run into a problem while learning: I'm trying to get the rows of a table from a webpage using find_all(), but the table contains nested tables with rows of their own, and find_all() returns those as well. How can I get only the top-level (first-level) elements of a tag in BeautifulSoup, whether for any tag name or a specific one?

# Retrieves every row ('tr') tag in the table, including rows of nested tables
my_table.find_all('tr')

By the way, this question duplicates an existing one, although that one uses PHP: Extract only first level paragraphs from html

coldnine asked Jun 19 '16 19:06



1 Answer

The find_all() method has an argument called recursive, which defaults to True.

Setting it to False makes the method return only the direct children of the tag it is called on, that is, the top-level elements.

my_table.find_all('tr', recursive=False)
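To make the difference concrete, here is a minimal sketch. The HTML snippet, the id values, and the variable names all_rows and top_rows are made up for illustration; only find_all() and its recursive argument come from the answer above. It assumes the built-in "html.parser" backend, which leaves the markup structure as written.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer table whose second row holds a nested table.
html = """
<table id="outer">
  <tr><td>row 1</td></tr>
  <tr><td>
    <table id="inner">
      <tr><td>nested row</td></tr>
    </table>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
my_table = soup.find("table", id="outer")

# Default (recursive=True): searches all descendants, so the nested row is included.
all_rows = my_table.find_all("tr")

# recursive=False: only direct children of my_table are considered.
top_rows = my_table.find_all("tr", recursive=False)

print(len(all_rows))  # 3
print(len(top_rows))  # 2
```

One caveat: recursive=False looks only at direct children. If the rows sit inside a tbody element (either because the page has one or because the chosen parser adds one, as html5lib does), the call on the table itself returns nothing, and it would need to be made on the tbody instead.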
coldnine answered Sep 17 '22 13:09
