
How to get all direct children of a BeautifulSoup Tag?

How can I retrieve all direct children of a tag (not recursively) using BeautifulSoup (bs4)?

<div class='body'><span>A</span><span><span>B</span></span><span>C</span></div>

I want to get blocks like this:

block1 : <span>A</span>
block2 : <span><span>B</span></span>
block3 : <span>C</span>

This is how I'm doing it now:

from bs4 import NavigableString, Tag

tags = []
for j in soup.find_all(True)[:1]:
    if isinstance(j, NavigableString):
        continue
    if isinstance(j, Tag):
        tags.append(j.name)
        # Get siblings
        for k in j.find_next_siblings():
            # k is a sibling of the first element
            pass

Is there a cleaner way to do that?

dbrrt asked Dec 31 '17 19:12




1 Answer

You can pass recursive=False to find_all() if you want to select only direct children rather than all descendants.
An example with the HTML you provided:

from bs4 import BeautifulSoup

html = "<div class='body'><span>A</span><span><span>B</span></span><span>C</span></div>"
soup = BeautifulSoup(html, "lxml") 
for j in soup.div.find_all(recursive=False):
    print(j)

<span>A</span>
<span><span>B</span></span>
<span>C</span>
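
For comparison, here is a minimal sketch of three ways to collect only the direct children, using the same HTML as above. It assumes the built-in "html.parser" instead of "lxml" (so nothing extra needs to be installed), and the variable names (div, direct_spans, direct_tags, selected, blocks) are purely illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = "<div class='body'><span>A</span><span><span>B</span></span><span>C</span></div>"
# Assumption: "html.parser" ships with Python, so no parser install is needed.
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="body")

# Option 1: find_all with recursive=False, here also filtered by tag name.
direct_spans = div.find_all("span", recursive=False)

# Option 2: iterate .children and keep only Tag objects.
# .children also yields text nodes (NavigableString), hence the filter.
direct_tags = [child for child in div.children if isinstance(child, Tag)]

# Option 3: CSS child combinator, which matches direct children only.
selected = soup.select("div.body > span")

blocks = [str(tag) for tag in direct_spans]
for number, block in enumerate(blocks, start=1):
    print(f"block{number} : {block}")
```

All three options should give the same three top-level span elements here. find_all(recursive=False) is probably the shortest when you already hold the parent tag, while the CSS selector may be handier when you start from the whole document.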

t.m.adam answered Sep 21 '22 11:09