How to retrieve all direct children (not recursively) using BeautifulSoup (bs4)?
<div class='body'><span>A</span><span><span>B</span></span><span>C</span></div>
I want to get blocks like this:
block1 : <span>A</span>
block2 : <span><span>B</span></span>
block3 : <span>C</span>
I'm doing it this way:
tags = []
for j in soup.find_all(True)[:1]:
    if isinstance(j, NavigableString):
        continue
    if isinstance(j, Tag):
        tags.append(j.name)
        # Get siblings
        for k in j.find_next_siblings():
            # k is a sibling of the first element
            print(k)
Is there a cleaner way to do that?
To extract only the text that sits directly under an element in Beautiful Soup, use find_all(string=True, recursive=False); text=True is the older spelling of the same argument.
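A minimal sketch of the direct-text case. The markup and variable names here are made up for illustration and do not come from the question. It uses the string argument, which Beautiful Soup 4.4.0 and later accept in place of text, and the built-in html.parser so no extra parser has to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: text sits both directly in the div and inside a nested span.
html = "<div>outer <span>inner</span> tail</div>"
soup = BeautifulSoup(html, "html.parser")

# string=True matches text nodes; recursive=False limits the search to direct children,
# so the text inside the nested span is skipped.
direct_text = soup.div.find_all(string=True, recursive=False)
print(direct_text)
```

Dropping recursive=False would also return the text inside the nested span.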
The find() method returns the first tag that matches the given name or attributes (such as id), as a bs4 Tag object, or None if nothing matches.
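For illustration, here is a small sketch of find() on a page with several paragraph tags. The HTML and the id value "first" are hypothetical, chosen only to show the behaviour:

```python
from bs4 import BeautifulSoup

# Hypothetical page with several paragraph tags.
html = "<html><body><p id='first'>one</p><p>two</p><p>three</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

first_p = soup.find("p")        # first <p> in document order
by_id = soup.find(id="first")   # lookup by the id attribute
missing = soup.find("table")    # no match, so find() returns None

print(first_p.get_text(), by_id.get_text(), missing)
```

Because find() can return None, it is worth checking the result before calling methods on it.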
You can set the recursive argument to False if you want to select only direct children.
An example with the HTML you provided:
from bs4 import BeautifulSoup
html = "<div class='body'><span>A</span><span><span>B</span></span><span>C</span></div>"
soup = BeautifulSoup(html, "lxml")
for j in soup.div.find_all(recursive=False):
    print(j)
<span>A</span>
<span><span>B</span></span>
<span>C</span>
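As a possible alternative, the .children iterator also walks only one level down, but it yields text nodes as well as tags, so the isinstance check from the question is still needed. This is a sketch under that assumption, using html.parser instead of lxml to avoid the extra dependency:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = "<div class='body'><span>A</span><span><span>B</span></span><span>C</span></div>"
soup = BeautifulSoup(html, "html.parser")

# .children yields direct children only; keep the Tag objects and skip text nodes.
blocks = [child for child in soup.div.children if isinstance(child, Tag)]
for block in blocks:
    print(block)
```

find_all(recursive=False) does the same filtering in one call, so it is usually the shorter option when only tags are wanted.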