Example HTML
<h2 id="name">
ABC
<span class="numbers">123</span>
<span class="lower">abc</span>
</h2>
I can get the numbers with something like:
soup.select('#name > span.numbers')[0].text
How do I get the text ABC using BeautifulSoup and the select function?
What about in this case?
<div id="name">
<div id="numbers">123</div>
ABC
</div>
In the first case, get the previous sibling:
soup.select_one('#name > span.numbers').previous_sibling
In the second case, get the next sibling:
soup.select_one('#name > #numbers').next_sibling
Note that I assume it is intentional that here numbers is an id value and the tag is div instead of span. Hence, I've adjusted the CSS selector.
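The two sibling lookups above can be tried end to end with a minimal sketch like the one below. It assumes the built-in html.parser backend and wraps each lookup in a small helper (text_before_numbers and text_after_numbers are hypothetical names, not part of BeautifulSoup). The sibling text node carries the surrounding newlines, so the helpers strip it.

```python
from bs4 import BeautifulSoup

# The two snippets from the question.
first_html = """
<h2 id="name">
ABC
<span class="numbers">123</span>
<span class="lower">abc</span>
</h2>
"""

second_html = """
<div id="name">
<div id="numbers">123</div>
ABC
</div>
"""


def text_before_numbers(html):
    # Hypothetical helper: text node right before the .numbers span.
    soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
    node = soup.select_one("#name > span.numbers").previous_sibling
    return node.strip()


def text_after_numbers(html):
    # Hypothetical helper: text node right after the #numbers div.
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one("#name > #numbers").next_sibling
    return node.strip()


print(text_before_numbers(first_html))
print(text_after_numbers(second_html))
```

Both helpers depend on the text sitting immediately next to the matched element; if another tag or a comment sits in between, the sibling would be something else.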
To cover both cases, you can go to the parent of the tag and find the non-empty text node in a non-recursive mode:
parent = soup.select_one('#name > .numbers,#numbers').parent
print(parent.find(string=lambda text: text and text.strip(), recursive=False).strip())
Note the change in the selector: we are asking to match either the numbers id or the numbers class.
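As a rough sketch of the parent-based approach, the function below runs the same lookup against both snippets from the question. The name own_text is hypothetical, html.parser is an assumed parser choice, and the None fallback for a parent with no direct text is an addition for illustration.

```python
from bs4 import BeautifulSoup

first_html = """
<h2 id="name">
ABC
<span class="numbers">123</span>
<span class="lower">abc</span>
</h2>
"""

second_html = """
<div id="name">
<div id="numbers">123</div>
ABC
</div>
"""


def own_text(html):
    # Hypothetical helper: first non-empty text node that is a direct
    # child of the element containing the "numbers" tag.
    soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
    parent = soup.select_one("#name > .numbers, #numbers").parent
    node = parent.find(
        string=lambda s: bool(s and s.strip()),  # skip whitespace-only nodes
        recursive=False,                         # direct children only
    )
    return node.strip() if node is not None else None


print(own_text(first_html))
print(own_text(second_html))
```

Because recursive=False limits the search to direct children, the text inside the nested span or div (123, abc) is never considered.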
That said, I suspect this universal solution is not very reliable because, for starters, I don't know what your real inputs look like.