Extracting text inside tags from html document

I have an HTML document like this: https://dropmefiles.com/wezmb. I need to extract the text between <span id="1"> and </span>, but I don't know how. I tried this code:

from bs4 import BeautifulSoup

with open("10_01.htm") as fp:
    soup = BeautifulSoup(fp, features="html.parser")
    for a in soup.find_all('span'):
        print(a.string)

But it extracts the text from all 'span' tags. So how can I extract only the text between <span id="1"> and </span> in Python?

Terry asked Jan 22 '26 17:01



1 Answer

What you need is the .contents attribute, which holds the list of a tag's children (see the BeautifulSoup documentation).

Find the span <span id="1"> ... </span> using

for x in soup.find(id="1").contents:
    print(x)

OR

x = soup.find(id="1").contents[0]  # since there will only be one element with the id 1
print(x)

This will give you:


10

that is, an empty line, then 10, then another empty line. The text in the HTML really does sit on its own line between the opening and closing tags, so the string includes the surrounding newlines.
The string is exactly '\n10\n'.

If you want just x = '10' from x = '\n10\n', you can do x = x[1:-1], since '\n' is a single character. x = x.strip() gives the same result here and also copes with other surrounding whitespace. Hope this helped.
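
Putting the steps together, here is a minimal sketch. The file from the question is not reproduced here, so the inline HTML string below is a hypothetical stand-in and the real markup may differ. It looks the span up by its id, reads the raw text node through .contents, and shows get_text(strip=True) as another way to drop the surrounding newlines.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for 10_01.htm; the real file's markup may differ.
html = """
<html><body>
<span id="0">
skip me
</span>
<span id="1">
10
</span>
</body></html>
"""

soup = BeautifulSoup(html, features="html.parser")

span = soup.find(id="1")          # first element whose id attribute is "1"
if span is None:
    raise SystemExit('no element with id="1" found')

raw = span.contents[0]            # the text node, newlines included
text = span.get_text(strip=True)  # same text with surrounding whitespace removed

print(repr(raw))
print(text)
```

soup.find returns None when nothing matches, which is why the sketch checks for that before touching .contents.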

pu239 answered Jan 25 '26 06:01



