Extracting text inside tags from html document

Question

I have an HTML document (https://dropmefiles.com/wezmb) and I need to extract the text between <span id="1"> and </span>, but I don't know how. I tried writing this code:

from bs4 import BeautifulSoup

with open("10_01.htm") as fp:
    soup = BeautifulSoup(fp, features="html.parser")
    for a in soup.find_all('span'):
        print(a.string)

But it extracts the text of every 'span' tag. How can I extract only the text between <span id="1"> and </span> in Python?
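A minimal sketch of how the loop above might be narrowed to a single span, assuming the relevant markup looks like <span id="1"> with 10 on its own line (the linked 10_01.htm is not reproduced here, so an inline HTML string stands in for it). Keyword arguments to find_all() filter on attributes; a CSS attribute selector is shown as a second option.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for 10_01.htm, which is only linked in the question;
# this markup is an assumption based on the answer's description.
html = """
<html><body>
<span id="1">
10
</span>
<span id="2">
20
</span>
</body></html>
"""

soup = BeautifulSoup(html, features="html.parser")

# Keyword arguments to find_all() filter on attributes, so only
# <span> tags whose id is "1" are returned.
matches = soup.find_all("span", id="1")
for span in matches:
    print(repr(span.string))

# CSS selector alternative. A bare "#1" is not a valid CSS id selector
# (identifiers cannot start with a digit), so the attribute form is used.
via_css = soup.select('span[id="1"]')
print(len(via_css))
```

Since ids are meant to be unique, find() (which returns the first match or None) is usually enough instead of find_all().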

pu239 · Accepted Answer

What you need is the .contents attribute (see the BeautifulSoup documentation).

Find the span <span id="1"> ... </span> and read its children using:

for x in soup.find(id="1").contents:
    print(x)

OR

x = soup.find(id="1").contents[0]  # the span holds a single text node, so take its first child
print(x)
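To illustrate what .contents returns, here is a small self-contained sketch; the HTML string is an assumption standing in for the real file. .contents is the list of a tag's direct children (text nodes and tags), so contents[0] is only the whole text when the span contains nothing but text.

```python
from bs4 import BeautifulSoup

# Assumed markup: the real 10_01.htm is not reproduced in the question.
html = '<div><span id="1">\n10\n</span><span id="2">a <b>bold</b> word</span></div>'

soup = BeautifulSoup(html, features="html.parser")

span = soup.find(id="1")
# .contents is a list of the tag's direct children.
children = span.contents
print(children)

# With a single text child, contents[0] is that text node.
first = span.contents[0]
print(repr(first))

# A span with nested markup has several children (text, tag, text),
# so contents[0] would return only the first piece of text.
other = soup.find(id="2").contents
print(other)
```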

This will print an empty line, then 10, then another empty line. That is because the text in the HTML really looks like that: 10 sits on its own line inside the span, so the string is 10 surrounded by newline characters ('\n10\n').

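If you want just '10' without the surrounding newlines, you can do x = x[1:-1], since each newline is a single character. If the amount of surrounding whitespace can vary, x = x.strip() is more robust. Hope this helped.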
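A short comparison of the trimming options, assuming the same markup as described above (10 on its own line inside the span). The second, indented span is a hypothetical variant that shows where fixed-width slicing stops working.

```python
from bs4 import BeautifulSoup

# Assumed markup matching the answer's description: 10 on its own line.
soup = BeautifulSoup('<span id="1">\n10\n</span>', features="html.parser")
x = soup.find(id="1").contents[0]

# Slicing removes exactly one character from each end, so it only
# works when there is exactly one whitespace character on each side.
sliced = x[1:-1]

# str.strip() removes any amount of leading and trailing whitespace.
stripped = x.strip()

# get_text(strip=True) strips every text piece inside the tag and joins
# them (with no separator by default).
text = soup.find(id="1").get_text(strip=True)

print(repr(sliced), repr(stripped), repr(text))

# Hypothetical indented variant: slicing now leaves spaces behind.
indented = BeautifulSoup('<span id="1">\n    10\n</span>', features="html.parser").find(id="1").string
print(repr(indented[1:-1]), repr(indented.strip()))
```

If the span contains nested tags, get_text(" ", strip=True) keeps the pieces separated by a space instead of running them together.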

Tags: python, html, beautifulsoup, extract, tags

Terry
