Remove HTML block in Python

Question

I'd like to know if there's a library or some method in Python to remove an element from an HTML document. For example, I have this document:

<html>
      <head>
          ...
      </head>
      <body>
          <div>
           ...
          </div>
      </body>
</html>

I want to remove the <div></div> block, including its contents, from the document so that it looks like this:

<html>
    <head>
     ...
    </head>
    <body>
    </body>
</html>

Wso · Accepted Answer

You don't need a library for this; Python's built-in string methods are enough.

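def removeOneTag(text, tag):
    start, end = text.find("<"+tag+">"), text.find("</"+tag+">")
    # Cut the block only if an opening tag precedes a closing tag; otherwise return text unchanged.
    return text[:start] + text[end + len(tag)+3:] if 0 <= start < end else text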

This removes the first opening tag, the first closing tag, and everything between them. It only matches tags written without attributes, such as <div>. For the input in your example, it would look something like this:

x = """<html>
    <head>
      ...
    </head>
    <body>
       <div>
         ...
       </div>
    </body>
</html>"""
print(removeOneTag(x, "div"))
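
Removing the block this way leaves the indentation in front of <div> and the newline after </div> in place, so the result has a whitespace-only line where the block used to be. If the cleaner output shown in the question is wanted, one option is to drop such lines afterwards. This is only a minimal sketch: drop_blank_lines is an illustrative name, not part of the answer, and it assumes the document has no blank lines that need to be preserved.

```python
def drop_blank_lines(text):
    """Return text without lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.split("\n") if line.strip())


# The whitespace-only middle line stands in for what the removed block leaves behind.
leftover = "<body>\n       \n    </body>"
print(drop_blank_lines(leftover))
```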

Then, if you want to remove every block with that tag:

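tag = "div"
# Repeat while the first opening tag comes before the first closing tag, so the loop always terminates.
while 0 <= x.find("<"+tag+">") < x.find("</"+tag+">"):
    x = removeOneTag(x, tag)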
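
The string search matches only a bare <div> and pairs the first opening tag with the first closing tag, so it misses tags with attributes (such as <div class="x">) and mis-pairs nested blocks. As a rough sketch that still uses only the standard library, the helper below tracks nesting depth with a regular expression. The name remove_blocks is illustrative, not part of the answer. It is not a full HTML parser: comments, scripts, self-closing tags, or attribute values containing > can still confuse it.

```python
import re


def remove_blocks(text, tag):
    """Remove every <tag ...>...</tag> block, including nested ones."""
    # Matches opening tags (with or without attributes) and closing tags.
    pattern = re.compile(r"<(/?)" + re.escape(tag) + r"\b[^>]*>", re.IGNORECASE)
    kept = []
    depth = 0   # how many blocks we are currently inside
    pos = 0     # start of the text that has not been copied yet
    for match in pattern.finditer(text):
        if match.group(1) == "":          # opening tag
            if depth == 0:
                kept.append(text[pos:match.start()])
                pos = match.start()
            depth += 1
        elif depth > 0:                   # closing tag of an open block
            depth -= 1
            if depth == 0:
                pos = match.end()         # resume copying after the block
    kept.append(text[pos:])               # tail; an unclosed block is kept as-is
    return "".join(kept)


html = '<body><div class="a">x<div>y</div>z</div><p>keep</p></body>'
print(remove_blocks(html, "div"))  # prints <body><p>keep</p></body>
```

For anything beyond simple, well-formed input, a real HTML parser is the safer choice.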

Tags: python, html, parsing

JefersonM
