Is there a library or some other way in Python to remove an element from an HTML document? For example, I have this document:
<html>
<head>
...
</head>
<body>
<div>
...
</div>
</body>
</html>
I want to remove the <div></div> block, along with its contents, so that the document looks like this:
<html>
<head>
...
</head>
<body>
</body>
</html>
For a simple case like this you don't need a library. Python's built-in string methods are enough, as long as the tag has no attributes and isn't nested.
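def removeOneTag(text, tag):
    start = text.find("<" + tag + ">")                # index of the first opening tag, or -1
    end = text.find("</" + tag + ">", max(start, 0))  # first closing tag after it, or -1
    return text if -1 in (start, end) else text[:start] + text[end + len(tag) + 3:]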
This removes the first opening tag, the first closing tag, and everything in between. With the document from your example:
x = """<html>
<head>
...
</head>
<body>
<div>
...
</div>
</body>
</html>"""
print(removeOneTag(x, "div"))
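To make the slicing easier to follow, here is a self-contained sketch of the same idea on a shortened document. The name `remove_first_block` is hypothetical and only used for illustration; the offset `len(tag) + 3` in the answer is simply the length of the closing tag `</tag>`. Note that the newlines around the removed block stay in place, so a blank line is left between `<body>` and `</body>`.

```python
def remove_first_block(text, tag):
    """Drop the first <tag>...</tag> span, tags included (hypothetical helper)."""
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"   # len(close_tag) == len(tag) + 3
    start = text.find(open_tag)
    # Only look for the closing tag after the opening one.
    end = text.find(close_tag, start + len(open_tag)) if start != -1 else -1
    if start == -1 or end == -1:
        return text                # nothing to remove, return the input unchanged
    return text[:start] + text[end + len(close_tag):]


doc = "<body>\n<div>\n...\n</div>\n</body>"
result = remove_first_block(doc, "div")
print(repr(result))  # '<body>\n\n</body>'
```

If the leftover blank line matters, it has to be stripped separately, for example by also consuming one newline after the closing tag.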
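Then, if you want to remove every occurrence of the tag, loop while both an opening and a closing tag remain (this assumes each opening tag comes before its closing tag):
tag = "div"
while "<" + tag + ">" in x and "</" + tag + ">" in x:
    x = removeOneTag(x, tag)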
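The string approach only matches a bare `<div>`: it misses tags with attributes such as `<div class="a">` and leaves a stray `</div>` behind when blocks are nested. If that matters, one possible alternative is the standard library's `html.parser` module. The following is a minimal sketch, not part of the answer above; `TagStripper` and `strip_tag` are hypothetical names, and it re-emits end tags in lowercase rather than preserving the original spelling.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Rebuild a document while skipping one tag and everything inside it."""

    def __init__(self, tag):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.tag = tag.lower()   # HTMLParser reports tag names in lowercase
        self.depth = 0           # > 0 while inside a block being removed
        self.out = []

    def _emit(self, text):
        if self.depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1      # entering a (possibly nested) block to drop
        else:
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.depth = max(self.depth - 1, 0)
        else:
            self._emit("</" + tag + ">")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if tag != self.tag:
            self._emit(self.get_starttag_text())

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit("&" + name + ";")

    def handle_charref(self, name):
        self._emit("&#" + name + ";")

    def handle_comment(self, data):
        self._emit("<!--" + data + "-->")

    def handle_decl(self, decl):
        self._emit("<!" + decl + ">")


def strip_tag(html, tag):
    """Return html with every <tag> block removed, nested blocks included."""
    parser = TagStripper(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = '<body><div class="a">x<div>y</div>z</div><p>keep &amp; this</p></body>'
print(strip_tag(sample, "div"))
```

For heavier HTML processing, a dedicated third-party parser is usually more convenient, but that is a design choice outside the scope of this answer.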