I'm working on a project that involves converting a large amount of HTML content to plain/text. I have a custom-written module that does the job OK, but I'm wondering if there's some standard tools to help get the job done.
This can be achieved with the help of html. escape() method(for Python 3.4+), we can convert the ASCII string into HTML script by replacing ASCII characters with special characters by using html. escape() method. By this method we can decode the HTML entities into text.
This method is useful if you're bulk converting a bunch of HTML files into Markdown – just iterate over a list of HTML files and save them to Markdown files. from markdownify import markdownify file = open("./hello-world. html", "r"). read() html = markdownify(file, heading_style="ATX") print(html) ## ## Hello, World!
Html2Text seems to be a good option
Here's a python library which does HTML parsing:
BeautifulSoup is another option.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With