Best way to convert HTML to plaintext using Python

Tags: python, html, plaintext

I'm working on a project that involves converting a large amount of HTML content to plain text. I have a custom-written module that does the job OK, but I'm wondering whether there are standard tools that could help get the job done.

479

asked Nov 03 '09 15:11

Brian Tol

2 Answers

Html2Text seems to be a good option

answered Oct 04 '22 10:10

Chris Ballance

Here's a Python library that does HTML parsing:

lxml.html

BeautifulSoup is another option.
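
As a rough sketch of how either library could be used for text extraction, assuming `lxml` and `beautifulsoup4` are installed: the helper names `text_with_lxml` and `text_with_bs4` are made up for this example, and dropping `<script>` and `<style>` elements is an assumption about what "plain text" should exclude.

```python
import lxml.html
from bs4 import BeautifulSoup


def text_with_lxml(html: str) -> str:
    """Extract visible text using lxml.html (hypothetical helper)."""
    doc = lxml.html.fromstring(html)
    # Remove script and style elements so their contents are not returned.
    for element in doc.xpath("//script | //style"):
        element.drop_tree()
    # text_content() concatenates all text nodes; collapse whitespace runs.
    return " ".join(doc.text_content().split())


def text_with_bs4(html: str) -> str:
    """Extract visible text using BeautifulSoup (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove script and style elements before collecting text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Strip each text fragment and join the fragments with single spaces.
    return soup.get_text(separator=" ", strip=True)


if __name__ == "__main__":
    sample = "<div><p>Hello <b>world</b></p><script>var x = 1;</script></div>"
    print(text_with_lxml(sample))
    print(text_with_bs4(sample))
```

Both helpers discard all structure (headings, lists, links), so they suit cases where only the raw text matters.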

answered Oct 04 '22 12:10

tcarobruce
