
PHP SAX parser for HTML?

I need an HTML SAX (not DOM!) parser for PHP that can process even invalid HTML. I need it to filter user-entered HTML (remove all tags and attributes except the allowed ones) and to truncate HTML content to a specified length.

Any ideas?

asked May 03 '26 16:05 by Daniel


1 Answer

SAX was designed to process well-formed XML and to fail on malformed markup. Processing invalid HTML requires keeping more state than SAX parsers typically keep.

I'm not aware of any SAX-like parser for HTML. Your best shot is to pass the HTML through tidy first and then use an XML parser, but this may defeat your purpose of using a SAX parser in the first place.
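A minimal sketch of that two-step approach, assuming the tidy and xml extensions are both available. The function names `cleanToXhtml()` and `filterHtml()` and the allowed-tag list are hypothetical, chosen only for illustration. The first step repairs the markup with `tidy_repair_string()`. The second feeds the result to PHP's event-based (SAX-style) XML parser, drops every attribute, and keeps only the allowed tags.

```php
<?php
// Sketch only: cleanToXhtml() and filterHtml() are hypothetical helper names.

// Step 1: repair tag soup into well-formed XHTML with the tidy extension.
function cleanToXhtml(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        throw new RuntimeException('The tidy extension is required for this sketch.');
    }
    return tidy_repair_string($html, [
        'output-xhtml'     => true,  // emit XML-compatible markup
        'show-body-only'   => true,  // return only the body fragment
        'numeric-entities' => true,  // avoid named entities such as &nbsp; that XML does not define
    ], 'utf8');
}

// Step 2: stream the XHTML through the event-based XML parser,
// keeping only allowed tags and dropping all attributes.
// Text inside disallowed tags is kept; only the tags themselves are removed.
function filterHtml(string $xhtml, array $allowedTags): string
{
    $out = '';
    $parser = xml_parser_create('UTF-8');
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$out, $allowedTags) {
            if (in_array($name, $allowedTags, true)) {
                $out .= "<$name>"; // attributes are intentionally discarded
            }
        },
        function ($p, $name) use (&$out, $allowedTags) {
            if (in_array($name, $allowedTags, true)) {
                $out .= "</$name>";
            }
        }
    );
    xml_set_character_data_handler($parser, function ($p, $text) use (&$out) {
        // The parser delivers decoded text, so escape it again on output.
        $out .= htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    });

    // A tidy body fragment may have several top-level elements, so wrap it
    // in a synthetic root; "root" is not an allowed tag and is dropped.
    if (!xml_parse($parser, "<root>$xhtml</root>", true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $out;
}
```

A call such as `filterHtml(cleanToXhtml($userHtml), ['p', 'b'])` would chain the two steps. Truncation could probably be handled in the same pass by counting characters in the character data handler and ignoring text once the limit is reached, although that is not shown here. Note that tidy itself builds a full document tree internally, so only the filtering step is event-based.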

answered May 05 '26 06:05 by Artefacto


