Is there a better approach to parsing invalid HTML than running Tidy on it? Side note: there are situations where Tidy isn't available. I also understand that regular expressions are not recommended for parsing HTML.

Best way to parse invalid HTML in PHP

2 Answers

I would try something like this: http://php.net/manual/en/domdocument.loadhtml.php

From that page:

The function parses the HTML contained in the string source. Unlike loading XML, HTML does not have to be well-formed to load. This function may also be called statically to load and create a DOMDocument object.
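A minimal sketch of that approach, assuming a made-up malformed snippet (`$html`) with unclosed `<p>` and `<b>` tags. It silences libxml's parse warnings with `libxml_use_internal_errors()`, loads the markup with `DOMDocument::loadHTML()`, and then queries the repaired tree with `DOMXPath`. The variable names are illustrative only.

```php
<?php
// Hypothetical malformed input: unclosed <p> and <b> tags, no <html>/<body> wrapper.
$html = '<div><p>First paragraph<p>Second <b>bold text</div>';

$doc = new DOMDocument();

// Collect parser complaints internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
$errors = libxml_get_errors();   // array of LibXMLError objects, may be empty
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Query the repaired tree with XPath.
$xpath = new DOMXPath($doc);

$paragraphs = [];
foreach ($xpath->query('//p') as $p) {
    $paragraphs[] = trim($p->textContent);
}

// The unclosed <b> is closed by the parser, so its text can be read normally.
$bold = $xpath->query('//b')->item(0)->textContent;

print_r($paragraphs);
echo $bold, PHP_EOL;
```

How the parser repairs a given fragment depends on the libxml2 version PHP was built against, so it is worth inspecting `$doc->saveHTML()` on real input before relying on a particular tree shape.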

102

answered Sep 18 '22 02:09

Rob

SimpleHTMLDOM is known to be more lenient than PHP's native DOM functions.

answered Sep 17 '22 02:09

Pekka

Tags: html, php, parsing

johnlemon
