
Can NSXMLParser Be Used To Parse HTML? [duplicate]

Possible Duplicate:
Using an NSXMLParser to parse HTML

I have used NSXMLParser to parse XML files and RSS feeds. What I am unsure about is whether NSXMLParser is only for XML, or whether it can be used to parse HTML as well. From a little searching on the Internet, it looks like some people do use it for parsing HTML.

Are there any limitations or disadvantages to using NSXMLParser with HTML?

Jessica asked Feb 07 '26 05:02


1 Answer

If your HTML document is well-formed XHTML, then it will work. My guess is that you will not be working with well-formed XHTML, as it's rare in the real world.
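To illustrate the well-formed case, here is a minimal sketch in Swift, where NSXMLParser is exposed as XMLParser. The ElementCollector class and the XHTML snippet are made up for this example; they are not from the question.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // on Linux, XMLParser lives in this module
#endif

// Hypothetical delegate that records each element name as the parser reaches it.
final class ElementCollector: NSObject, XMLParserDelegate {
    var names: [String] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String]) {
        names.append(elementName)
    }
}

// Assumed input: well-formed XHTML, with matching lowercase tags and every element closed.
let xhtml = """
<html><head><title>Sample Document</title></head>
<body><h1>Sample Document</h1><p>This <em>parses</em>.</p></body></html>
"""

let collector = ElementCollector()
let parser = XMLParser(data: Data(xhtml.utf8))
parser.delegate = collector
let ok = parser.parse()  // true only if the whole document parsed without error
print(ok, collector.names)
```

Because the markup obeys XML rules, parse() should return true and the delegate should see every element in document order.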

HTML (including HTML 4 and 5) is not required to be well-formed XML, so a typical HTML page will not be parsed successfully by an XML parser.

Consider the following sample:

<HTML>
<HEAD>
<META http-equiv=content-type content="text/html; charset=UTF-8">
<TITLE>Sample Document</TITLE>
</HEAD>
<BODY>
<H1>Sample Document</h1>
<P>This document will <strong><em>fail</strong></em> as XML.
</BODY>
</HTML>

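In the above document, the attribute value content-type is not in quotes (<META http-equiv=content-type …), the <META> tag is never closed, <H1> and </h1> differ in case, <P> does not have an end tag, and strong and em are not nested correctly. Browsers accept all of this as HTML (they recover from the misnested tags), but none of it is well-formed XML.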
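A quick way to check this yourself is to feed both versions to the parser. The sketch below uses Swift's XMLParser (NSXMLParser in Objective-C); the helper firstXMLError(in:) and the cleaned-up XHTML variant are my own illustrations, not part of any API.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // on Linux, XMLParser lives in this module
#endif

// Hypothetical helper: returns nil if `text` parses as XML,
// otherwise a short description of where the parser gave up.
func firstXMLError(in text: String) -> String? {
    let parser = XMLParser(data: Data(text.utf8))
    if parser.parse() { return nil }
    let reason = parser.parserError?.localizedDescription ?? "unknown error"
    return "line \(parser.lineNumber): \(reason)"
}

// The sample from above, unchanged.
let tagSoup = """
<HTML>
<HEAD>
<META http-equiv=content-type content="text/html; charset=UTF-8">
<TITLE>Sample Document</TITLE>
</HEAD>
<BODY>
<H1>Sample Document</h1>
<P>This document will <strong><em>fail</strong></em> as XML.
</BODY>
</HTML>
"""

// An assumed XHTML rewrite: quoted attributes, closed tags, consistent case, proper nesting.
let cleaned = """
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>Sample Document</title>
</head>
<body>
<h1>Sample Document</h1>
<p>This document will <strong><em>parse</em></strong> as XML.</p>
</body>
</html>
"""

print(firstXMLError(in: tagSoup) ?? "parsed OK")
print(firstXMLError(in: cleaned) ?? "parsed OK")
```

The first call should report an error and the second should not. The exact message and line number depend on the underlying parser, so treat them as diagnostic hints only.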

Jeffery Thomas answered Feb 12 '26 15:02


