It seems that all major browsers implement the DOMParser API, so XML can be parsed into a DOM and then queried using XPath, getElementsByTagName, etc.
However, detecting parsing errors seems to be trickier. DOMParser.prototype.parseFromString always returns a valid DOM. When a parsing error occurs, the returned DOM contains a <parsererror> element, but it's slightly different in each major browser.
Sample JavaScript:

```javascript
// Note the deliberate typo: the prefix "otherr" is never declared, so parsing fails.
const xmlText = '<root xmlns="http://default" xmlns:other="http://other"><child><otherr:grandchild/></child></root>';
const parser = new DOMParser();
const dom = parser.parseFromString(xmlText, 'application/xml');
console.log(new XMLSerializer().serializeToString(dom));
```
Result in Opera: the DOM's root is a <parsererror> element.

```xml
<?xml version="1.0"?><parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">Error<sourcetext>Unknown source</sourcetext></parsererror>
```
Result in Firefox: the DOM's root is a <parsererror> element.

```xml
<?xml-stylesheet href="chrome://global/locale/intl.css" type="text/css"?>
<parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">XML Parsing Error: prefix not bound to a namespace
Location: http://fiddle.jshell.net/_display/
Line Number 1, Column 64:<sourcetext><root xmlns="http://default" xmlns:other="http://other"><child><otherr:grandchild/></child></root>
---------------------------------------------------------------^</sourcetext></parsererror>
```
Result in Safari: the <root> element parses correctly but contains a nested <parsererror> element, in a different namespace from the <parsererror> element that Opera and Firefox produce.

```xml
<root xmlns="http://default" xmlns:other="http://other"><parsererror xmlns="http://www.w3.org/1999/xhtml" style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black"><h3>This page contains the following errors:</h3><div style="font-family:monospace;font-size:12px">error on line 1 at column 50: Namespace prefix otherr on grandchild is not defined </div><h3>Below is a rendering of the page up to the first error.</h3></parsererror><child><otherr:grandchild/></child></root>
```
Am I missing a simple, cross-browser way of detecting whether a parsing error occurred anywhere in the XML document? Or must I query the DOM for each of the possible <parsererror> elements that different browsers might generate?
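One namespace-agnostic alternative, sketched here under the assumption that any element whose local name is parsererror signals a failure: pass the "*" wildcard as the namespace argument of getElementsByTagNameNS. That matches all three variants shown above (the Mozilla namespace used by Opera and Firefox, and the XHTML namespace used by Safari). hasParserError is a hypothetical helper name, and the check inherits the usual caveat: a well-formed document that legitimately contains its own parsererror element will also be flagged.

```javascript
// Hypothetical helper: true if the document contains an element whose local
// name is "parsererror", in any namespace ("*" is the standard wildcard for
// the namespace argument of getElementsByTagNameNS).
// Heuristic only: a valid document with its own <parsererror> element is a false positive.
function hasParserError(doc) {
  return doc.getElementsByTagNameNS('*', 'parsererror').length > 0;
}

// Typical use in a browser:
//   const dom = new DOMParser().parseFromString(xmlText, 'application/xml');
//   if (hasParserError(dom)) { /* report the parse failure */ }
```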
If the XML parser detects an error in the XML document during parsing, message RNX0351 will be issued. From the message, you can get the specific error code associated with the error, as well as the offset in the document where the error was discovered.
The most common cause is encoding errors. There are several basic approaches to solving this: escaping problematic characters (< becomes &lt;, & becomes &amp;, etc.), escaping entire blocks of text with CDATA sections, or putting an encoding declaration at the start of the feed.
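The first two approaches can be sketched as small string helpers. escapeXml and wrapCdata are hypothetical names, not part of any browser API. Two details matter: the ampersand has to be replaced before the other characters so the new entities are not escaped twice, and a CDATA section cannot contain the sequence ]]>, so that sequence is split across two adjacent sections.

```javascript
// Hypothetical helper: escape the five XML special characters.
// '&' is replaced first so the '&' inside newly inserted entities is left alone.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: wrap text in a CDATA section. Any "]]>" inside the text
// would end the section early, so it is split: "]]" closes the first section
// and ">" continues in a new one.
function wrapCdata(text) {
  return '<![CDATA[' + text.split(']]>').join(']]]]><![CDATA[>') + ']]>';
}
```

For example, escapeXml('a < b & c') returns 'a &lt; b &amp; c'.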
All major browsers have a built-in XML parser to access and manipulate XML.
This is the best solution I've come up with. I attempt to parse a string that is intentionally invalid XML and observe the namespace of the resulting <parsererror> element. Then, when parsing actual XML, I can use getElementsByTagNameNS to detect the same kind of <parsererror> element and throw a JavaScript Error.
```javascript
// My function that parses a string into an XML DOM, throwing an Error if XML parsing fails
function parseXml(xmlString) {
  var parser = new DOMParser();
  // attempt to parse the passed-in xml
  var dom = parser.parseFromString(xmlString, 'application/xml');
  if (isParseError(dom)) {
    throw new Error('Error parsing XML');
  }
  return dom;
}

function isParseError(parsedDocument) {
  // parser and parsererrorNS could be cached on startup for efficiency
  var parser = new DOMParser(),
      erroneousParse = parser.parseFromString('<', 'application/xml'),
      parsererrorNS = erroneousParse.getElementsByTagName('parsererror')[0].namespaceURI;

  if (parsererrorNS === 'http://www.w3.org/1999/xhtml') {
    // In PhantomJS the parsererror element doesn't seem to have a special namespace, so we are just guessing here :(
    return parsedDocument.getElementsByTagName('parsererror').length > 0;
  }

  return parsedDocument.getElementsByTagNameNS(parsererrorNS, 'parsererror').length > 0;
}
```
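The comment in isParseError() notes that the parser and the error namespace could be cached. A minimal sketch of that idea, assuming one probe per page is enough: createCachedXmlParser is a hypothetical name, and its ParserCtor parameter (defaulting to the browser's DOMParser) is an assumption added only so the parser can be swapped out, for example in tests.

```javascript
// Hypothetical factory: probe the error namespace once, then reuse it for every parse.
// ParserCtor defaults to the browser's DOMParser.
function createCachedXmlParser(ParserCtor = globalThis.DOMParser) {
  const parser = new ParserCtor();
  // Parse deliberately malformed input once to learn where this browser puts <parsererror>.
  const probe = parser.parseFromString('<', 'application/xml');
  const errorNS = probe.getElementsByTagName('parsererror')[0].namespaceURI;

  return function parseXml(xmlString) {
    const dom = parser.parseFromString(xmlString, 'application/xml');
    // Same XHTML-namespace fallback as isParseError(): match by tag name only.
    const hasError = errorNS === 'http://www.w3.org/1999/xhtml'
      ? dom.getElementsByTagName('parsererror').length > 0
      : dom.getElementsByTagNameNS(errorNS, 'parsererror').length > 0;
    if (hasError) {
      throw new Error('Error parsing XML');
    }
    return dom;
  };
}
```

In a browser this would be called once at startup, e.g. const parseXml = createCachedXmlParser();, and the returned function used everywhere afterwards.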
Note that this solution doesn't include the special-casing needed for Internet Explorer. However, things are much more straightforward in IE: XML is parsed with a loadXML method, which returns true if parsing succeeded and false if it failed. See http://www.w3schools.com/xml/xml_parser.asp for an example.
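A minimal sketch of the IE path, assuming the MSXML document created by new ActiveXObject('Microsoft.XMLDOM'), which exists only in Internet Explorer. It is written in ES5 because old IE does not support arrow functions or default parameters. parseXmlIE and its optional createDocument argument are hypothetical, added so the function can be exercised outside IE.

```javascript
// Hypothetical helper for legacy IE: loadXML() returns true/false,
// and parseError.reason describes the failure.
function parseXmlIE(xmlString, createDocument) {
  // Default to an MSXML document; ActiveXObject is IE-only.
  var doc = createDocument ? createDocument() : new ActiveXObject('Microsoft.XMLDOM');
  doc.async = false; // parse synchronously so the result is available immediately
  if (!doc.loadXML(xmlString)) {
    throw new Error('Error parsing XML: ' + doc.parseError.reason);
  }
  return doc;
}
```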
When I came here the first time, I upvoted the original answer (by cspotcode); however, it does not work in Firefox. The resulting namespace is always null because of the structure of the produced document: its first child node is not the <parsererror> element (in the Firefox output above it is an <?xml-stylesheet?> processing instruction). I did a little research (check the code here). The idea is to use invalidXml.getElementsByTagName("parsererror")[0].namespaceURI instead of invalidXml.childNodes[0].namespaceURI, and then select the "parsererror" element by namespace as in the original answer. However, if you have a valid XML document that contains a <parsererror> tag in the same namespace the browser uses, you end up with a false alarm. So here's a heuristic to check whether your XML parsed successfully:
```javascript
function tryParseXML(xmlString) {
  var parser = new DOMParser();
  // Learn which namespace this browser uses for <parsererror> by parsing deliberately invalid input
  var parsererrorNS = parser.parseFromString('INVALID', 'application/xml')
    .getElementsByTagName('parsererror')[0].namespaceURI;
  var dom = parser.parseFromString(xmlString, 'application/xml');
  if (dom.getElementsByTagNameNS(parsererrorNS, 'parsererror').length > 0) {
    throw new Error('Error parsing XML');
  }
  return dom;
}
```
Why not implement exceptions in DOMParser?
An interesting thing worth mentioning in this context: if you fetch an XML file with XMLHttpRequest, the parsed DOM is stored in the responseXML property, or responseXML is null if the file content was invalid. Not an exception, not a parsererror element or any other specific indicator. Just null.
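With XMLHttpRequest, then, the check reduces to a null test. A minimal sketch: loadXml, its Node-style callback, and the optional XhrCtor argument (defaulting to the browser's XMLHttpRequest) are hypothetical. Note that responseXML is also null when the response is not served as XML at all, so null does not prove a syntax error.

```javascript
// Hypothetical helper: GET an XML document and report a null responseXML as an error.
function loadXml(url, callback, XhrCtor) {
  const Xhr = XhrCtor || globalThis.XMLHttpRequest;
  const xhr = new Xhr();
  xhr.open('GET', url);
  xhr.onload = function () {
    if (xhr.responseXML === null) {
      // Malformed XML (or a non-XML response): no exception, no <parsererror>, just null.
      callback(new Error('Response is not a well-formed XML document'), null);
    } else {
      callback(null, xhr.responseXML);
    }
  };
  xhr.onerror = function () {
    callback(new Error('Network error'), null);
  };
  xhr.send();
}
```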