
JSDOM's querySelectorAll returned too many XML elements

Tags:

node.js

jsdom

Node.js version: 10.15.3 jsdom version: 15.1.0

const fs = require('fs');
const jsdom = require("jsdom");
const { JSDOM } = jsdom;

const xmlFile = fs.readFileSync("question.xml", "utf8");
const dom = new JSDOM(xmlFile);
const all = dom.window.document.querySelectorAll("S");
console.log(all);

question.xml:

<?xml version="1.0" encoding="utf-8"?>
    <Foo>
        <FooBar>
            <S a="string1" b="string2" c="string3"/>
        </FooBar>
    </Foo>
    <Foo>
        <FooBar>
            <S a="string1" b="string2"/>
            <S a="string1"/>
        </FooBar>
    </Foo>

querySelectorAll("S") returns 7 elements even though the file clearly contains only 3. Stranger still, if I rename the XML elements from S to F, it works correctly: querySelectorAll("F") finds only 3 elements. What causes this inconsistency?

miyagisan asked Apr 15 '26 11:04


1 Answer

By default, JSDOM interprets the markup you give it as HTML. So it parses your XML as HTML, and you get funky results. Remember that the HTML specification defines rules for making sense of broken HTML, so when JSDOM reads your XML, it applies those rules and tries to get a sensible document out of it. If I take your XML and your code but add

console.log(dom.window.document.documentElement.innerHTML);

just after the line that assigns dom, I get this serialized HTML:

<head></head><body><foo>
        <foobar>
            <s a="string1" b="string2" c="string3">
        </s></foobar><s a="string1" b="string2" c="string3">
    </s></foo><s a="string1" b="string2" c="string3">
    <foo>
        <foobar>
            <s a="string1" b="string2">
            <s a="string1">
        </s></s></foobar><s a="string1" b="string2"><s a="string1">
    </s></s></foo><s a="string1" b="string2"><s a="string1">
</s></s></s></body>

Look at what happens to s. (Reminder: HTML element names are case-insensitive so S and s are the same HTML element.)

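By the way, the reason you get different behavior with S than with F is that s is an actual HTML element (it marks struck-through text), whereas f is not. HTML ignores the self-closing slash on such elements, so each <S .../> is treated as an opening tag that is never closed. Because s is a formatting element, the parser's error recovery then reopens it after an enclosing element is closed, which is where the extra s elements come from. An unknown element like f gets none of that treatment, so JSDOM applies different rules to S than to F when it tries to make sense of your document as HTML.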

In order for JSDOM to interpret your document as XML, you can do this:

const dom = new JSDOM(xmlFile, { contentType: "text/xml" });

But note that your document is not well-formed XML because it has more than one root element. The XML specification does not provide any rules for making sense of documents that are not well-formed: a document that is not well-formed is essentially not XML. So JSDOM will just reject your document. You need to edit it so that it has only one root element. For instance, this would work:

<?xml version="1.0" encoding="utf-8"?>
<doc>
    <Foo>
        <FooBar>
            <S a="string1" b="string2" c="string3"/>
        </FooBar>
    </Foo>
    <Foo>
        <FooBar>
            <S a="string1" b="string2"/>
            <S a="string1"/>
        </FooBar>
    </Foo>
</doc>

I've just wrapped the two Foo elements in a doc element which forms the single root required by XML.

Louis answered Apr 18 '26 02:04


