
Select "Text" node using querySelector

I'm writing a parser that should extract "Extract This Text" from the following HTML:

<div class="a">
    <h1>some random text</h1>
    <div class="clear"></div>
    Extract This Text
    <p></p>
    <h2></h2>
</div>

I've tried to use:

document.querySelector('div.a > :nth-child(3)');

And even the next-sibling combinator:

document.querySelector('div.a > :nth-child(2) + *');

But both skip it and return only the "p" element.

The only solution I see here is selecting the previous node and then using nextSibling to access it.
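A minimal sketch of that workaround, assuming the wanted text sits directly after the div.clear element. The helper name nextTextAfter is hypothetical, and stopping at the next element is a design choice made for this example, not a requirement of the DOM API:

```javascript
// Hypothetical helper: starting from `node`, walk the following siblings and
// return the trimmed content of the first non-blank text node.
// Gives up (returns null) as soon as another element is reached.
function nextTextAfter(node) {
  const ELEMENT_NODE = 1; // numeric value of Node.ELEMENT_NODE
  const TEXT_NODE = 3;    // numeric value of Node.TEXT_NODE
  for (let sib = node.nextSibling; sib; sib = sib.nextSibling) {
    if (sib.nodeType === TEXT_NODE && sib.textContent.trim() !== "") {
      return sib.textContent.trim();
    }
    if (sib.nodeType === ELEMENT_NODE) {
      break; // only text directly following `node` is of interest
    }
  }
  return null;
}

// Browser usage (skipped when no DOM is available):
if (typeof document !== "undefined") {
  const anchor = document.querySelector("div.a > div.clear");
  if (anchor) {
    console.log(nextTextAfter(anchor));
  }
}
```

querySelector is still used here, but only to find the element in front of the text; the text node itself is reached through nextSibling.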

Can querySelector select text nodes at all?
Text node: https://developer.mozilla.org/en-US/docs/Web/API/Text

icl7126 asked Feb 21 '19 14:02





2 Answers

As already answered, CSS provides no selectors for text nodes, and therefore neither does document.querySelector.

However, browsers do provide an XPath evaluator via the document.evaluate method, which offers many more selectors, axes and operators, including ones that match text nodes.

let result = document.evaluate(
  '//div[@class="a"]/div[@class="clear"]/following-sibling::text()[1]',
  document,
  null,
  XPathResult.STRING_TYPE
).stringValue;

console.log(result.trim());

<body>
  <div class="a">
    <h1>some random text</h1>
    <div class="clear"></div>
    Extract This Text
    <p></p>
    But Not This Text
    <h2></h2>
  </div>
</body>

A leading // selects matching nodes at any depth, that is, under any number of ancestor nodes.
/html/body/div[@class="a"] would address the node absolutely.

It should be mentioned that CSS queries generally perform much better than the very powerful XPath evaluation. Therefore, avoid excessive use of document.evaluate where document.querySelectorAll does the job, and reserve it for cases where you really need to query the DOM with complex expressions.

Quasimodo's clone answered Oct 01 '22 10:10



It can't, though my answer isn't that authoritative. (You may have figured it out already.)

You can check out "select text node with CSS" or "Is there a CSS selector for text nodes".

A more verbose explanation (maybe useless; English is not my first language, so sorry for any misused words or grammar):

I was learning about ParentNode, and since the querySelectorAll() method returns a NodeList, I was wondering whether it could select text nodes. I tried but failed, googled, and found this post.

The argument of querySelectorAll(selectors) or querySelector(selectors) is a DOMString containing one or more CSS selectors. These apply only to elements, not to plain text. Pseudo-elements don't help either: a selector containing one never matches, so querySelector returns null and querySelectorAll returns an empty NodeList.
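Because selectors only ever match elements, the text nodes have to be picked out of childNodes by their nodeType instead. A rough sketch, where the helper name directTexts is hypothetical and dropping whitespace-only nodes is an assumption about what the parser wants:

```javascript
// Hypothetical helper: return the trimmed contents of all non-blank text
// nodes that are direct children of `parent`.
function directTexts(parent) {
  const TEXT_NODE = 3; // numeric value of Node.TEXT_NODE
  return Array.from(parent.childNodes)
    .filter((node) => node.nodeType === TEXT_NODE)
    .map((node) => node.textContent.trim())
    .filter((text) => text !== "");
}

// Browser usage (skipped when no DOM is available):
if (typeof document !== "undefined") {
  const container = document.querySelector("div.a");
  if (container) {
    // For the HTML in the question this logs ["Extract This Text"].
    console.log(directTexts(container));
  }
}
```

The indentation between the tags also forms text nodes, which is why the blank ones are filtered out.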

kiz answered Oct 01 '22 10:10
