
How to get text content of the entire document?

I am building a Chrome extension which at some point needs to determine the language of the current page. My plan is to extract the text content of the page (or at least part of it) and pass it to a translation API. However, I couldn't find any straightforward way to get all the text nodes of the document.

There is a backup plan which is to recursively analyze $('body').contents() until there is enough text content, but it feels a bit flaky. Perhaps there is a better way?
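For reference, the recursive fallback described above might look roughly like this. It is only a sketch in plain DOM calls rather than jQuery; the helper name `collectText`, the 500-character default, and the decision to skip SCRIPT/STYLE/NOSCRIPT elements are assumptions made for illustration, not something from the question.

```javascript
// Hypothetical helper: walks the tree depth-first and gathers text-node
// content, stopping once at least `maxChars` characters were collected.
function collectText(root, maxChars = 500) {
  // Assumption: text inside these elements is not useful for language detection.
  const SKIP = new Set(['SCRIPT', 'STYLE', 'NOSCRIPT']);
  const parts = [];
  let length = 0;

  function walk(node) {
    if (length >= maxChars) return;          // enough text already
    if (node.nodeType === 3) {               // 3 === Node.TEXT_NODE
      const text = node.nodeValue.trim();
      if (text) {
        parts.push(text);
        length += text.length;
      }
      return;
    }
    // 1 === Node.ELEMENT_NODE; nodeName is upper case in HTML documents
    if (node.nodeType === 1 && SKIP.has(node.nodeName)) return;
    for (const child of node.childNodes || []) {
      walk(child);
    }
  }

  walk(root);
  return parts.join(' ').slice(0, maxChars);
}
```

In a content script this would presumably be called as `collectText(document.body)`. Because it only relies on `nodeType`, `nodeName`, `nodeValue` and `childNodes`, it can also be tried on plain objects outside the browser.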


Note: the Chrome extensions API lets your script access the DOM of the user's page as if the script were part of that page.

artemave asked Nov 20 '10 15:11



1 Answer

JavaScript:

document.body.textContent
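
Since the goal in the question is language detection, the raw string probably needs trimming before it is sent anywhere. The following is a small sketch under that assumption; `sampleText` and the 500-character limit are hypothetical names and values, not part of the answer. Note that `textContent` also includes the contents of any `<script>` and `<style>` elements inside `<body>`, so the sample may contain some code.

```javascript
// Hypothetical helper: collapses runs of whitespace into single spaces
// and keeps only the first `maxChars` characters.
function sampleText(rawText, maxChars = 500) {
  return rawText.replace(/\s+/g, ' ').trim().slice(0, maxChars);
}

// Usage in a content script; guarded so the snippet also loads outside a browser.
if (typeof document !== 'undefined' && document.body) {
  const sample = sampleText(document.body.textContent);
  console.log(sample);
}
```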
mortalis answered Oct 10 '22 06:10