
Javascript DOM, get node text without losing spacing info

Tags: javascript, dom

I am using JavaScript and want to traverse the HTML tree, getting all the text as it appears to the user. However, I am losing spacing information.

Let's say I have two docs:

<html>XXX<p>YY   YY</p></html>

<html>XXX<p>YY&nbsp;&nbsp;&nbsp;YY</p></html>

The first one will appear with 1 space between the Ys. The second will have 3 spaces. However, if I traverse the tree and, for each #text node, use:

text = node.nodeValue;

then the text for both nodes will have 3 spaces. I no longer know which one has the "real" nbsp spaces. I can use node.innerHTML for the p elements, which will show the nbsp, but I don't think that I can use innerHTML to get just the XXX text (without some kind of text subtraction).

I could just get the innerHTML of the whole document and parse that. However, I also need the computed style of each element, which I plan to get using:

window.getComputedStyle(theElement).getPropertyValue("text-align");

So, I will be traversing each node. Also, innerHTML shows the source as is, while traversing the nodes "fixes" html errors, adding end tags, etc. That's a good thing and something I'd like to keep.

user984003 asked Mar 08 '12 14:03


1 Answer

What if you test by character code, using charCodeAt()? A regular space is 32, while &nbsp; (U+00A0) is 160.
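
As a minimal sketch of that idea (not a definitive implementation): the helper names countSpaces and showNbsp below are made up for illustration, and the input is assumed to be the string you get from node.nodeValue on a #text node.

```javascript
// Sketch only: countSpaces and showNbsp are hypothetical helper names.
const SPACE = 32;  // char code of a regular space (U+0020)
const NBSP = 160;  // char code of a non-breaking space (&nbsp;, U+00A0)

// Count regular and non-breaking spaces in a string such as node.nodeValue.
function countSpaces(text) {
  let regular = 0;
  let nonBreaking = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === SPACE) {
      regular++;
    } else if (code === NBSP) {
      nonBreaking++;
    }
  }
  return { regular, nonBreaking };
}

// Turn non-breaking spaces back into a visible "&nbsp;" marker,
// e.g. for logging or for re-serialising the text.
function showNbsp(text) {
  return text.replace(/\u00A0/g, "&nbsp;");
}
```

While traversing, you would call countSpaces(node.nodeValue) for each text node. A run of regular spaces collapses to one when rendered (under the default white-space setting), whereas each non-breaking space is kept, so the two documents in the question can be told apart.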

bfavaretto answered Oct 10 '22 05:10