Normalization in DOM parsing with Java - how does it work?

Tags: java, dom, xml

I saw the line below in the code for a DOM parser in this tutorial.

doc.getDocumentElement().normalize();

Why do we do this normalization?
I read the docs, but I could not understand a word.

Puts all Text nodes in the full depth of the sub-tree underneath this Node

Okay, then can someone show me (preferably with a picture) what this tree looks like?

Can anyone explain to me why normalization is needed?
What happens if we don't normalize?

919

asked Dec 09 '12 10:12

Apple Grinder

2 Answers

The rest of the sentence is:

where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.

This basically means that the following XML element

<foo>Hello world</foo>

could be represented like this in a denormalized node:

Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"

When normalized, the node will look like this:

Element foo
    Text node: "Hello world"

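The same goes for the Text nodes inside attribute values, such as <foo bar="Hello world"/>. Comments are not merged: as the quoted sentence says, they count as structure that separates Text nodes.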
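To make this concrete, here is a minimal sketch that builds the fragmented foo element by hand and then normalizes it. The class name NormalizeDemo and the helper buildFragmentedFoo are made up for illustration; the fragments are created explicitly because a given parser may or may not split text this way. It assumes the DOM implementation bundled with the JDK.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {

    // Hypothetical helper: builds <foo> with four adjacent Text nodes,
    // mirroring the denormalized tree described in the answer.
    static Element buildFragmentedFoo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            foo.appendChild(doc.createTextNode(""));       // empty Text node
            foo.appendChild(doc.createTextNode("Hello "));
            foo.appendChild(doc.createTextNode("wor"));
            foo.appendChild(doc.createTextNode("ld"));
            return foo;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element foo = buildFragmentedFoo();

        // Before normalize(): four separate Text children.
        System.out.println(foo.getChildNodes().getLength()); // 4

        // Merge adjacent Text nodes and drop empty ones.
        foo.normalize();

        // After normalize(): a single Text child holding the whole string.
        System.out.println(foo.getChildNodes().getLength());    // 1
        System.out.println(foo.getFirstChild().getNodeValue()); // Hello world
    }
}
```

Without the normalize() call, code that reads only foo.getFirstChild().getNodeValue() would see the empty string here rather than the full text, which is the kind of surprise normalization is meant to avoid.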

188

answered Sep 18 '22 18:09

JB Nizet

Put simply, normalisation is the reduction of redundancies.
Examples of redundancies in an XML document:
a) whitespace outside the root/document tags (...<document></document>...)
b) whitespace within a start tag (<...>) and an end tag (</...>)
c) whitespace between attributes and their values (i.e. spaces between the key name and =")
d) superfluous namespace declarations
e) line breaks/whitespace in the text of attributes and tags
f) comments, etc.

answered Sep 18 '22 18:09

AVA
