
Normalization in DOM parsing with Java - how does it work?

Tags: java, dom, xml

I saw the line below in the code for a DOM parser in this tutorial.

doc.getDocumentElement().normalize(); 

Why do we do this normalization?
I read the docs, but I could not understand a word.

Puts all Text nodes in the full depth of the sub-tree underneath this Node

Okay, then can someone show me (preferably with a picture) what this tree looks like?

Can anyone explain to me why normalization is needed?
What happens if we don't normalize?

Apple Grinder asked Dec 09 '12 10:12



2 Answers

The rest of the sentence is:

where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.

This basically means that the following XML element

<foo>Hello world</foo>

could be represented like this in a denormalized node:

Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"

When normalized, the node will look like this:

Element foo
    Text node: "Hello world"

And the same goes for attributes: <foo bar="Hello world"/>, comments, etc.
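
The trees above can be reproduced with a few lines of Java. This is only a minimal sketch: the class name NormalizeDemo and the helper buildDenormalizedFoo are made up for illustration, and the four Text nodes are created by hand (rather than by a parser) so that the denormalized state is guaranteed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {

    // Hypothetical helper: builds <foo> with four adjacent Text children,
    // mirroring the denormalized tree shown in the answer.
    static Element buildDenormalizedFoo() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element foo = doc.createElement("foo");
        doc.appendChild(foo);
        foo.appendChild(doc.createTextNode(""));       // empty Text node
        foo.appendChild(doc.createTextNode("Hello "));
        foo.appendChild(doc.createTextNode("wor"));
        foo.appendChild(doc.createTextNode("ld"));
        return foo;
    }

    public static void main(String[] args) throws Exception {
        Element foo = buildDenormalizedFoo();

        // Before: four children, and the first one is the empty Text node.
        System.out.println("children before: " + foo.getChildNodes().getLength());
        System.out.println("first child before: \"" + foo.getFirstChild().getNodeValue() + "\"");

        // Merge adjacent Text nodes and drop empty ones.
        foo.normalize();

        // After: a single Text child holding the whole string.
        System.out.println("children after: " + foo.getChildNodes().getLength());
        System.out.println("first child after: \"" + foo.getFirstChild().getNodeValue() + "\"");
    }
}
```

This also hints at what can go wrong without normalization: code that reads getFirstChild().getNodeValue() and expects the element's whole text would see only the first fragment (here an empty string) until normalize() has been called.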

JB Nizet answered Sep 18 '22 18:09


Put simply, normalization is the reduction of redundancies. The list below describes XML normalization in the broad sense (closer to canonicalization); the DOM normalize() call from the question only merges adjacent Text nodes and removes empty ones.
Examples of redundancies:
a) whitespace outside the root/document tags (...<document></document>...)
b) whitespace within a start tag (<...>) or end tag (</...>)
c) whitespace between attributes and their values (i.e. spaces between the key name and =")
d) superfluous namespace declarations
e) line breaks/whitespace in the text of attributes and tags
f) comments, etc.
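
As a rough check of how far the DOM call from the question goes against this list, the sketch below parses a small document containing extra whitespace and a comment, then calls normalize() on the root. The class name NormalizeScopeDemo and the sample XML are invented for illustration, and the parser is assumed to be a default DocumentBuilder (not coalescing, not ignoring comments).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NormalizeScopeDemo {

    // Hypothetical sample: text, a comment, then more text, with extra whitespace.
    static final String XML = "<doc>  Hello <!-- note -->  world  </doc>";

    // Parses the XML with a default builder and normalizes the root element,
    // the same call as in the question.
    static Element parseAndNormalize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        doc.getDocumentElement().normalize();
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseAndNormalize(XML);
        // Print each child's node type and value after normalization.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeType() + " [" + n.getNodeValue() + "]");
        }
    }
}
```

After normalize(), the root still has three children: a Text node, the Comment, and a second Text node, with all whitespace intact. The comment counts as structure separating the two Text nodes, so they are not merged; stripping comments or whitespace would need a separate step.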

AVA answered Sep 18 '22 18:09