I saw the line below in the code for a DOM parser in this tutorial.
doc.getDocumentElement().normalize();
Why do we do this normalization?
I read the docs but I could not understand a word.
Puts all Text nodes in the full depth of the sub-tree underneath this Node
Okay, then can someone show me (preferably with a picture) what this tree looks like?
Can anyone explain to me why normalization is needed?
What happens if we don't normalize?
DOM is part of the Java API for XML Processing (JAXP). The Java DOM parser traverses the XML file and creates the corresponding DOM objects. These DOM objects are linked together in a tree structure. The parser reads the whole XML structure into memory.
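To make the parsing step concrete, here is a minimal sketch using the standard JAXP classes. The class name DomParseSketch, the helper parse, and the sample XML are made up for illustration; they are not from the tutorial.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class, not taken from the tutorial.
public class DomParseSketch {

    // Parses an XML string into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input.
        Document doc = parse("<company><staff id=\"1\">Ann</staff></company>");
        Element root = doc.getDocumentElement();
        root.normalize(); // the call the question asks about

        System.out.println(root.getTagName()); // company
        Element staff = (Element) root.getFirstChild();
        System.out.println(staff.getAttribute("id") + " " + staff.getTextContent()); // 1 Ann
    }
}
```

Once parse returns, the whole document is already in memory as a tree of Node objects, so everything after that is plain tree navigation.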
The normal form is useful for operations that require a particular document tree structure and ensures that the XML DOM view of a document is identical when saved and reloaded.
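One way to see the "identical when saved and reloaded" point is a round trip. This is only a sketch: the class RoundTripSketch and its helpers are hypothetical names, and it assumes the default JAXP parser and identity transformer that ship with the JDK. A tree built by hand with two adjacent Text nodes comes back from serialization with a single Text node, which is the same shape normalize() produces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration only.
public class RoundTripSketch {

    // Builds <foo> with two adjacent Text children, "Hello " and "world".
    static Document buildSplitDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            foo.appendChild(doc.createTextNode("Hello "));
            foo.appendChild(doc.createTextNode("world"));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    // Serializes the document to a string and parses that string again.
    static Document saveAndReload(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(out.toString())));
        } catch (Exception e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    // Number of direct children of the root element.
    static int rootChildCount(Document doc) {
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) {
        Document doc = buildSplitDocument();
        System.out.println(rootChildCount(doc));                // 2
        System.out.println(rootChildCount(saveAndReload(doc))); // 1
        doc.normalize();
        System.out.println(rootChildCount(doc));                // 1
    }
}
```

The serialized text cannot record where one Text node ended and the next began, so only the normalized tree survives a save and reload unchanged.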
Parsing means analyzing and converting a program into an internal format that a runtime environment can actually run, for example the JavaScript engine inside browsers. The browser parses HTML into a DOM tree.
The rest of the sentence is:
where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.
This basically means that the following XML element
<foo>Hello world</foo>
could be represented like this in a denormalized node:
Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"
When normalized, the node will look like this:
Element foo
    Text node: "Hello world"
The same goes for attributes (<foo bar="Hello world"/>), comments, etc.
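If you want to watch this happen, the sketch below builds the denormalized foo element by hand and then normalizes it. The class NormalizeSketch and its helper are hypothetical names chosen for this example; the DOM calls themselves (createTextNode, appendChild, normalize) are the standard org.w3c.dom API.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class for illustration only.
public class NormalizeSketch {

    // Builds <foo> with four Text children: "", "Hello ", "wor", "ld".
    static Element buildDenormalizedFoo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            for (String part : new String[] {"", "Hello ", "wor", "ld"}) {
                foo.appendChild(doc.createTextNode(part));
            }
            return foo;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    public static void main(String[] args) {
        Element foo = buildDenormalizedFoo();
        System.out.println(foo.getChildNodes().getLength()); // 4

        // Merges adjacent Text nodes and removes empty ones.
        foo.normalize();

        System.out.println(foo.getChildNodes().getLength());  // 1
        System.out.println(foo.getFirstChild().getNodeValue()); // Hello world
    }
}
```

Without normalize(), code that reads only getFirstChild().getNodeValue() would see the empty string here instead of the full text, which is the usual practical reason for the call.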
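Put simply, normalisation is the reduction of redundancies.
Examples of redundancies in an XML document (note that Node.normalize() itself only merges adjacent Text nodes and removes empty ones; the items below describe a broader clean-up of the XML text):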
a) whitespace outside the root/document tags (...<document></document>...)
b) whitespace within a start tag (<...>) and an end tag (</...>)
c) whitespace between attributes and their values (i.e., spaces between the key name and =")
d) superfluous namespace declarations
e) line breaks/whitespace in the text of attributes and tags
f) comments, etc.