How to ignore whitespace while reading a file to produce an XML DOM

Q: Does XML ignore white space?

In XML documents, there are two types of whitespace. Significant whitespace is part of the document content and should be preserved. Insignificant whitespace is added while editing XML documents to make them readable, and is typically not meant to be delivered as part of the document's content.

Q: How do you handle a space in XML?

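An XML parser does not discard whitespace in element content on its own: it has to pass every character that is not markup, whitespace included, on to the application. What the parser does normalize is line endings (CR LF and a lone CR become LF) and attribute values, where each tab and newline is replaced by a space. Some tools additionally trim the whitespace right after an opening tag and right before a closing tag and collapse runs of spaces into one, but that is behavior of the tool, not of XML itself.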

Q: What do you mean by whitespace in XML?

Whitespace between tags is normally there for readability and has no business meaning. Input XML messages can include line breaks, blank lines, and spaces between tags. If you process XML messages that contain any of these, they are not dropped automatically: they are represented as whitespace-only nodes in the resulting tree.

Tags: java, xml, whitespace

I'm trying to read a file to produce a DOM Document. The file contains whitespace and newlines between elements, and I want the parser to ignore them, but this has no effect:

DocumentBuilderFactory docfactory = DocumentBuilderFactory.newInstance();
docfactory.setIgnoringElementContentWhitespace(true);

I see in the Javadoc that the setIgnoringElementContentWhitespace method only takes effect when the validating flag is enabled, but I don't have a DTD or XML Schema for the document.

What can I do?

Update

I don't like the idea of adding <!ELEMENT ...> declarations myself, and I have tried the solution proposed in the forum Tomalak pointed to, but it doesn't work. I am using Java 1.6 in a Linux environment. If nothing else is proposed, I will write a few methods to ignore whitespace text nodes.

675

asked Oct 23 '08 10:10

Telcontar

1 Answer

‘IgnoringElementContentWhitespace’ is not about removing all pure-whitespace text nodes, only whitespace nodes whose parents are described in the schema as having ELEMENT content — that is to say, they only contain other elements and never text.

If you don't have a schema (DTD or XSD) in use, element content defaults to MIXED, so this parameter will never have any effect. (Unless the parser provides a non-standard DOM extension to treat all unknown elements as containing ELEMENT content, which as far as I know the ones available for Java do not.)
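To see this for yourself, here is a minimal sketch that mirrors the setup from the question: the flag is set, but the document carries no DTD or XSD. The class name NoSchemaDemo and the root/item element names are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NoSchemaDemo {
    // Parses with the flag set but without any DTD/XSD, as in the question.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringElementContentWhitespace(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";
        Document doc = parse(xml);
        // The whitespace text nodes survive: text, item, text, item, text.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // prints 5
    }
}
```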

You could hack the document on the way into the parser to include the schema information, for example by adding an internal subset to the <!DOCTYPE ... [...]> declaration containing <!ELEMENT ...> declarations, then use the IgnoringElementContentWhitespace parameter.
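A rough sketch of what the parser would see once such an internal subset is in place. It assumes the DOCTYPE has already been prepended to the input, and the element names (root, item), their content models, and the class name InternalSubsetDemo are invented for the example. Validation is switched on because the Javadoc ties the whitespace flag to validating mode, so the declarations have to describe the whole document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class InternalSubsetDemo {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // the whitespace flag relies on the content model
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Fail loudly if the hand-written declarations do not match the document.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE root [\n" +
            "  <!ELEMENT root (item*)>\n" +   // ELEMENT content: child elements only
            "  <!ELEMENT item (#PCDATA)>\n" + // text content, left untouched
            "]>\n" +
            "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";
        Document doc = parse(xml);
        // Only the two item elements remain under root.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

The obvious cost is that you have to declare every element that can occur, which is why this only pays off for small, fixed vocabularies.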

Or, possibly easier, you could just strip out the whitespace nodes, either in a post-process, or as they come in using an LSParserFilter.
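Both variants might look roughly like the sketch below. The class and method names (WhitespaceStripping, stripWhitespace, BlankTextFilter) are made up for the illustration, and the code treats every text node that is empty after trim() as ignorable, which is an assumption: it does not honor xml:space="preserve" and will also drop whitespace-only text inside mixed content.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class WhitespaceStripping {
    static boolean isBlankText(Node node) {
        return node.getNodeType() == Node.TEXT_NODE
                && node.getNodeValue().trim().isEmpty();
    }

    // Option 1: post-process an already built tree, removing blank text nodes.
    static void stripWhitespace(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // grab before a possible removal
            if (isBlankText(child)) {
                parent.removeChild(child);
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Option 2: reject blank text nodes while the tree is being built.
    static class BlankTextFilter implements LSParserFilter {
        public short startElement(Element element) {
            return NodeFilter.FILTER_ACCEPT;
        }
        public short acceptNode(Node node) {
            return isBlankText(node) ? NodeFilter.FILTER_REJECT : NodeFilter.FILTER_ACCEPT;
        }
        public int getWhatToShow() {
            return NodeFilter.SHOW_TEXT; // only text nodes are passed to acceptNode
        }
    }

    // Parses with the DOM Load and Save API; pass null to parse without a filter.
    static Document parse(String xml, LSParserFilter filter) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        if (filter != null) {
            parser.setFilter(filter);
        }
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <item>a</item>\n  <item> b </item>\n</root>";

        Document postProcessed = parse(xml, null);
        stripWhitespace(postProcessed);
        System.out.println(postProcessed.getDocumentElement().getChildNodes().getLength());

        Document filtered = parse(xml, new BlankTextFilter());
        System.out.println(filtered.getDocumentElement().getChildNodes().getLength());
    }
}
```

The post-process version also works on a Document that came from a DocumentBuilder, so it can be dropped into existing code without changing how the file is parsed. Text such as " b " is kept as it is, since only nodes consisting entirely of whitespace are removed.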

166

answered Oct 06 '22 00:10

bobince
