Remove non-breaking spaces using XPath

Tags: xml, xpath

I have the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<data>
<child1>&#160;Well, some  spaces and nbsps  &#160;</child1>
<child2>&#160; some more                  &#160;  or whatever          </child2>
<child3>         a nice text</child3>
<child4>how                              to get rid of all the nasty spaces&#160;          ?                                  </child4>
</data>
</root>

I have to remove all non-breaking spaces, concatenate the text, and normalize it.

My XPath query (concatenation and normalization work fine; I inserted the replacement with 'x' only for testing):

normalize-space(replace(string-join(//data/*,' '),'&#160;','x'))

My problem: the "&#160;" whitespace is never matched, so nothing gets replaced.

Looking forward to your answers,


asked Nov 05 '12 17:11

user1800825

1 Answer

The string value of an element node is defined to be the concatenation of all its descendant text nodes, so in an XSLT transformation

normalize-space(translate(//data, '&#160;', ''))

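would do what you require, assuming your document contains only one data element. If there is more than one, then under XPath 1.0 rules (as in XSLT 1.0) this expression extracts and normalizes only the text of the first data element in document order; under XPath 2.0 without backwards-compatibility mode, passing several nodes to translate is a type error instead.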
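The XSLT case can be tried from Java with the built-in javax.xml.transform API. This is only a minimal sketch: the class name XsltNbspDemo, the tiny stylesheet, and the sample input are assumptions made for illustration, not part of the original question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; the stylesheet below is an illustrative assumption.
final class XsltNbspDemo {
    // The XML parser resolves &#160; to a real non-breaking space while reading
    // the stylesheet, so translate() receives the character, not the reference.
    private static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:value-of select=\"normalize-space(translate(//data, '&#160;', ''))\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the text output.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<root><data><a>&#160;foo </a><b>&#160; bar&#160;</b></data></root>";
        System.out.println(transform(xml)); // prints "foo bar"
    }
}
```

The text output method is used so that the result is the bare string, without an XML declaration.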

If you are using the XPath expression somewhere other than in an XSLT file then you will need to represent the non-break space character differently. The above example works because the XML parser converts the &#160; character reference into a non-break space character when reading the .xsl file, so the XPath expression parser sees the character, not the reference. In Java, for example, I could say

XPathFactory.newInstance().newXPath().evaluate("normalize-space(translate(//data, '\u00A0', ''))", contextNode)

because \u00A0 is the way to represent the nbsp character in a Java string literal. If you are using another language you need to find the right way to represent this character in that language, or if you're using XPath 2.0 you could use the codepoints-to-string function:

normalize-space(translate(//data, codepoints-to-string(160), ''))
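For the Java route, here is a hedged sketch using the JDK's built-in XPath 1.0 engine (javax.xml.xpath). It also covers the case of several data elements by evaluating the expression once per element, with "." as the context node. The class name NbspCleaner, the method cleanEachData, and the sample input are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not taken from the original answer.
final class NbspCleaner {
    // The Java compiler turns \u00A0 into the non-breaking space character,
    // so the XPath engine sees the character itself inside the string literal.
    private static final String CLEAN_CONTEXT_NODE =
        "normalize-space(translate(., '\u00A0', ''))";

    // Returns one cleaned string per data element, in document order.
    static List<String> cleanEachData(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList dataNodes = (NodeList) xpath.evaluate(
                "//data",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> results = new ArrayList<>();
            for (int i = 0; i < dataNodes.getLength(); i++) {
                // Evaluate relative to each data element instead of //data,
                // which under XPath 1.0 would only ever use the first one.
                results.add(xpath.evaluate(CLEAN_CONTEXT_NODE, dataNodes.item(i)));
            }
            return results;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>"
            + "<data><a>&#160;foo </a><b>&#160; bar</b></data>"
            + "<data><c> baz&#160;</c></data>"
            + "</root>";
        System.out.println(cleanEachData(xml)); // prints [foo bar, baz]
    }
}
```

With a single data element, as in the question, the returned list simply has one entry.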


answered Sep 26 '22 21:09

Ian Roberts
