Remove non-breakable whitespaces using xpath

Tags: xml, xpath

I have the following xml document:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<data>
<child1>&#160;Well, some  spaces and nbsps  &#160;</child1>
<child2>&#160; some more                  &#160;  or whatever          </child2>
<child3>         a nice text</child3>
<child4>how                              to get rid of all the nasty spaces&#160;          ?                                  </child4>
</data>
</root>

I have to remove all non-breaking spaces, concatenate the text, and normalize it.

My XPath query is below. The concatenation and normalization work fine; the replacement with 'x' is there only for testing:

normalize-space(replace(string-join(//data/*,' '),'&#160;','x'))

My problem: the expression never matches the "&#160;" whitespace, so I can't replace it.

Looking forward to your answers,

user1800825 asked Nov 05 '12 17:11


1 Answer

The string value of an element node is defined to be the concatenation of all its descendant text nodes, so in an XSLT transformation

normalize-space(translate(//data, '&#160;', ''))

would do what you require, assuming your document contains only one data element. If there is more than one data element, then in XPath 1.0 this expression will only extract and normalize the text of the first data element in the document.

If you are using the XPath expression somewhere other than in an XSLT file then you will need to represent the non-break space character differently. The above example works because the XML parser converts the &#160; character reference into a non-break space character when reading the .xsl file, so the XPath expression parser sees the character, not the reference. In Java, for example, I could say

XPath.evaluate("normalize-space(translate(//data, '\u00A0', ''))", contextNode)

because \u00A0 is the way to represent the nbsp character in a Java string literal. If you are using another language you need to find the right way to represent this character in that language, or if you're using XPath 2.0 you could use the codepoints-to-string function:

normalize-space(translate(//data, codepoints-to-string(160), ''))
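The Java variant above can be fleshed out into a small self-contained sketch. It assumes the JDK's built-in XPath 1.0 engine (javax.xml.xpath) and a document with a single data element; the class name NbspStripper and the method clean are hypothetical names chosen for illustration, not part of any library.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class NbspStripper {
    // XPath 1.0 expression: delete every U+00A0 from the string value of the
    // first data element, then collapse the remaining whitespace runs.
    // The Java compiler turns the escape below into the real nbsp character,
    // so the XPath parser sees the character itself, not a reference.
    static final String EXPR =
        "normalize-space(translate(//data, '\u00A0', ''))";

    // Parses the XML text and evaluates EXPR against it.
    static String clean(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(EXPR, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // A shortened version of the document from the question.
        String xml = "<root><data>"
            + "<child1>&#160;Well, some  spaces and nbsps  &#160;</child1>\n"
            + "<child3>         a nice text</child3>"
            + "</data></root>";
        System.out.println(clean(xml));
    }
}
```

Run against the shortened sample in main, this should print "Well, some spaces and nbsps a nice text". The whitespace between the child elements is what keeps their texts apart, so a document with no whitespace between the children would have them run together.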

Ian Roberts answered Sep 26 '22 21:09