
In XSLT can I tokenize on nothing?

Tags: xslt, xslt-2.0

I need to convert the string 'abcdef' to its parts, 'a', 'b', 'c', 'd', 'e', 'f'. Stupidly I tried tokenize('abcdef', '') but of course that returns a FORX0003 error (The regular expression in tokenize() must not be one that matches a zero-length string).

My actual goal is to turn the string into 'a/b/c/d/e/f', so any shortcut that gets me there directly would also be useful.

(I'm using Saxon 9.3 for the .NET platform.)

asked Dec 06 '11 12:12 by ͢bts


2 Answers

To get the desired character sequence from a string $str, use the pair of functions string-to-codepoints() and codepoints-to-string():

for $c in string-to-codepoints($str)
 return
    codepoints-to-string($c)

To get this character sequence joined with '/' as the separator, apply string-join() to the above expression.

Here is a full code example:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

 <xsl:template match="/">
     <xsl:sequence select=
      "string-join(
              for $c in string-to-codepoints('ABC')
              return
                 codepoints-to-string($c),
            '/'
                     )
      "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to any XML document (the input is not used), it produces the wanted result:

A/B/C

Explanation:

string-to-codepoints($str) produces a sequence of code-points (think of them as "character codes") representing each character of the string.

For example:

string-to-codepoints('ABC')

produces the sequence:

65 66 67

codepoints-to-string($code-seq)

is the inverse function of string-to-codepoints(). Given a sequence of codepoints, it produces the string whose characters are represented by the codepoints in the sequence. Thus:

codepoints-to-string((65,66,67))

produces the string:

ABC

Therefore:

for $c in string-to-codepoints($str)
 return
    codepoints-to-string($c)

gets the codepoint of each individual character in $str and converts it to a separate one-character string.

Finally, string-join() joins all of these one-character strings, using "/" as the separator.
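
Applied to the string from the question, the same technique could be wrapped in a reusable function. This is only a sketch: the function name my:join-chars and the namespace URI http://example.com/my are hypothetical and not part of the original answer.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: joins the characters of $str with $sep -->
  <xsl:function name="my:join-chars" as="xs:string">
    <xsl:param name="str" as="xs:string"/>
    <xsl:param name="sep" as="xs:string"/>
    <!-- One codepoint per character, each converted back to a
         one-character string, then joined with the separator -->
    <xsl:sequence select="
      string-join(
        for $c in string-to-codepoints($str)
        return codepoints-to-string($c),
        $sep)"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Outputs: a/b/c/d/e/f -->
    <xsl:value-of select="my:join-chars('abcdef', '/')"/>
  </xsl:template>
</xsl:stylesheet>
```

For an empty $str, string-to-codepoints() returns the empty sequence, so the function returns the empty string rather than raising an error.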

answered Nov 09 '22 05:11 by Dimitre Novatchev


Use this line:

replace(replace($input, "(.)", "$1/", "s"), "(.*).$", "$1", "s")

Here $input holds your original string. The inner replace() appends "/" to every character, and the outer replace() strips the final trailing "/". The result is your desired string:

a/b/c/d/e/f
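
To make the intermediate result visible, the two replace() calls can be split into separate steps. This is a sketch only: the variable names input and slashed are illustrative and not from the original answer.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="input" select="'abcdef'"/>

    <!-- Step 1: append '/' after every character, giving 'a/b/c/d/e/f/' -->
    <xsl:variable name="slashed"
        select="replace($input, '(.)', '$1/', 's')"/>

    <!-- Step 2: keep everything except the last character (the trailing '/') -->
    <xsl:value-of select="replace($slashed, '(.*).$', '$1', 's')"/>
  </xsl:template>
</xsl:stylesheet>
```

Neither pattern can match a zero-length string, so this does not trigger the FORX0003 error mentioned in the question.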

answered Nov 09 '22 05:11 by FailedDev