I have a few strings containing a variant of hexadecimal escape codes (the source is FrameMaker, in case that matters). A string could therefore look like
this is some sentence with some hex code\x27 s , and we need that fixed.
and needs to be changed to
this is some sentence with some hex code's , and we need that fixed.
In reality a single string can contain several of these, so I'm looking for the best way to walk through the text, capture all hex codes (of the form \x##) and replace each of them with the correct character. I have made an XML list / lookup table containing all the characters, as follows:
<xsl:param name="reflist">
    <Code Value="\x27">'</Code>
    <Code Value="\x28">(</Code>
    <Code Value="\x29">)</Code>
    <Code Value="\x2a">*</Code>
    <Code Value="\x2b">+</Code>
    <!-- much more like these... -->
</xsl:param>
For now I have used a simple replace, but there are too many characters for that to be workable.
What's the best way to do this?
One can completely avoid using any "reference table", like this:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:my" exclude-result-prefixes="my xs">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
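 <!-- Match \x followed by hex digits; lower-case() lets my:hex2dec handle A-F as well as a-f -->
 <xsl:template match="text()[matches(.,  '\\x[0-9a-fA-F]+')]">
   <xsl:analyze-string select="." regex="\\x[0-9a-fA-F]+" >
     <xsl:matching-substring>
       <xsl:value-of select=
       "codepoints-to-string(my:hex2dec(lower-case(substring(.,3)), 0))"/>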
     </xsl:matching-substring>
     <xsl:non-matching-substring>
      <xsl:value-of select="."/>
     </xsl:non-matching-substring>
   </xsl:analyze-string>
 </xsl:template>
 <xsl:function name="my:hex2dec" as="xs:integer">
  <xsl:param name="pStr" as="xs:string"/>
  <xsl:param name="pAccum" as="xs:integer"/>
  <xsl:sequence select=
   "if(not($pStr))
     then $pAccum
     else
      for $char in substring($pStr, 1, 1),
          $code in
            if($char ge '0' and $char le '9')
              then xs:integer($char)
              else
                string-to-codepoints($char) - string-to-codepoints('a') +10
       return
          my:hex2dec(substring($pStr,2), 16*$pAccum + $code)
   "/>
 </xsl:function>
</xsl:stylesheet>
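To see how the recursive my:hex2dec function arrives at a code point, it may help to follow two calls by hand. The trace below is an illustration worked out from the function as written above, not output captured from a processor:

```text
my:hex2dec('27', 0)
  -> my:hex2dec('7', 16*0 + 2)     accumulator = 2
  -> my:hex2dec('',  16*2 + 7)     accumulator = 39
  -> 39                            codepoints-to-string(39) = '

my:hex2dec('2a', 0)
  -> my:hex2dec('a', 16*0 + 2)     accumulator = 2
  -> my:hex2dec('',  16*2 + 10)    'a' maps to 97 - 97 + 10 = 10, accumulator = 42
  -> 42                            codepoints-to-string(42) = *
```

Each step multiplies the accumulator by 16 and adds the value of the next digit, so the function works for any number of digits; the recursion stops when the remaining string is empty.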
When this transformation is applied on the following XML document:
<t>
 <p>this is some sentence with some hex code\x27 s ,
    and we need that fixed.</p>
 <p>this is some sentence with some hex code\x28 s ,
    and we need that fixed.</p>
 <p>this is some sentence with some hex code\x29 s ,
    and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2a s ,
    and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2b s ,
    and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2c s ,
    and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2d s ,
    and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2e s ,
    and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2f s ,
    and we need that fixed.</p>
</t>
the wanted, correct result is produced:
<t>
   <p>this is some sentence with some hex code' s ,
    and we need that fixed.</p>
   <p>this is some sentence with some hex code( s ,
    and we need that fixed.</p>
   <p>this is some sentence with some hex code) s ,
    and we need that fixed.</p>
   <p>this is some sentence with some hex code* s ,
    and we need that fixed.</p>
   <p>this is some sentence with some hex code+ s ,
    and we need that fixed.</p>
   <p>this is some sentence with some hex code, s ,
    and we need that fixed.</p>
   <p>this is some sentence with some hex code- s ,
    and we need that fixed.</p>
   <p>this is some sentence with some hex code. s ,
    and we need that fixed.</p>
   <p>this is some sentence with some hex code/ s ,
    and we need that fixed.</p>
</t>
Do note:
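This transformation is generic: it needs no lookup table and can correctly process any hexadecimal Unicode code, whatever its number of digits. Because the regular expression matches hex digits greedily, each code must be followed by a character that is not a hex digit (such as the space in the examples above) or by another \x code.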
For example, if the same transformation is applied on this XML document:
<t>
 <p>this is some sentence with some hex code\x0428\x0438\x0448 s ,
    and we need that fixed.</p>
</t>
the correct result (containing the Bulgarian word for "grill" in Cyrillic) is produced:
<t>
   <p>this is some sentence with some hex codeШиш s ,
    and we need that fixed.</p>
</t>
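If the codes are always exactly two hex digits, as the \x## notation in the question suggests, a fixed-width variant may be safer, because it does not swallow a hex-looking letter that directly follows a code. The following is only a sketch under that assumption: the function name my:hex-byte is made up for this illustration, and multi-digit codes such as the Cyrillic example above would no longer be handled.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="my:my" exclude-result-prefixes="my xs">

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Identity rule: copy everything that is not handled below. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: every code is \x plus exactly two hex digits.
       The regex attribute is an attribute value template, so the
       quantifier braces are doubled there: {{2}} yields the regex {2}.
       The match attribute is not a value template, so {2} stays as is. -->
  <xsl:template match="text()[matches(., '\\x[0-9a-fA-F]{2}')]">
    <xsl:analyze-string select="." regex="\\x[0-9a-fA-F]{{2}}">
      <xsl:matching-substring>
        <xsl:value-of
            select="codepoints-to-string(my:hex-byte(substring(., 3)))"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Hypothetical helper: converts two hex digits to an integer.
       The value of a digit is its position in $digits, obtained as the
       length of the text that precedes it. -->
  <xsl:function name="my:hex-byte" as="xs:integer">
    <xsl:param name="pHex" as="xs:string"/>
    <xsl:variable name="digits" select="'0123456789abcdef'"/>
    <xsl:variable name="s" select="lower-case($pHex)"/>
    <xsl:sequence select="
      16 * string-length(substring-before($digits, substring($s, 1, 1)))
         + string-length(substring-before($digits, substring($s, 2, 1)))"/>
  </xsl:function>
</xsl:stylesheet>
```

With this variant, a text node such as "code\x27s" should become "code's" even without a space after the code, since only the two digits after \x are consumed.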