Prevent XSLT transform from converting utf-8 XML into utf-16?

In Delphi XE2, I'm running an XSLT transform on a received XML file to remove all namespace information.
Problem: it changes

<?xml version="1.0" encoding="utf-8"?>

into

<?xml version="1.0" encoding="utf-16"?>

This is the XML I get back from the Exchange server:

<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Header>
<h:ServerVersionInfo MajorVersion="14" MinorVersion="0" MajorBuildNumber="722" MinorBuildNumber="0" Version="Exchange2010" xmlns:h="http://schemas.microsoft.com/exchange/services/2006/types" xmlns="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>
</s:Header>
<s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<m:ResolveNamesResponse xmlns:m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:t="http://schemas.microsoft.com/exchange/services/2006/types">
<m:ResponseMessages>
<m:ResolveNamesResponseMessage ResponseClass="Success">
<m:ResponseCode>NoError</m:ResponseCode>
<m:ResolutionSet TotalItemsInView="1" IncludesLastItemInRange="true">
<t:Resolution>
<t:Mailbox>
<t:Name>developer</t:Name>
<t:EmailAddress>[email protected]</t:EmailAddress>
<t:RoutingType>SMTP</t:RoutingType>
<t:MailboxType>Mailbox</t:MailboxType>
</t:Mailbox>
<t:Contact>
<t:Culture>nl-NL</t:Culture>
<t:DisplayName>developer</t:DisplayName>
<t:GivenName>developer</t:GivenName>
<t:EmailAddresses>
<t:Entry Key="EmailAddress1">SMTP:[email protected]</t:Entry>
</t:EmailAddresses>
<t:ContactSource>ActiveDirectory</t:ContactSource>
</t:Contact>
</t:Resolution>
</m:ResolutionSet>
</m:ResolveNamesResponseMessage>
</m:ResponseMessages>
</m:ResolveNamesResponse>
</s:Body>
</s:Envelope>

This is the function that removes the namespace info:

Uses
   MSXML2_TLB; // IXMLDOMdocument

class function TXMLHelper.RemoveNameSpaces(XMLString: String): String;
const
  // An XSLT script for removing the namespaces from any document.
  // From http://wiki.tei-c.org/index.php/Remove-Namespaces.xsl
  cRemoveNSTransform =
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">' +
    '<xsl:output method="xml" indent="no"/>' +

    '<xsl:template match="/|comment()|processing-instruction()">' +
    '    <xsl:copy>' +
    '      <xsl:apply-templates/>' +
    '    </xsl:copy>' +
    '</xsl:template>' +

    '<xsl:template match="*">' +
    '    <xsl:element name="{local-name()}">' +
    '      <xsl:apply-templates select="@*|node()"/>' +
    '    </xsl:element>' +
    '</xsl:template>' +

    '<xsl:template match="@*">' +
    '    <xsl:attribute name="{local-name()}">' +
    '      <xsl:value-of select="."/>' +
    '    </xsl:attribute>' +
    '</xsl:template>' +

    '</xsl:stylesheet>';

var
  Doc, XSL: IXMLDOMdocument2;
begin
  Doc := ComsDOMDocument.Create;
  Doc.ASync := false;
  XSL := ComsDOMDocument.Create;
  XSL.ASync := false;
  try
     Doc.loadXML(XMLString);
     XSL.loadXML(cRemoveNSTransform);
     Result := Doc.TransFormNode(XSL);
  except
     on E:Exception do Result := E.Message;
  end;
end; { RemoveNameSpaces }

But after this, it's suddenly a utf-16 document:

<?xml version="1.0" encoding="UTF-16"?>
<Envelope>
[snip]
</Envelope>

After Googling "xsl utf-8 utf-16" I tried several things:

  • Changing the line (see "Output DataTable XML in UTF8 rather than UTF16")

    <xsl:output method="xml" indent="no"/>
    

    into either:

    <xsl:output method="xml" encoding="utf-8" indent="no"/>
    <xsl:output method="xml" encoding="utf-8"/>
    <xsl:output encoding="utf-8"/>
    

    That did not work.
    (According to http://www.xml.com/pub/a/2002/09/04/xslt.html it would have been the optimal solution: "The encoding attribute actually does more than add an encoding declaration to the result document; it tells the XSLT processor to write out the result using that encoding.")

  • Changing the line (see "XslCompiledTransform uses UTF-16 encoding")

    <xsl:output method="xml" indent="no"/>
    

    into

    <xsl:output method="xml" omit-xml-declaration="yes" indent="no" />
    

    which leaves out the XML declaration, but if I then simply prepend

    <?xml version="1.0" encoding="utf-8"?>
    

    I lose characters, because no actual conversion to UTF-8 is done.

  • IXMLDOMdocument2 does not have an Encoding property

Any ideas how to fix this?

Remarks/background:

  • If all else fails, there may still be the option of converting the UTF-16 XML data to UTF-8, but that's an entirely different approach.

  • I'm trying to do everything in UTF-8 because I'm communicating with the Exchange server through EWS, and setting the HTTP request header to UTF-16 does not work: Exchange tells me that the content type 'text/xml; charset = utf-16' is not the expected type 'text/xml; charset = utf-8'. EWS returns UTF-8 (see the start of this post).
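The fallback in the first remark, and the character loss in the omit-xml-declaration attempt, come down to the same point: a Delphi string is UTF-16 in memory, so prepending a UTF-8 declaration only changes the label, not the bytes. A minimal sketch of doing the conversion explicitly, assuming the stylesheet uses omit-xml-declaration="yes" and that the caller needs raw bytes (e.g. for the HTTP request body). The helper name ToUtf8Xml is hypothetical; TEncoding.UTF8.GetBytes is the standard RTL call.

```pascal
program Utf8XmlDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: TransformedXml is the string returned by
// transformNode with omit-xml-declaration="yes". The text is re-encoded
// to UTF-8 rather than merely prefixed with a UTF-8 declaration.
function ToUtf8Xml(const TransformedXml: string): TBytes;
const
  cUtf8Decl = '<?xml version="1.0" encoding="utf-8"?>';
begin
  // GetBytes converts every character to UTF-8 and emits no BOM
  Result := TEncoding.UTF8.GetBytes(cUtf8Decl + TransformedXml);
end;

var
  Bytes: TBytes;
begin
  // #$00E9 (e-acute) is one UTF-16 code unit but two bytes in UTF-8
  Bytes := ToUtf8Xml('<Name>Caf' + #$00E9 + '</Name>');
  Writeln(Length(Bytes)); // prints 56 (38-byte declaration + 18 bytes)
end.
```

The bytes can then be wrapped in a TBytesStream for sending, so the declared and actual encodings agree.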

Asked Apr 18 '13 at 08:04 by Jan Doggen




2 Answers

The problem is the transformNode method: it returns a string, and with MSXML such a string is UTF-16 encoded. Instead, create an empty MSXML DOM document for the result and call the transformNodeToObject method, passing the empty DOM document as the second argument. You can then save the result document to a file or stream, and the encoding should be the one specified in the xsl:output directive.

Answered Nov 07 '22 at 17:11 by Martin Honnen
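A minimal sketch of the approach in the answer above, written in the question's own style. Assumptions: MSXML2_TLB was imported from MSXML 6 (so CoDOMDocument60 exists), COM is already initialized (as it is in a VCL application), and the name RemoveNameSpacesUtf8 and its parameters are hypothetical. The result DOM is serialized through a TStreamAdapter, which wraps a TMemoryStream as the COM IStream that MSXML's save method accepts.

```pascal
uses
  System.SysUtils, System.Classes, Winapi.ActiveX, MSXML2_TLB;

// Hypothetical helper: transforms XMLString with XSLString into a DOM
// document (not into a string) and returns MSXML's serialized bytes.
function RemoveNameSpacesUtf8(const XMLString, XSLString: string): TBytes;
var
  Doc, XSL, Output: IXMLDOMDocument2;
  Stream: TMemoryStream;
  Adapter: IStream;
begin
  Doc := CoDOMDocument60.Create;
  Doc.async := False;
  XSL := CoDOMDocument60.Create;
  XSL.async := False;
  Output := CoDOMDocument60.Create; // empty result document

  if not Doc.loadXML(XMLString) then
    raise Exception.Create('Input XML: ' + Doc.parseError.reason);
  if not XSL.loadXML(XSLString) then
    raise Exception.Create('XSLT: ' + XSL.parseError.reason);

  // Fills Output instead of returning a UTF-16 string like transformNode
  Doc.transformNodeToObject(XSL, Output);

  Stream := TMemoryStream.Create;
  try
    // soReference: the adapter does not free Stream when released
    Adapter := TStreamAdapter.Create(Stream, soReference);
    Output.save(IUnknown(Adapter)); // MSXML writes the bytes into Stream
    Adapter := nil;

    SetLength(Result, Stream.Size);
    if Stream.Size > 0 then
      Move(Stream.Memory^, Result[0], Stream.Size);
  finally
    Stream.Free;
  end;
end;
```

Calling it as RemoveNameSpacesUtf8(XMLString, cRemoveNSTransform) returns bytes that can be written to a file or sent as the request body. It is worth checking the first bytes of the result once, to confirm the declaration and encoding are what Exchange expects.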


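To use IXMLDocument in your original code, it should look like this:

uses
  Xml.XMLIntf, Xml.XMLDoc; // IXMLDocument, LoadXMLData, NewXMLDocument

var
  iInp, iOtp, iXsl: IXMLDocument;
  Utf8: UTF8String;
begin
  iInp := LoadXMLData(XMLString);
  iXsl := LoadXMLData(cRemoveNSTransform);  // the XSLT constant from the question
  iOtp := NewXMLDocument;                    // empty result document
  iInp.Node.TransformNode(iXsl.Node, iOtp);  // transform into iOtp, not into a string
  iOtp.SaveToXML(Utf8);                      // UTF8String overload of SaveToXML
end;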

Now the variable Utf8 should contain the transformed XML in UTF-8 encoding. If you want to save to a stream or file instead, replace SaveToXML with

  iOtp.Encoding := 'UTF-8';
  iOtp.SaveToFile(....);

Answered Nov 07 '22 at 15:11 by pf1957