
How to make TXMLDocument (with the MSXML Implementation) always include the encoding attribute?

I have legacy code (I didn't write it) that always included the encoding attribute, but after recompiling it in Delphi 2010 (D2010), TXMLDocument no longer includes the encoding. Because the XML has accented characters in both tags and data, TXMLDocument.LoadFromFile throws an EDOMParseError saying that an invalid character was found in the file. Relevant code:

   Doc := TXMLDocument.Create(nil);  
   try
     Doc.Active := True;
     Doc.Encoding := XMLEncoding;
     RootNode := Doc.CreateElement('Test', '');
     Doc.DocumentElement := RootNode;
     <snip>
     //Result := Doc.XML.Text;
     Doc.SaveToXML(Result);    // Both lines give the same result

On older versions of Delphi, the following line is generated:

<?xml version="1.0" encoding="ISO-8859-1"?>

On D2010, this is generated:

<?xml version="1.0"?>

If I edit that line manually, everything works as it has for years.

UPDATE: XMLEncoding is a constant, defined as follows:

  XMLEncoding = 'ISO-8859-1';
Fabricio Araujo asked May 03 '10 17:05


2 Answers

You'll want to look at IXMLDocument.CreateProcessingInstruction. I use OmniXML, but its syntax is similar and should get you started:

var
  FDoc: IXMLDocument;
  PI:  IXMLProcessingInstruction;
begin
  FDoc := OmniXML.CreateXMLDoc();
  PI := FDoc.CreateProcessingInstruction('xml', 'version="1.0" encoding="UTF-8"');
  FDoc.AppendChild(PI);
end;
Ken White answered Nov 09 '22 00:11


var 
  XMLStream: TStringStream;
begin  
   Doc := TXMLDocument.Create(nil);  
   try
     Doc.Active := True;
     Doc.Encoding := XMLEncoding;
     RootNode := Doc.CreateElement('Test', '');
     Doc.DocumentElement := RootNode;
     <snip>
     XMLStream := TStringStream.Create;
     try
       Doc.SaveToStream(XMLStream);
       Result := XMLStream.DataString;
     finally
       XMLStream.Free;    // release the stream even if saving raises
     end;

After Ken's answer and the linked MSXML article, I decided to investigate the XML property and the SaveToXML method. Both read the XML property of the MSXMLDOM implementation, which, according to the article, does not include the encoding when read directly (see the "Creating New XML Documents with MSXML" section, right after the use of the CreateProcessingInstruction method). Saving to a stream, as in the code above, does not go through that property.

UPDATE:

I later found that accented characters were getting mangled in the resulting XML. When the program that processes that XML started throwing strange errors, we saw that those characters were being converted to numeric char constants (in the way that #13 is the numeric char constant for carriage return). So I used a TStringStream to FINALLY get it right.
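
For reference, here is a minimal sketch of the stream-based approach as a complete console program. Several things in it are assumptions rather than part of the original code: the program name, the BuildXml function, the sample text, the use of NewXMLDocument (which returns an already active IXMLDocument) in place of TXMLDocument.Create(nil), and the explicit CoInitialize call that MSXML needs in a console application. The unit names are the unscoped ones used by D2010; later Delphi versions prefix them (Xml.XMLIntf, Xml.XMLDoc, and so on).

```delphi
program EncodingDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, ActiveX, XMLIntf, XMLDoc;

const
  XMLEncoding = 'ISO-8859-1';

// Hypothetical helper: builds a small document and returns it as a string
// by saving to a stream instead of reading the XML property.
function BuildXml: string;
var
  Doc: IXMLDocument;
  RootNode: IXMLNode;
  XMLStream: TStringStream;
begin
  // Holding the document through the interface lets reference counting
  // release it when the function returns.
  Doc := NewXMLDocument;
  Doc.Encoding := XMLEncoding;
  RootNode := Doc.CreateElement('Test', '');
  Doc.DocumentElement := RootNode;
  // Sample accented text ("Coracao" with c-cedilla and a-tilde),
  // written with char constants so the source file stays plain ASCII.
  RootNode.Text := 'Cora'#$E7#$E3'o';

  // The parameterless constructor uses the default ANSI encoding, so
  // DataString decodes the saved bytes with that code page. This sketch
  // assumes it is compatible with XMLEncoding (true for Windows-1252).
  XMLStream := TStringStream.Create;
  try
    Doc.SaveToStream(XMLStream);
    Result := XMLStream.DataString;
  finally
    XMLStream.Free;
  end;
end;

begin
  CoInitialize(nil);   // MSXML is COM based
  try
    Writeln(BuildXml);
  finally
    CoUninitialize;
  end;
end.
```

If the system ANSI code page does not match the document encoding, DataString may decode the accented characters incorrectly. In that case it is probably safer to keep the stream's raw bytes, or write them straight to a file, than to convert them to a string.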

Fabricio Araujo answered Nov 08 '22 23:11