Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

XML Encoding in XMLTYPE column in Oracle DB

I have a table created like this:

create table b (data timestamp, value XMLTYPE);

I run this script in TOAD 12.6 to store a XML in the table.

DECLARE
    lc_Soap         CLOB;
    lc_Request      CLOB;
    px_RequestXML   XMLTYPE
        := XMLTYPE ('<test><test1>ABDDÇJJSõ</test1></test>');
BEGIN
    DELETE b;

    lc_Soap :=
        '<?xml version="1.0" encoding="ISO-8859-1"?>
               <s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
                  <s:Header>
                      <h:AxisValues xmlns="urn:/microsoft/multichannelframework/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:h="urn:/microsoft/multichannelframework/">
                          <User xmlns="">TEST</User>
                      </h:AxisValues>
                  </s:Header>
                  <s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
                      <substr/>
                  </s:Body>
              </s:Envelope>';

    lc_Request :=
        pkg_utils.replace_clob (lc_Soap,
                                '<substr/>',
                                xml_utils.XMLTypeToClob (px_RequestXML));

    px_RequestXML := XMLTYPE.createXML (lc_Request);

    INSERT INTO b
         VALUES (SYSTIMESTAMP, px_RequestXML);

    COMMIT;
END;

When I try to see what is in the VALUE column I get this encoding UTF-8

<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Header>
    <h:AxisValues xmlns="urn:/microsoft/multichannelframework/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:h="urn:/microsoft/multichannelframework/">
      <User xmlns="">TEST</User>
    </h:AxisValues>
  </s:Header>
  <s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <test>
      <test1>ABDDÇJJSõ</test1>
    </test>
  </s:Body>
</s:Envelope>

But this script was build to run in a different DB user or a in Oracle JOB. And in that cases, the encoding is different:

<?xml version="1.0" encoding="WINDOWS-1252"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Header>
    <h:AxisValues xmlns="urn:/microsoft/multichannelframework/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:h="urn:/microsoft/multichannelframework/">
      <User xmlns="">TEST</User>
    </h:AxisValues>
  </s:Header>
  <s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <test>
      <test1>ABDDÇJJSõ</test1>
    </test>
  </s:Body>
</s:Envelope>

The NLS_CHARACTERSET parameter for DB is WE8MSWIN1252. Why this append? And Who can I always store as UTF-8?

like image 264
milheiros Avatar asked May 07 '26 04:05

milheiros


1 Answers

Oracle will use the client characterset to create a XMLTYPE from a CLOB or String and completely ignore the encoding in the XML prolog (see docs). You may set encoding="blabla" and it will work. Oracle honors the encoding in the XML prolog only when you create a XMLTYPE from a BLOB.

The client environment also drives the encoding when reading an XMLTYPE. If you want a XML document to be encoded in UTF-8 regardless of the client encoding, you have to retrieve it as BLOB.

Either via getBlobVal()

SELECT (c2).getBlobVal(nls_charset_id('UTF8')) FROM b;

or via xmlserialize()

SELECT xmlserialize(DOCUMENT c2 AS BLOB ENCODING 'UTF-8') FROM b;
like image 58
Philipp Salvisberg Avatar answered May 10 '26 00:05

Philipp Salvisberg



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!