Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

illegal xml character on SQL Insert

I am trying to insert the following string into an sql xml field

<?xml version="1.0" encoding="UTF-8"?>
<Response> 
    <Ip>x.x.x.x</Ip>
    <CountryCode>CA</CountryCode> 
    <CountryName>Canada</CountryName>
    <RegionCode>QC</RegionCode> 
    <RegionName>Québec</RegionName> 
    <City>Dorval</City> 
    <ZipCode>h9p1j3</ZipCode> 
    <Latitude>45.45000076293945</Latitude> 
    <Longitude>-73.75</Longitude> 
    <MetroCode></MetroCode> 
    <AreaCode></AreaCode> 
</Response>

The insert code looks like:

INSERT 
    INTO Traffic(... , xmlGeoLocation, ...)
    VALUES (
        ...
        <!--- 
        <cfqueryparam CFSQLType="cf_sql_varchar" value="#xmlGeoLocation#">, 
        --->
        '#xmlGeoLocation#',

        ...
        )

Two bad things happen:

  1. Québec gets turned into Québec

  2. I get an error saying [Macromedia][SQLServer JDBC Driver][SQLServer]XML parsing: line 8, character 16, illegal xml character

UPDATE:

The incoming test stream is mostly single byte characters.

The é is a two byte character. In particular C3A9

Also I don't have control over the incoming xml stream

like image 925
James A Mohler Avatar asked Apr 14 '13 03:04

James A Mohler


2 Answers

I'm going to strip the header...

I'm having the same issue with a funny little apostrophe thing. I think the issue is that by the time the string is getting converted to XML, it's not UTF-8 anymore, but sql server is trying to use the header to decode it. If it's VARCHAR, it's in the client's encoding. If it's NVARCHAR, it's UTF-16. Here are some variations I tested:

SQL (varchar, UTF-8):

SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>')

Error:

XML parsing: line 1, character 44, illegal xml character

SQL (nvarchar, UTF-8):

SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>')

Error: XML parsing: line 1, character 38, unable to switch the encoding

SQL (varchar, UTF-16)

SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>')

Error:

XML parsing: line 1, character 39, unable to switch the encoding

SQL (nvarchar, UTF-16)

SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>')

Worked!

like image 66
J. Holladay Avatar answered Sep 21 '22 02:09

J. Holladay


Have a look at this link from w3, it tells me that:

In HTML, there is a list of some built-in character names like &eacute; for é but XML does not have this. In XML, there are only five built-in character entities: &lt;, &gt;, &amp;, &quot; and &apos; for <, >, &, " and ' respectively. You can define your own entities in a Document Type Definition, or you can use any Unicode character (see next item).

In HTML, there are also numeric character references, such as &#38; for &. You can refer to any Unicode character, but the number is decimal, whereas in the Unicode tables the number is usually in hexadecimal. XML also allows hexadecimal references: &#x26; for example.

This leads me to believe that, &#xE9; might work for an é character.

Also the information at this link from Microsoft states that:

SQLXML 4.0 relies upon the limited support for DTDs provided in SQL Server. SQL Server allows for an internal DTD in xml data type data, which can be used to supply default values and to replace entity references with their expanded contents. SQLXML passes the XML data "as is" (including the internal DTD) to the server. You can convert DTDs to XML Schema (XSD) documents using third-party tools, and load the data with inline XSD schemas into the database.

But all this does not help you if you don't have control over the incoming XML stream. I doubt that it is possible to save an é (or any special character for that matter, except for the built in character entities mentioned above) inside an XML document into an SQL Server XML field, without either adding a DTD or replacing the character with its hexadecimal reference counterpart. In both cases you would need to be able to modify the XML before it goes into the database.

Just a quick example for anyone wanting to go down the "adding a DTD" route.

Here's how to add an internal DTD to an xml file which declares an entity for an é character:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [<!ENTITY eacute "&#233;">]>
<root>
  <RegionName>Qu&eacute;bec</RegionName>
</root>

If you go here and search on the page "Ctrl+F" for "eacute", you end up in a list with examples for other characters which you could just copy and paste into your own internal DTD.

Edit

You could off course add all entities as they are specified at the link above: <!ENTITY eacute "&#233;"><!ENTITY .. // Next entity>, or just copy them all from this file. I do understand how adding an internal DTD to every single XML file you add to the database isn't such a good idea. I would be interested to know if adding it for 1 file fixes your issue though.

like image 31
tom-19 Avatar answered Sep 25 '22 02:09

tom-19