
XDocument: saving XML to file without BOM

I'm generating a UTF-8 XML file using XDocument.

    XDocument xml_document = new XDocument(
        new XDeclaration("1.0", "utf-8", null),
        new XElement(ROOT_NAME,
            new XAttribute("note", note)
        )
    );
    ...
    xml_document.Save(@file_path);

The file is generated correctly and validates successfully against an XSD file.

When I try to upload the XML file to an online service, the service says that my file is wrong at line 1. I have discovered that the problem is caused by the BOM in the first bytes of the file.

Do you know why the BOM is added to the file, and how I can save the file without it?

As stated in the Wikipedia article on the byte order mark:

While Unicode standard allows BOM in UTF-8 it does not require or recommend it. Byte order has no meaning in UTF-8 so a BOM only serves to identify a text stream or file as UTF-8 or that it was converted from another format that has a BOM

Is it an XDocument problem or should I contact the guys of the online service provider to ask for a parser upgrade?

systempuntoout asked Feb 09 '11 08:02



2 Answers

Use an XmlTextWriter and pass it to the XDocument's Save() method; that way you have more control over the encoding used:

    var doc = new XDocument(
        new XDeclaration("1.0", "utf-8", null),
        new XElement("root", new XAttribute("note", "boogers"))
    );
    using (var writer = new XmlTextWriter(".\\boogers.xml", new UTF8Encoding(false)))
    {
        doc.Save(writer);
    }

The UTF8Encoding constructor has an overload that takes a boolean specifying whether to emit the BOM (byte order mark); in your case, pass false.
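To make the effect of that boolean visible, here is a minimal sketch that compares the preamble (the BOM bytes) each encoding instance reports through GetPreamble(). The class and method names (PreambleDemo, PreambleLength) are made up for illustration and are not part of the answer's code.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Returns how many BOM bytes the given encoding reports as its preamble.
    public static int PreambleLength(Encoding encoding) => encoding.GetPreamble().Length;

    static void Main()
    {
        // UTF8Encoding(true) reports the three-byte preamble EF BB BF.
        Console.WriteLine(PreambleLength(new UTF8Encoding(true)));   // prints 3
        // UTF8Encoding(false) reports an empty preamble, so writers emit no BOM.
        Console.WriteLine(PreambleLength(new UTF8Encoding(false)));  // prints 0
        // The shared Encoding.UTF8 instance is the BOM-emitting variant.
        Console.WriteLine(PreambleLength(Encoding.UTF8));            // prints 3
    }
}
```

The last line is worth noting: passing Encoding.UTF8 to a writer would still produce a BOM, which is why the answer constructs UTF8Encoding(false) explicitly.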

The result of this code was verified using Notepad++ to inspect the file's encoding.
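If you would rather check this from code than from an editor, a rough sketch is to read the file back and test whether it starts with the UTF-8 BOM bytes EF BB BF. The names BomCheck and StartsWithUtf8Bom are hypothetical helpers, and the default path simply reuses the file name from the snippet above.

```csharp
using System;
using System.IO;

static class BomCheck
{
    // True when the buffer begins with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes) =>
        bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;

    static void Main(string[] args)
    {
        // Assumed default: the file written by the snippet above; pass another path as the first argument.
        string path = args.Length > 0 ? args[0] : "boogers.xml";
        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine(StartsWithUtf8Bom(bytes) ? "BOM present" : "no BOM");
    }
}
```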

Quick Joe Smith answered Oct 07 '22 17:10


First of all: the service provider MUST handle it, according to the XML spec, which states that a BOM may be present in a UTF-8 encoded document.

You can force the XML to be saved without a BOM like this:

    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Encoding = new UTF8Encoding(false); // The false means, do not emit the BOM.
    using (XmlWriter w = XmlWriter.Create("my.xml", settings))
    {
        doc.Save(w);
    }

(Googled from here: http://social.msdn.microsoft.com/Forums/en/xmlandnetfx/thread/ccc08c65-01d7-43c6-adf3-1fc70fdb026a)
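If the XML is uploaded straight from memory rather than from a file on disk, the same settings can be applied to a stream. The following is only a sketch under that assumption; InMemorySave and ToUtf8Bytes are hypothetical names, and the emitBom parameter exists purely so both behaviours can be compared.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class InMemorySave
{
    // Serializes the document to UTF-8 bytes, with or without a leading BOM.
    public static byte[] ToUtf8Bytes(XDocument doc, bool emitBom)
    {
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(emitBom) };
        using (var stream = new MemoryStream())
        {
            // Dispose the writer first so everything is flushed into the stream.
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }

    static void Main()
    {
        var doc = new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement("root", new XAttribute("note", "example")));

        byte[] bytes = ToUtf8Bytes(doc, emitBom: false);
        // Without a BOM, the first byte is the '<' of the XML declaration.
        Console.WriteLine(bytes[0] == (byte)'<' ? "no BOM" : "BOM present");
    }
}
```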

Dercsár answered Oct 07 '22 18:10