I have developed a small C# script that opens an XLS file, parses it, and creates a list of XML files, validating them against an XSD file.
I tried to upload these validated files to a third-party online service (run by the same company that gave me the documentation and the XSD), and one generated file is rejected as not valid.
The file is not accepted because it has a space at the beginning of a decimal value in a node attribute; removing this space fixes the problem.
I have created a simple test case in which the XDocument Validate method accepts the XML with the extra space without reporting any error.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Schema;
using System.Xml.Linq;
using System.Xml;
using System.IO;

namespace TestParser {
    class Program {
        static void Main(string[] args) {
            string xsdMarkup =
                @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
                    <xs:element name='option'>
                        <xs:complexType>
                            <xs:simpleContent>
                                <xs:extension base='xs:string'>
                                    <xs:attribute name='value' type='xs:decimal'>
                                    </xs:attribute>
                                </xs:extension>
                            </xs:simpleContent>
                        </xs:complexType>
                    </xs:element>
                </xs:schema>";
            XmlSchemaSet schemas = new XmlSchemaSet();
            schemas.Add("", XmlReader.Create(new StringReader(xsdMarkup)));

            XDocument doc1 = new XDocument(
                new XElement("option", "test", new XAttribute("value", " 423423"))
            );

            Console.WriteLine("Validating doc1");
            bool errors = false;
            doc1.Validate(schemas, (o, e) =>
            {
                Console.WriteLine("{0}", e.Message);
                errors = true;
            }, true);
            Console.WriteLine("doc1 {0}", errors ? "not valid" : "validated");
            Console.WriteLine();
            Console.WriteLine("Contents of doc1:");
            Console.WriteLine(doc1);
        }
    }
}
The result is this:
Validating doc1
doc1 validated
Contents of doc1:
<option value=" 423423">test</option>
Is it correct that the C# XML Parser validates this XML?
Is it possible to force the Parser to be more picky about this formatting?
If I'm reading the XML spec correctly, leading whitespace in attribute values is to be trimmed (as the .NET XML parser does):
http://www.w3.org/TR/REC-xml/#AVNormalize
"If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters [...]"
xs:decimal is an XML Schema type (not a DTD type), and the relevant part of the XML Schema spec is how whitespace applies to xs:decimal:
"whiteSpace is applicable to all ·atomic· and ·list· datatypes. For all ·atomic· datatypes other than string (and types ·derived· by ·restriction· from it) the value of whiteSpace is collapse and cannot be changed by a schema author"
xs:decimal is not derived from xs:string, so the whitespace should be allowed and ignored. "Collapse" means to trim leading and trailing whitespace and to collapse internal runs into single space characters.
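Since the schema validator is required to collapse that whitespace, being "more picky" probably has to happen outside schema validation. Below is a minimal sketch of one way to do that: an extra pass over the document that reports attributes whose raw value starts or ends with whitespace. The class and method names (StrictWhitespaceCheck, FindPaddedAttributes) are hypothetical and not part of .NET. The sketch also assumes that the third-party service rejects any leading or trailing whitespace, which the question suggests but does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of .NET: a stricter check to run
// in addition to (not instead of) XDocument.Validate.
static class StrictWhitespaceCheck
{
    // Returns every attribute whose raw value begins or ends with whitespace.
    // Empty attribute values are ignored.
    public static List<XAttribute> FindPaddedAttributes(XDocument doc)
    {
        return doc.Descendants()          // all elements, including the root
                  .Attributes()           // all attributes on those elements
                  .Where(a => a.Value.Length > 0 &&
                              (char.IsWhiteSpace(a.Value[0]) ||
                               char.IsWhiteSpace(a.Value[a.Value.Length - 1])))
                  .ToList();
    }
}
```

Calling StrictWhitespaceCheck.FindPaddedAttributes(doc1) on the test document from the question would return the value attribute, so the file could be fixed (for example by trimming the value) or rejected before upload.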