
XML validation: decimal attribute value starting with a space

Tags: c#, xml, xsd

I have written a small C# program that opens an XLS file, parses it, and generates a set of XML files, validating each one against an XSD file.

I tried to upload these validated files to a third-party online service (run by the same company that gave me the documentation and the XSD), and one of the generated files was rejected as not valid.

The file is rejected because a decimal value in an element attribute starts with a space; removing this space fixes the problem.

I have created a simple test case in which the XDocument Validate method accepts the XML with the extra space without reporting any error.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Schema;
using System.Xml.Linq;
using System.Xml;
using System.IO;

namespace TestParser {
    class Program {
        static void Main(string[] args) {
            string xsdMarkup =
            @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
                <xs:element name='option'>
                    <xs:complexType>
                    <xs:simpleContent>
                        <xs:extension base='xs:string'>
                        <xs:attribute name='value' type='xs:decimal'>
                        </xs:attribute>
                        </xs:extension>
                    </xs:simpleContent>
                    </xs:complexType>
                </xs:element>
                </xs:schema>";
            // Compile the inline schema (no target namespace).
            XmlSchemaSet schemas = new XmlSchemaSet();
            schemas.Add("", XmlReader.Create(new StringReader(xsdMarkup)));

            // Build a document whose decimal attribute value has a leading space.
            XDocument doc1 = new XDocument(
                new XElement("option", "test", new XAttribute("value", " 423423")));

            Console.WriteLine("Validating doc1");
            bool errors = false;
            // The handler is invoked for each validation error or warning.
            doc1.Validate(schemas, (o, e) =>
            {
                Console.WriteLine("{0}", e.Message);
                errors = true;
            }, true);
            Console.WriteLine("doc1 {0}", errors ? "not valid" : "validated");
            Console.WriteLine();
            Console.WriteLine("Contents of doc1:");
            Console.WriteLine(doc1);
        }
    }
}

The result is this:

Validating doc1
doc1 validated

Contents of doc1:
<option value=" 423423">test</option>

Is it correct that the .NET XML validator accepts this XML?
Is it possible to force the validator to be stricter about this formatting?

systempuntoout asked Mar 01 '11 12:03


2 Answers

If I'm reading the XML spec correctly, leading whitespace in attribute values is to be trimmed (as the .NET XML parser does):

http://www.w3.org/TR/REC-xml/#AVNormalize

"If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters [...]"
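If the receiving service turns out to be stricter than the validator, one possible workaround is to look for padded attribute values yourself before uploading. The output in the question shows that the XDocument still holds the leading space, so the raw value is available for such a check. The following is only a minimal sketch: the class and method names are hypothetical and not part of .NET, and it assumes that rejecting any attribute with leading or trailing whitespace is acceptable for your files.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of .NET or of the original post.
public static class AttributePaddingCheck
{
    // The four characters XML Schema treats as whitespace.
    private static readonly char[] XmlWhitespace = { ' ', '\t', '\n', '\r' };

    // Returns every attribute whose value starts or ends with whitespace.
    public static List<XAttribute> FindPaddedAttributes(XDocument doc)
    {
        return doc.Descendants()
                  .Attributes()
                  .Where(a => a.Value.Length != a.Value.Trim(XmlWhitespace).Length)
                  .ToList();
    }
}
```

Calling AttributePaddingCheck.FindPaddedAttributes(doc1) on the document from the question would return the value attribute, which could then be reported or trimmed before the file is written.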

nodots answered Sep 28 '22 03:09


xs:decimal is an XML Schema type (not a DTD type) and the relevant part of the XML Schema spec is how whitespace applies to xs:decimal:

"whiteSpace is applicable to all ·atomic· and ·list· datatypes. For all ·atomic· datatypes other than string (and types ·derived· by ·restriction· from it) the value of whiteSpace is collapse and cannot be changed by a schema author [...]"

xs:decimal is not derived from xs:string, so the whitespace should be allowed and ignored. "Collapse" means to trim leading and trailing whitespace and to collapse internal runs into single space characters.
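To make the "collapse" rule concrete, here is a small sketch of what it does to a lexical value before the value is checked against the type. The class and method names are hypothetical; this only mimics the rule as described above and is not how the .NET validator is implemented internally.

```csharp
using System.Text.RegularExpressions;

// Hypothetical illustration of the XML Schema whiteSpace="collapse" rule.
public static class XsdWhitespace
{
    public static string Collapse(string lexical)
    {
        // Replace each run of tab, line feed, carriage return or space
        // with a single space...
        string replaced = Regex.Replace(lexical, "[\t\n\r ]+", " ");
        // ...then drop the space left at either end, if any.
        return replaced.Trim(' ');
    }
}
```

With this sketch, XsdWhitespace.Collapse(" 423423") yields "423423", which is why the padded attribute value from the question is still a valid xs:decimal.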

xan answered Sep 28 '22 05:09