XDocument.Validate not catching all errors against XSD

Tags: c#, xml, xsd

I have a really strange problem validating an XML document against a valid XSD in C#, using either XDocument.Validate or an XmlReader created with XmlReaderSettings configured for schema validation. When the XML document contains errors, the validation process fails to report all of them under certain conditions, and I can't find a pattern for this anomaly.

Here is my XSD:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
           targetNamespace="http://www.somesite.com/somefolder/messages"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Header">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="MessageId" type="xs:string" />
              <xs:element name="MessageSource" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Body">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Abc001">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Abc002" type="xs:string" />
                    <xs:element name="Abc003" type="xs:string" minOccurs="0" />
                    <!--<xs:element name="Abc004" type="xs:string" />-->
                    <xs:element name="Abc004">
                      <xs:simpleType>
                        <xs:restriction base="xs:string">
                          <xs:maxLength value="200"/>
                        </xs:restriction>
                      </xs:simpleType>
                    </xs:element>
                    <xs:element name="Abc005">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="Abc006" type="xs:unsignedShort" />
                          <xs:element name="Abc007">
                            <xs:complexType>
                              <xs:sequence>
                                <xs:element name="Abc008" type="xs:string"/>
                                <xs:element name="Abc009" type="xs:string" minOccurs="0"/>
                                <xs:element name="Abc010" type="xs:string"/>
                              </xs:sequence>
                            </xs:complexType>
                          </xs:element>
                          <xs:element name="Abc011" type="xs:date" />
                          <xs:element name="Abc012">
                            <xs:complexType>
                              <xs:sequence>
                                <xs:element name="Abc013" type="xs:string" />
                                <xs:element name="Abc014" type="xs:string" />
                              </xs:sequence>
                            </xs:complexType>
                          </xs:element>
                        </xs:sequence>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

And here is the XML document being validated against this XSD:

<?xml version="1.0" encoding="utf-8"?>
<Message xmlns="http://www.somesite.com/somefolder/messages">
	<Header>
		<MessageId>Lorem</MessageId>
		<MessageSource>Ipsum</MessageSource>
	</Header>
	<Body>
		<Abc001>
			<Abc002>dolor</Abc002>
			<Abc003>sit amet</Abc003>
			<Abc004>consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</Abc004>
			<Abc005>
				<Abc006>1234</Abc006>
				<Abc007>
					<Abc008>Ut enim</Abc008>
					<Abc009>ad</Abc009>
					<Abc010>minim</Abc010>
				</Abc007>
				<Abc011>1982-10-17</Abc011>
				<Abc012>
					<Abc013>veniam</Abc013>
					<Abc014>nostrud</Abc014>
				</Abc012>
			</Abc005>
		</Abc001>
	</Body>
</Message>

Now, when I introduce some validation errors into the XML and validate it against the XSD, the validator finds all the errors, as expected. Here is the invalid XML (I have marked where the errors are introduced):

<?xml version="1.0" encoding="utf-8"?>
<Message xmlns="http://www.somesite.com/somefolder/messages">
	<Header>
		<MessageId>Lorem</MessageId>
		<MessageSource>Ipsum</MessageSource>
	</Header>
	<Body>
		<Abc001>
			<Abc002>dolor</Abc002>
			<Abc003>sit amet</Abc003>
			
			<!--The value for Abc004 is increased beyond the allowed 200 characters-->
			
			<Abc004>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</Abc004>
			<Abc005>
				<Abc006>1234</Abc006>
				<Abc007>
					<Abc008>Ut enim</Abc008>
					<ABC009>AD</ABC009>
					
					<!--<Abc010>minim</Abc010>  Required element removed-->
				</Abc007>
				
				<!--Date format below is wrong-->
				<Abc011>1982-10-37</Abc011>
				
				<Abc012>
					<Abc013>veniam</Abc013>
					<Abc014>nostrud</Abc014>
				</Abc012>
			</Abc005>

			<!--the element below is not allowed-->
			<Abc15>Not allowed</Abc15>
		</Abc001>
	</Body>
</Message>

And here is my resulting XML, which shows all the errors:

<MessageResponse xmlns="http://www.somesite.com/somefolder/messages">
    <Result>false</Result>
    <Status>Failed</Status>
    <FaultCount>4</FaultCount>
    <Faults>
        <Fault>
            <FaultCode>ERR01</FaultCode>
            <FaultMessage>The 'http://www.somesite.com/somefolder/messages:Abc004' element is invalid - The value 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.' is invalid according to its datatype 'String' - The actual length is greater than the MaxLength value.</FaultMessage>
        </Fault>
        <Fault>
            <FaultCode>ERR02</FaultCode>
            <FaultMessage>The element 'Abc007' in namespace 'http://www.somesite.com/somefolder/messages' has invalid child element 'ABC009' in namespace 'http://www.somesite.com/somefolder/messages'. List of possible elements expected: 'Abc009, Abc010' in namespace 'http://www.somesite.com/somefolder/messages'.</FaultMessage>
        </Fault>
        <Fault>
            <FaultCode>ERR03</FaultCode>
            <FaultMessage>The 'http://www.somesite.com/somefolder/messages:Abc011' element is invalid - The value '1982-10-37' is invalid according to its datatype 'http://www.w3.org/2001/XMLSchema:date' - The string '1982-10-37' is not a valid Date value.</FaultMessage>
        </Fault>
        <Fault>
            <FaultCode>ERR04</FaultCode>
            <FaultMessage>The element 'Abc001' in namespace 'http://www.somesite.com/somefolder/messages' has invalid child element 'Abc15' in namespace 'http://www.somesite.com/somefolder/messages'.</FaultMessage>
        </Fault>
    </Faults>
</MessageResponse>

Here is the weird part. When I introduce one more error near the beginning of the Abc001 element, while keeping all the other existing errors, the result changes completely: only a single error is reported. Here is the XML with the newly introduced error:

<?xml version="1.0" encoding="utf-8"?>
<Message xmlns="http://www.somesite.com/somefolder/messages">
	<Header>
		<MessageId>Lorem</MessageId>
		<MessageSource>Ipsum</MessageSource>
	</Header>
	<Body>
		<Abc001>
			<!--newly introduced error - removed the following element-->
			<!--<Abc002>dolor</Abc002>-->
			<Abc003>sit amet</Abc003>
			<!--The value for Abc004 is increased beyond the allowed 200 characters-->
			<Abc004>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</Abc004>
			<Abc005>
				<Abc006>1234</Abc006>
				<Abc007>
					<Abc008>Ut enim</Abc008>
					<ABC009>AD</ABC009>
					<!--<Abc010>minim</Abc010>-->
				</Abc007>
				<Abc011>1982-10-37</Abc011>
				<Abc012>
					<Abc013>veniam</Abc013>
					<Abc014>nostrud</Abc014>
				</Abc012>
			</Abc005>
			<!--the element below is not allowed-->
			<Abc15>Not allowed</Abc15>
		</Abc001>
	</Body>
</Message>

And finally, here is the validation result:

<MessageResponse xmlns="http://www.somesite.com/somefolder/messages">
    <Result>false</Result>
    <Status>Failed</Status>
    <FaultCount>1</FaultCount>
    <Faults>
        <Fault>
            <FaultCode>ERR01</FaultCode>
            <FaultMessage>The element 'Abc001' in namespace 'http://www.somesite.com/somefolder/messages' has invalid child element 'Abc003' in namespace 'http://www.somesite.com/somefolder/messages'. List of possible elements expected: 'Abc002' in namespace 'http://www.somesite.com/somefolder/messages'.</FaultMessage>
        </Fault>
    </Faults>
</MessageResponse>

Here is the C# code I am using to validate:

public async Task<IMIDPreValidationAckMessage> ValidateXmlMessage( XDocument doc )
{
    var result = new PreValidationAckMessage();
    result.Result = true;
    result.Status = "Succeeded";

    // Physical path of the schema file inside the web application
    var xsd = HttpContext.Current.Server.MapPath( "~/message01.xsd" );

    try
    {
        var uri = new System.Uri( xsd );
        var localPath = uri.LocalPath;

        // Register the schema under the namespace of the incoming document's root element
        var docNameSpace = doc.Root.Name.Namespace.NamespaceName;
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add( docNameSpace, localPath );

        XmlReaderSettings xrs = new XmlReaderSettings();
        xrs.ValidationType = ValidationType.Schema;
        xrs.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        xrs.Schemas = schemas;

        result.XSDNamespace = doc.Root.GetDefaultNamespace().NamespaceName;
        var errCode = 1;

        // Because a handler is attached, validation errors are reported here
        // instead of being thrown as XmlSchemaValidationException.
        xrs.ValidationEventHandler += ( s, e ) =>
        {
            result.Result = false;
            result.Status = "Failed";
            result.FaultCount++;
            result.Faults.Add( new Fault
            {
                FaultCode = "ERR" + ( errCode++ ).ToString( "D2" ),   // ERR01, ERR02, ...
                FaultMessage = e.Message
            } );
        };

        // Reading through the whole document is what drives the validation
        using ( XmlReader xr = XmlReader.Create( doc.CreateReader(), xrs ) )
        {
            while ( xr.Read() ) { }
        }
    }
    catch ( System.Exception ex )
    {
        result.Result = false;
        result.Status = "Unknown Error";
    }
    return result;
}

Can someone please tell me what is wrong here?

Babu Mannavalappil asked Jun 08 '26 17:06


1 Answer

It seems that XmlReader stops validating an element's content at the first error it encounters in that element. The documentation of the old (now obsolete) XmlValidatingReader.ValidationEventHandler describes this behavior:

If an element reports a validation error, the rest of the content model for that element is not validated, however, its children are validated. The reader only reports the first error for a given element.

It appears to work the same way with the regular validating XmlReader, although its documentation does not mention this explicitly.

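In your first example, the errors are either in leaf values (such as an invalid text value of an element) or in the last child of an element, so almost nothing is skipped. (Even there, the missing Abc010 is never reported: the invalid ABC009 element already ended validation of Abc007's content model, which is why you got four faults for five introduced errors.) In the last example, however, you introduce an error at the very beginning of the Abc001 element, so the rest of Abc001's content, including Abc004, Abc005 and everything inside them, is skipped together with all the errors it contains.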
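To see the difference in isolation, here is a minimal sketch. It assumes a hypothetical, namespace-free schema modelled on Abc001 (a sequence of A, B, C restricted to maxLength 3, and D of type xs:date); the names R, A, B, C, D, ValidationDemo and CollectErrors are illustrative only. It collects every message raised through the XDocument.Validate handler, following the same XmlSchemaSet.Add("", XmlReader) pattern used in the XDocument.Validate documentation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class ValidationDemo
{
    // Hypothetical miniature schema: R contains the sequence A, B, C (max 3 chars), D (xs:date).
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='R'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='A' type='xs:string'/>
        <xs:element name='B' type='xs:string'/>
        <xs:element name='C'>
          <xs:simpleType>
            <xs:restriction base='xs:string'>
              <xs:maxLength value='3'/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name='D' type='xs:date'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Validates the XML text and returns every message passed to the handler.
    public static List<string> CollectErrors(string xml)
    {
        var schemas = new XmlSchemaSet();
        using (var reader = XmlReader.Create(new StringReader(Xsd)))
        {
            schemas.Add("", reader);
        }

        var errors = new List<string>();
        XDocument.Parse(xml).Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }

    public static void Main()
    {
        // Only leaf values are wrong; the sequence A, B, C, D itself is intact.
        var leafErrors = CollectErrors("<R><A>a</A><B>b</B><C>toolong</C><D>1982-10-37</D></R>");

        // The first required child is missing, so R's content model breaks at <B>.
        var modelErrors = CollectErrors("<R><B>b</B><C>toolong</C><D>1982-10-37</D></R>");

        Console.WriteLine($"Leaf-value errors: {leafErrors.Count}");
        foreach (var message in leafErrors) Console.WriteLine("  " + message);

        Console.WriteLine($"Content-model case: {modelErrors.Count}");
        foreach (var message in modelErrors) Console.WriteLine("  " + message);
    }
}
```

In the first call both the maxLength and the date errors should be reported, because neither breaks R's content model. In the second call, the behavior described above suggests that only the error for the unexpected B element will appear, while the invalid C and D values go unreported; the exact wording of the messages may vary between .NET versions.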

Evk answered Jun 11 '26 05:06