How to parse an XSD to get the information from <xsd:simpleType> elements using C#?

Tags: c#, c#-4.0, xsd

I have an XSD with multiple complex types and simple types (part of the file is shown below). I need to parse this document to get the maxLength from each of the simple types that are referenced in the complex types. Can anyone offer some advice on how to implement this? It needs to be generic, so that if I query on "Setup_Type" it gives the output below. Thank you!

NewSetup/Amount = 12 (The name attributes from element tags separated by "/" and maxLength from the nested simpleType)

NewSetup/Name = 50

<xsd:complexType name="Setup_Type">
  <xsd:sequence>
    <xsd:element name="NewSetup" type="NewSetup_Type" minOccurs="1" maxOccurs="1" />
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="NewSetup_Type">
  <xsd:sequence>
    <xsd:element name="Amount" type="Amount_Type"  minOccurs="1" maxOccurs="1" />
    <xsd:element name="Name" type="Name_Type"  minOccurs="1" maxOccurs="1" />
  </xsd:sequence>
</xsd:complexType>

<xsd:simpleType name="Amount_Type">
  <xsd:annotation>
    <xsd:documentation>Amount</xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:string">
    <xsd:maxLength value="12" />
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="Name_Type">
  <xsd:annotation>
    <xsd:documentation>Name</xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:string">
    <xsd:maxLength value="50" />
  </xsd:restriction>
</xsd:simpleType>
Jyina asked Jul 19 '12 21:07


1 Answer

I have seen similar questions asked in the past (full disclosure, I've asked a similar question myself). Parsing an XSD is not for the faint of heart.

You basically have 2 options. The first is easier to implement, but can be broken more easily by minor changes to the XSD. The second is more robust but harder to implement.

Option 1:

Parse the XSD with LINQ (or another C# XML parser if you prefer). Since an XSD is just an XML document, you can load it into an XDocument and read it with LINQ to XML.

Taking just a sample of your own XSD:

<xsd:simpleType name="Amount_Type">
  <xsd:annotation>
    <xsd:documentation>Amount</xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:string">
    <xsd:maxLength value="12" />
  </xsd:restriction>
</xsd:simpleType>

You can read the maxLength value like this:

var xDoc = XDocument.Load("your XSD path");
var ns = XNamespace.Get(@"http://www.w3.org/2001/XMLSchema");

var length = (from sType in xDoc.Element(ns + "schema").Elements(ns + "simpleType")
              where sType.Attribute("name").Value == "Amount_Type"
              from r in sType.Elements(ns + "restriction")
              select r.Element(ns + "maxLength").Attribute("value")
                      .Value).FirstOrDefault();

This does not give you an easy way to look things up by type name, especially for extended types. To use it, you need to know the exact path to each element you are looking for.
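If you do go the LINQ route, the same query can be generalized a little so that it collects every named top-level simpleType instead of one hard-coded name. This is only a sketch under assumptions: the class name XsdLinqSketch, the method GetMaxLengths, and the inline schema are made up for illustration, and it only looks at simple types declared directly under the schema root.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XsdLinqSketch
{
    private static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Maps each named top-level simpleType to its maxLength value.
    // Types without a restriction/maxLength child are skipped.
    public static Dictionary<string, int> GetMaxLengths(XDocument xsd)
    {
        return (from sType in xsd.Root.Elements(Xs + "simpleType")
                let name = (string)sType.Attribute("name")
                let facet = sType.Elements(Xs + "restriction")
                                 .Elements(Xs + "maxLength")
                                 .FirstOrDefault()
                where name != null && facet != null
                select new { name, value = (int)facet.Attribute("value") })
               .ToDictionary(x => x.name, x => x.value);
    }

    public static void Main()
    {
        // Hypothetical inline schema based on the fragment in the question.
        var xsd = XDocument.Parse(@"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:simpleType name='Amount_Type'>
    <xsd:restriction base='xsd:string'><xsd:maxLength value='12' /></xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name='Name_Type'>
    <xsd:restriction base='xsd:string'><xsd:maxLength value='50' /></xsd:restriction>
  </xsd:simpleType>
</xsd:schema>");

        foreach (var kvp in GetMaxLengths(xsd))
            Console.WriteLine("{0} = {1}", kvp.Key, kvp.Value);
    }
}
```

This still knows nothing about which complex type uses which simple type, so it does not produce the NewSetup/Amount style of output on its own.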

Option 2:

This is far too complex for a quick answer (note: see the edit below; I had some time and put together a working solution), so I am going to encourage you to look at my own question I linked above. In it, I linked a great blog that shows how to break the XSD down into pieces, which might let you perform the type of search you want. You have to decide whether it is worth the effort to develop. The blog shows an implementation with an XmlReader reading an XML document that is validated against the XSD in question, but you can accomplish the same thing by loading the XSD directly and parsing it.

The 2 key ideas to find in the blog are:

// in the getRestriction method (reader in this context is an `XmlReader` that
//  contains an XML document that is being validated against the specific XSD)
if (reader.SchemaInfo.SchemaElement == null) return null;
simpleType = reader.SchemaInfo.SchemaElement.ElementSchemaType as XmlSchemaSimpleType;
if (simpleType == null) return null;
restriction = simpleType.Content as XmlSchemaSimpleTypeRestriction;

// then in the getMaxLength method
if (restriction == null) return null;
List<int> result = new List<int>();
foreach (XmlSchemaObject facet in restriction.Facets) {
    if (facet is XmlSchemaMaxLengthFacet) result.Add(int.Parse(((XmlSchemaFacet) facet).Value));
}
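To show how those two fragments could fit together, here is a rough, self-contained sketch of the reader-based idea. It is not the blog's code: the class name ReaderFacetSketch, the method ReadMaxLengths, and the tiny schema and instance document are assumptions made up for illustration. It assumes the reader is created with ValidationType.Schema, so that SchemaInfo is populated while the reader sits on an element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ReaderFacetSketch
{
    // Returns the maxLength facet of the element the reader is positioned on,
    // or null if the element has no simple type restriction with maxLength.
    private static int? GetMaxLength(XmlReader reader)
    {
        var info = reader.SchemaInfo;
        if (info == null || info.SchemaElement == null) return null;
        var simpleType = info.SchemaElement.ElementSchemaType as XmlSchemaSimpleType;
        if (simpleType == null) return null;
        var restriction = simpleType.Content as XmlSchemaSimpleTypeRestriction;
        if (restriction == null) return null;
        foreach (XmlSchemaObject facet in restriction.Facets)
        {
            var max = facet as XmlSchemaMaxLengthFacet;
            if (max != null) return int.Parse(max.Value);
        }
        return null;
    }

    // Validates xml against xsd and records the maxLength seen for each element name.
    public static Dictionary<string, int?> ReadMaxLengths(string xsd, string xml)
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));

        var result = new Dictionary<string, int?>();
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    result[reader.LocalName] = GetMaxLength(reader);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical schema with one global element, plus a matching instance document.
        const string xsd = @"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name='Name' type='Name_Type' />
  <xsd:simpleType name='Name_Type'>
    <xsd:restriction base='xsd:string'><xsd:maxLength value='50' /></xsd:restriction>
  </xsd:simpleType>
</xsd:schema>";

        foreach (var kvp in ReadMaxLengths(xsd, "<Name>Widget</Name>"))
            Console.WriteLine("{0}: {1}", kvp.Key, kvp.Value);
    }
}
```

The drawback of this route is that you need an instance document that actually contains every element you care about, which is why loading the XSD directly is usually the better fit for the question as asked.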

I actually tried the same thing last year to parse an XSD as part of a complicated data validation method. It took me the better part of a week to really understand what was happening and to adapt the methods in the blog to suit my purposes. It is definitely the best way to implement exactly what you want.

If you want to try this with a standalone schema, you can load the XSD into an XmlSchemaSet object, then use the GlobalTypes property to help you find the specific type you are looking for.


EDIT: I pulled up my old code and started putting together the code to help you.

First to load your schema:

// requires: using System.IO; using System.Xml; using System.Xml.Schema;
XmlSchemaSet set; // this needs to be accessible to the methods below,
                  //  so should be a class level field or property

using (var fs = new FileStream(@"your path here", FileMode.Open))
{
    var schema = XmlSchema.Read(fs, null);

    set = new XmlSchemaSet();
    set.Add(schema);
    set.Compile();
}

The following methods should get you close to what you want based on the XSD you provided. They should be fairly easy to adapt to more complex structures.

public Dictionary<string, int> GetElementMaxLength(String xsdElementName)
{
    if (xsdElementName == null) throw new ArgumentNullException("xsdElementName");
    // if your XSD has a target namespace, you need to replace null with the namespace name
    var qname = new XmlQualifiedName(xsdElementName, null);

    // find the type you want in the XmlSchemaSet    
    var parentType = set.GlobalTypes[qname];

    // call GetAllMaxLength with the parentType as parameter
    var results = GetAllMaxLength(parentType);

    return results;
}

private Dictionary<string, int> GetAllMaxLength(XmlSchemaObject obj)
{
    Dictionary<string, int> dict = new Dictionary<string, int>();

    // do some type checking on the XmlSchemaObject
    if (obj is XmlSchemaSimpleType)
    {
        // if it is a simple type, then call GetMaxLength to get the MaxLength restriction
        var st = obj as XmlSchemaSimpleType;
        dict[st.QualifiedName.Name] = GetMaxLength(st);
    }
    else if (obj is XmlSchemaComplexType)
    {

        // if obj is a complexType, cast the particle type to a sequence
        //  and iterate the sequence
        //  warning - this will fail if it is not a sequence, so you might need
        //  to make some adjustments if you have something other than a xs:sequence
        var ct = obj as XmlSchemaComplexType;
        var seq = ct.ContentTypeParticle as XmlSchemaSequence;

        foreach (var item in seq.Items)
        {
            // item will be an XmlSchemaObject, so just call this same method
            //  with item as the parameter to parse it out
            var rng = GetAllMaxLength(item);

            // add the results to the dictionary
            foreach (var kvp in rng)
            {
                dict[kvp.Key] = kvp.Value;
            }
        }
    }
    else if (obj is XmlSchemaElement)
    {
        // if obj is an XmlSchemaElement, then you need to find the type
        //  based on the SchemaTypeName property.  This is why your
        //  XmlSchemaSet needs to have class-level scope
        var ele = obj as XmlSchemaElement;
        var type = set.GlobalTypes[ele.SchemaTypeName];

        // once you have the type, call this method again and get the dictionary result
        var rng = GetAllMaxLength(type);

        // put the results in this dictionary.  The difference here is the dictionary
        //  key is put in the format you specified
        foreach (var kvp in rng)
        {
            dict[String.Format("{0}/{1}", ele.QualifiedName.Name, kvp.Key)] = kvp.Value;
        }
    }

    return dict;
}

private Int32 GetMaxLength(XmlSchemaSimpleType xsdSimpleType)
{
    // get the content of the simple type
    var restriction = xsdSimpleType.Content as XmlSchemaSimpleTypeRestriction;

    // if it is null, then there are no restrictions and return -1 as a marker value
    if (restriction == null) return -1;

    Int32 result = -1;

    // iterate the facets in the restrictions, look for a MaxLengthFacet and parse the value
    foreach (XmlSchemaObject facet in restriction.Facets)
    {
        if (facet is XmlSchemaMaxLengthFacet)
        {
            result = int.Parse(((XmlSchemaFacet)facet).Value);
            break;
        }
    }

    return result;
}

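The usage is then pretty simple. Call the GetElementMaxLength(String) method and it will return a dictionary whose keys are the element names joined with "/" and whose values are the max lengths. Note that with the methods as written, each key also ends with the simple type name (for example NewSetup/Amount/Amount_Type), so trim the last segment if you want exactly the format from your question: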

var results = GetElementMaxLength("Setup_Type");

foreach (var item in results)
{
    Console.WriteLine("{0} | {1}", item.Key, item.Value);                
}
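As the comment in GetAllMaxLength warns, the cast to XmlSchemaSequence fails for xs:choice or xs:all content. One possible adjustment, sketched below, is to cast to XmlSchemaGroupBase (the common base class of sequence, choice and all) and to follow each element's compiled ElementSchemaType instead of looking the type up by name. Treat this as an untested variation rather than a drop-in replacement: the class GroupWalkSketch and the methods Collect and Walk are names I made up, it assumes a schema without a target namespace, and it will recurse forever on self-referencing types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class GroupWalkSketch
{
    // Collects "Parent/Child" element paths mapped to maxLength,
    // starting at the named global complex type of a compiled schema set.
    public static Dictionary<string, int> Collect(XmlSchemaSet set, string typeName)
    {
        var result = new Dictionary<string, int>();
        var type = set.GlobalTypes[new XmlQualifiedName(typeName)] as XmlSchemaComplexType;
        if (type != null) Walk(type.ContentTypeParticle, "", result);
        return result;
    }

    private static void Walk(XmlSchemaParticle particle, string path, Dictionary<string, int> result)
    {
        var group = particle as XmlSchemaGroupBase;   // sequence, choice or all
        if (group != null)
        {
            foreach (XmlSchemaObject item in group.Items)
                Walk(item as XmlSchemaParticle, path, result);
            return;
        }

        var element = particle as XmlSchemaElement;
        if (element == null) return;                  // e.g. xs:any or empty content

        string name = element.QualifiedName.Name;
        string elementPath = path.Length == 0 ? name : path + "/" + name;

        var complex = element.ElementSchemaType as XmlSchemaComplexType;
        if (complex != null)
        {
            Walk(complex.ContentTypeParticle, elementPath, result);
            return;
        }

        var simple = element.ElementSchemaType as XmlSchemaSimpleType;
        if (simple == null) return;
        var restriction = simple.Content as XmlSchemaSimpleTypeRestriction;
        if (restriction == null) return;              // built-in types have no facets here
        foreach (XmlSchemaObject facet in restriction.Facets)
        {
            var max = facet as XmlSchemaMaxLengthFacet;
            if (max != null) { result[elementPath] = int.Parse(max.Value); break; }
        }
    }

    // Helper for the sketch: compiles a schema set from an XSD string.
    public static XmlSchemaSet Load(string xsd)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(xsd)));
        set.Compile();
        return set;
    }
}
```

Because the path is built from element names only, the keys here come out in the NewSetup/Amount form from the question, without the trailing type name.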
psubsee2003 answered Sep 28 '22 03:09