Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Case insensitive XML parser in c#

Tags:

c#

xml

linq

xslt

xpath

Everything you do with XML is case sensitive, I know that.

However, right now I find myself in a situation, where the software I'm writing would yield much fewer errors if I somehow made xml name/attribute recognition case insensitive. Case insensitive XPath would be a god sent.

Is there an easy way/library to do that in c#?

like image 799
Arsen Zahray Avatar asked Feb 17 '12 20:02

Arsen Zahray


2 Answers

An XMl document can have two different elements named respectively: MyName and myName -- that are intended to be different. Converting/treating them as the same name is an error that can have gross consequences.

In case the above is not the case, then here is a more precise solution, using XSLT to process the document into one that only has lowercase element names and lowercase attribute names:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="vUpper" select=
 "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

 <xsl:variable name="vLower" select=
 "'abcdefghijklmnopqrstuvwxyz'"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="*[name()=local-name()]" priority="2">
  <xsl:element name="{translate(name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="*" priority="1">
  <xsl:element name=
   "{substring-before(name(), ':')}:{translate(local-name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="@*[name()=local-name()]" priority="2">
  <xsl:attribute name="{translate(name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>

 <xsl:template match="@*" priority="1">
  <xsl:attribute name=
   "{substring-before(name(), ':')}:{translate(local-name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
     <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>
</xsl:stylesheet>

when this transformation is applied on any XML document, for example this one:

<authors xmlns:user="myNamespace">
  <?ttt This is a PI ?>
  <Author xmlns:user2="myNamespace2">
    <Name idd="VH">Victor Hugo</Name>
    <user2:Name idd="VH">Victor Hugo</user2:Name>
    <Nationality xmlns:user3="myNamespace3">French</Nationality>
  </Author>
  <!-- This is a very long comment the purpose is
       to test the default stylesheet for long comments-->
  <Author Period="classical">
    <Name>Sophocles</Name>
    <Nationality>Greek</Nationality>
  </Author>
  <author>
    <Name>Leo Tolstoy</Name>
    <Nationality>Russian</Nationality>
  </author>
  <Author>
    <Name>Alexander Pushkin</Name>
    <Nationality>Russian</Nationality>
  </Author>
  <Author Period="classical">
    <Name>Plato</Name>
    <Nationality>Greek</Nationality>
  </Author>
</authors>

the wanted, correct result (element and attribute names converted to lowercase) is produced:

<authors><?ttt This is a PI ?>
   <author>
      <name idd="VH">Victor Hugo</name>
      <user2:name xmlns:user2="myNamespace2" idd="VH">Victor Hugo</user2:name>
      <nationality>French</nationality>
   </author><!-- This is a very long comment the purpose is
       to test the default stylesheet for long comments-->
   <author period="classical">
      <name>Sophocles</name>
      <nationality>Greek</nationality>
   </author>
   <author>
      <name>Leo Tolstoy</name>
      <nationality>Russian</nationality>
   </author>
   <author>
      <name>Alexander Pushkin</name>
      <nationality>Russian</nationality>
   </author>
   <author period="classical">
      <name>Plato</name>
      <nationality>Greek</nationality>
   </author>
</authors>

Once the document is converted to your desired form, then you can perform any desired processing on the converted document.

like image 60
Dimitre Novatchev Avatar answered Sep 20 '22 16:09

Dimitre Novatchev


You can create case-insensitive methods (extensions for usability), e.g.:

public static class XDocumentExtensions
{
    public static IEnumerable<XElement> ElementsCaseInsensitive(this XContainer source,  
        XName name)
    {
        return source.Elements()
            .Where(e => e.Name.Namespace == name.Namespace 
                && e.Name.LocalName.Equals(name.LocalName, StringComparison.OrdinalIgnoreCase));
    }
}
like image 21
Kirill Polishchuk Avatar answered Sep 18 '22 16:09

Kirill Polishchuk