
What is the best approach to generalize and aggregate XML dumps in C#?

Here is the business part of the issue:

  • Several different companies each send an XML dump of the information to be processed.
  • The information sent by the companies is similar, but not exactly the same.
  • Several more companies will soon be enlisted and will start sending information.

Now, the technical part of the problem: I want to write a generic solution in C# to accommodate this information for processing. I would be transforming the XML in my C# class(es) to fit into my database model.

Is there a pattern or solution that handles this generically, so that I do not need to change my solution when many more companies are added later?

What would be the best approach to write my parser/transformer?

GilliVilla asked Dec 17 '10 02:12


4 Answers

This is how I have done something similar in the past.

As long as each company has its own fixed format which they use for their XML dump,

  1. Have a specific XSLT for each company.
  2. Have a way of indicating which dump came from which company (for example, a separate DUMP folder for each company).
  3. In your program, use the source identified in step 2 to select the XSLT from step 1 and apply it to the DUMP.
  4. Every XSLT transforms its company's XML into your one standard database schema.
  5. Save the result to your DB.

Each new company is then at most one new XSLT. Where the schemas are very similar, an existing XSLT can simply be re-used, with the company-specific changes made to it.
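As a rough illustration of steps 1 to 4, here is a minimal sketch that keeps one compiled stylesheet per company and applies the matching one to an incoming dump. It uses the standard `XslCompiledTransform` class; the `CompanyXsltRunner` class, its method names and the idea of a string "company key" are assumptions made up for this example, not something from the original setup.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: maps a company key to a compiled stylesheet and
// applies it to a dump, producing XML in the one standard schema.
public sealed class CompanyXsltRunner
{
    private readonly Dictionary<string, XslCompiledTransform> _stylesheets =
        new Dictionary<string, XslCompiledTransform>(StringComparer.OrdinalIgnoreCase);

    // Step 1: register one stylesheet per company (passed in as text here).
    public void Register(string companyKey, string xsltText)
    {
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(reader);
        }
        _stylesheets[companyKey] = xslt;
    }

    // Steps 2 to 4: pick the stylesheet by company key and transform the dump.
    public string Transform(string companyKey, string dumpXml)
    {
        if (!_stylesheets.TryGetValue(companyKey, out var xslt))
            throw new InvalidOperationException("No stylesheet registered for " + companyKey);

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(dumpXml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

With something like this, the company key could be derived from the folder the dump arrived in, and adding a company means registering one more stylesheet rather than touching the calling code.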

Drawback of this approach: debugging XSLTs can be a bit more painful if you do not have the right tools. However, a lot of XML editors (e.g. XML Spy) have excellent XSLT debugging capabilities.

Jagmag answered Nov 10 '22 04:11


Sounds to me like you are just asking for a design pattern (or set of patterns) that you could use to do this in a generic, future-proof manner, right?

Ideally, these are some of the properties that you probably want:

  • Each "transformer" is decoupled from one another.
  • You can easily add new "transformers" without having to rewrite your main "driver" routine.
  • You don't need to recompile / redeploy your entire solution every time you modify a transformer, or at least add a new one.

Each "transformer" should ideally implement a common interface that your driver routine knows about - call it IXmlTransformer. The responsibility of this interface is to take in an XML file and to return whatever object model / dataset you use to save to the database. Each of your transformers would implement this interface. For common logic that is shared by all transformers, you could either create a base class that they all inherit from, or (my preferred choice) have a set of helper methods which you can call from any of them.

I would start by using a Factory to create each "transformer" from your main driver routine. The factory could use reflection to interrogate all the assemblies it can see, or use something like MEF, which could do a lot of the work for you. Your driver logic should use the factory to create all the transformers and store them.
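To make the reflection idea a bit more concrete, here is a minimal sketch of such a factory. Only the name IXmlTransformer comes from the text above; the `Transform` signature, the `TransformerFactory` class and the sample `AcmeTransformer` are assumptions for illustration, and the `IReadOnlyList<string>` return type is just a placeholder for whatever model you save to the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical contract: the member shown here is an assumption.
public interface IXmlTransformer
{
    // Placeholder return type; substitute your own object model / dataset.
    IReadOnlyList<string> Transform(XDocument dump);
}

// Hypothetical example transformer for one company's format.
public sealed class AcmeTransformer : IXmlTransformer
{
    public IReadOnlyList<string> Transform(XDocument dump)
    {
        return dump.Root.Elements("cust").Select(e => e.Value).ToList();
    }
}

public static class TransformerFactory
{
    // Creates one instance of every concrete IXmlTransformer that has a
    // public parameterless constructor in the given assemblies.
    public static IReadOnlyList<IXmlTransformer> CreateAll(params Assembly[] assemblies)
    {
        return assemblies
            .SelectMany(a => a.GetTypes())
            .Where(t => typeof(IXmlTransformer).IsAssignableFrom(t)
                        && t.IsClass
                        && !t.IsAbstract
                        && t.GetConstructor(Type.EmptyTypes) != null)
            .Select(t => (IXmlTransformer)Activator.CreateInstance(t))
            .ToList();
    }
}
```

The driver would call something like `TransformerFactory.CreateAll(typeof(AcmeTransformer).Assembly)` once at start-up and keep the list. If the transformers live in separate plug-in assemblies, those assemblies would have to be loaded first, which is the part MEF can take off your hands.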

Then you need some logic and a mechanism to match each XML file received to a given transformer - perhaps each XML file has a header or something similar that you could use to identify it. Again, you want to keep this decoupled from your main logic so that you can easily add new transformers without modifying the driver routine. You could e.g. supply the XML file to each transformer and ask it "can you transform this file?", so that it is up to each transformer to "take responsibility" for a given file.

Every time your driver routine gets a new XML file, it looks up the appropriate transformer, and runs it through; the result gets sent to the DB processing area. If no transformer can be found, you dump the file in a directory for interrogation later.
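A minimal sketch of that driver loop might look like the following. The `CanTransform` member, the `DumpDriver` class, the "unprocessed" directory parameter and the save callback are all hypothetical names chosen for the example; identifying a file by its root element name is likewise only an assumption about what the dumps look like.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical contract: each transformer decides whether it owns a file.
public interface IXmlTransformer
{
    bool CanTransform(XDocument dump);
    IReadOnlyList<string> Transform(XDocument dump); // placeholder model
}

// Hypothetical transformer that claims dumps whose root element is <acme>.
public sealed class AcmeTransformer : IXmlTransformer
{
    public bool CanTransform(XDocument dump)
    {
        return dump.Root != null && dump.Root.Name.LocalName == "acme";
    }

    public IReadOnlyList<string> Transform(XDocument dump)
    {
        return dump.Root.Elements("cust").Select(e => e.Value).ToList();
    }
}

public sealed class DumpDriver
{
    private readonly IReadOnlyList<IXmlTransformer> _transformers;
    private readonly string _unprocessedDirectory;
    private readonly Action<IReadOnlyList<string>> _saveToDatabase;

    public DumpDriver(IReadOnlyList<IXmlTransformer> transformers,
                      string unprocessedDirectory,
                      Action<IReadOnlyList<string>> saveToDatabase)
    {
        _transformers = transformers;
        _unprocessedDirectory = unprocessedDirectory;
        _saveToDatabase = saveToDatabase;
    }

    // Returns true if some transformer handled the file; otherwise copies
    // the file to the unprocessed directory and returns false.
    public bool Process(string dumpPath)
    {
        var dump = XDocument.Load(dumpPath);
        var transformer = _transformers.FirstOrDefault(t => t.CanTransform(dump));
        if (transformer == null)
        {
            Directory.CreateDirectory(_unprocessedDirectory);
            var target = Path.Combine(_unprocessedDirectory, Path.GetFileName(dumpPath));
            File.Copy(dumpPath, target, true);
            return false;
        }

        _saveToDatabase(transformer.Transform(dump));
        return true;
    }
}
```

Note that the driver never mentions a concrete company: supporting a new one only means adding another IXmlTransformer to the list it is given.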

I would recommend reading a book like Agile Principles, Patterns, and Practices in C# by Robert C. Martin (http://www.amazon.co.uk/Agile-Principles-Patterns-Practices-C/dp/0131857258), which gives good examples of appropriate design patterns and principles for situations like yours, e.g. Factory and the Dependency Inversion Principle (DIP).

Hope that helps!

Isaac Abraham answered Nov 10 '22 03:11


The solution proposed by InSane is likely the most straightforward and is definitely an XML-friendly approach.

If you want to write your own code to convert the different data formats, then implement multiple reader entities, each of which reads data from one distinct format and transforms it into a unified format. Your main code then works with these entities in a uniform way, i.e. by saving their output to the database.

Search for ETL (Extract-Transform-Load) to get more information, for example the question "What model/pattern should I use for handling multiple data sources?" and http://en.wikipedia.org/wiki/Extract,_transform,_load

Alexei Levenkov answered Nov 10 '22 03:11


Using XSLT, as proposed in the currently most upvoted answer, just moves the problem from C# to XSLT.

You are still changing the pieces that process the XML, and you are still exposed to how well or poorly that code is structured, whether it is C# code or rules in the XSLT.

Whether you keep it in C# or go with XSLT for those bits, the key is to separate out the transformation of the XML you receive from the various companies into a single common format, whether that is an intermediate XML or a set of classes into which you load the data you are processing.

Whatever you do, avoid getting clever and trying to define your own generic transformation layer. If that is what you want, do use XSLT, since that is what it is for. If you go with C#, keep it simple: one transformation class per company, each implementing the simplest possible interface.

If you go the C# way, limit any reuse between the transformations to composition. Don't even think of using inheritance for it ... this is one of the areas where it gets very ugly very quickly if you go that way.
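To illustrate what reuse by composition could look like here, the sketch below has two per-company transformers that share date-parsing logic through a small helper object instead of a common base class. Every name in it (`DateNormalizer`, `Order`, `IOrderTransformer`, the Acme and Globex classes and their XML shapes) is invented for the example.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Shared logic lives in a small standalone helper, not in a base class.
public sealed class DateNormalizer
{
    private readonly string[] _formats;

    public DateNormalizer(params string[] formats)
    {
        _formats = formats;
    }

    public DateTime Parse(string raw)
    {
        return DateTime.ParseExact(raw.Trim(), _formats,
            CultureInfo.InvariantCulture, DateTimeStyles.None);
    }
}

// Hypothetical common format that every transformer produces.
public sealed class Order
{
    public string Id { get; set; }
    public DateTime PlacedOn { get; set; }
}

// The simplest possible interface: one method.
public interface IOrderTransformer
{
    Order Transform(XElement source);
}

// Each company class owns a helper configured for its own date format.
public sealed class AcmeOrderTransformer : IOrderTransformer
{
    private readonly DateNormalizer _dates = new DateNormalizer("dd/MM/yyyy");

    public Order Transform(XElement source)
    {
        return new Order
        {
            Id = (string)source.Attribute("ref"),
            PlacedOn = _dates.Parse((string)source.Element("date"))
        };
    }
}

public sealed class GlobexOrderTransformer : IOrderTransformer
{
    private readonly DateNormalizer _dates = new DateNormalizer("yyyy-MM-dd");

    public Order Transform(XElement source)
    {
        return new Order
        {
            Id = (string)source.Element("orderId"),
            PlacedOn = _dates.Parse((string)source.Element("placed"))
        };
    }
}
```

Because the two classes share nothing except the interface and the helper they choose to use, a change to one company's format cannot ripple into the other through a base class.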

eglasius answered Nov 10 '22 04:11