.NET ETL Process

First, some background: we are developing a data warehouse and researching which tools to use for our ETL process. The team is very developer-centric, and everyone is knowledgeable in C#. So far I have looked at RhinoETL, Pentaho (Kettle), and Astera Centerprise. SSIS is out for a number of reasons that are outside the scope of this question.

At this time, I am leaning towards something more developer-oriented like RhinoETL because it seems like the path of least resistance for a group of devs. Do the other, more visual, designer-oriented products bring anything to the table that RhinoETL doesn't? Are there any specific things I should pay attention to when evaluating these ETL tools? Are there any other tools that we should also investigate?

Matt asked Oct 03 '11 23:10


2 Answers

Recently my coworker and I did some simple performance testing between RhinoETL and SSIS. It seems that for simple data flows SSIS consistently outperformed RhinoETL in our tests (it moved 2,000,000 records about 30% faster). On the other hand, if you are using source control (in our case TFS), you cannot easily view differences between versions of .dtsx files (SSIS packages), whereas RhinoETL processes are plain code, so developing with RhinoETL lets you use TFS's comparison and history features.

Another advantage of RhinoETL shows up if you develop a user interface on top of your data warehouse: you can share code between the two programs.

Although several members of our SSIS team come from .NET backgrounds, our management decided to continue developing with SSIS (they did upgrade to SSIS 2008, which is another topic altogether) because they felt it was easier to have a developer learn SSIS than .NET.

David Benham answered Sep 19 '22 15:09


I know this is a late answer, but as I needed a proper ETL tool with all the SSIS features in a 100% .NET environment, I ended up developing my own.

  • Github repo: https://github.com/paillave/Etl.Net
  • Beginning of documentation: https://paillave.github.io/Etl.Net

For sure, performance is not as good as SSIS. I believe that if you need maximum performance to integrate and transform huge volumes, you should still use SSIS.

The main thing I really needed, and that no other ETL-like tool such as RhinoETL provides, is a proper tracing system that captures every single detail and is easy to manipulate and record if necessary. I made a lot of out-of-the-box adapters for the file system, FTP, SFTP, XML, CSV, Entity Framework Core and bulk load. I even came up with a visual tool to view the structure of the transformation process.
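To make the tracing idea concrete, here is a minimal, language-neutral sketch written in Python for brevity. It is not the Etl.Net API: every name in it (Tracer, TraceEvent, traced, run_pipeline) is hypothetical and exists only for illustration. The assumption is that each step is a per-row transform, and that a trace is just a record of which step produced which row or error, collected in one place so it can be filtered or stored later.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List

Row = dict  # a row is a plain dictionary in this sketch


@dataclass
class TraceEvent:
    step: str       # name of the step that emitted the event
    kind: str       # "row" for a produced row, "error" for a failed row
    detail: object  # the produced row, or a description of the failure


@dataclass
class Tracer:
    """Hypothetical trace collector; a real one might write to a table or a log."""
    events: List[TraceEvent] = field(default_factory=list)

    def record(self, step: str, kind: str, detail: object) -> None:
        self.events.append(TraceEvent(step, kind, detail))


def traced(name: str, transform: Callable[[Row], Row], tracer: Tracer):
    """Wrap a per-row transform so every output row and every failure is traced."""
    def run(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            try:
                out = transform(row)
            except Exception as exc:
                tracer.record(name, "error", f"{row!r}: {exc}")
                continue  # drop the bad row and keep the stream going
            tracer.record(name, "row", out)
            yield out
    return run


def run_pipeline(source: Iterable[Row], steps) -> List[Row]:
    """Chain the steps lazily, then pull all rows through them."""
    rows = source
    for step in steps:
        rows = step(rows)
    return list(rows)


tracer = Tracer()
steps = [
    traced("parse_amount", lambda r: {**r, "amount": float(r["amount"])}, tracer),
    traced("add_tax", lambda r: {**r, "total": round(r["amount"] * 1.2, 2)}, tracer),
]
source = [
    {"id": 1, "amount": "10.0"},
    {"id": 2, "amount": "oops"},  # fails in parse_amount and is traced as an error
    {"id": 3, "amount": "2.5"},
]
loaded = run_pipeline(source, steps)
errors = [e for e in tracer.events if e.kind == "error"]
```

The same shape maps onto C# iterators (IEnumerable<T> with yield return). The design choice worth noting is that tracing lives in the wrapper rather than in each transform, so every step is traced the same way without the transforms knowing about it.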

It has taken me 10 months so far, and I open sourced it. It still lacks a lot of documentation (a huge amount of work). I must also complete it with a much bigger set of unit tests (also a huge amount of work) before I can decently release it as a beta. Even though I have left it in alpha, it is the foundation of all the ETL processes at my company, and it works like hell!

Stephane answered Sep 20 '22 15:09