I need to load data for my Rails application from multiple providers (REST- or SOAP-based XML feeds) into the database on a recurring basis. I have written a set of Rake tasks that are kicked off by cron jobs generated with the whenever gem. Each task hits the partner feed endpoint, parses the feed, and loads it into the database.
Instead of writing Rake tasks, should I use an ETL framework like ActiveWarehouse (http://activewarehouse.rubyforge.org/etl/)? Any suggestions on the best way to do this in Rails?
If you are just loading data into a set of tables, the use case is simple (adding new records or updating existing ones in place), and your current load meets your requirements, I would stick with the Rake tasks. You could certainly use ActiveWarehouse as well, but it sounds like overkill. If, however, you need to support changing dimensions (i.e., preserving the history of data changes over time) or other 'data warehouse' features, then something like ActiveWarehouse starts to have more value.
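To make the distinction concrete, here is a minimal sketch in plain Ruby of the two loading styles: a simple upsert that overwrites a record in place, and a history-preserving load that closes the old row and appends a new version when a value changes. This is not ActiveWarehouse's API. The arrays stand in for database tables, and the names `partner_id`, `price`, `valid_from`, and `valid_to` are hypothetical, chosen only for illustration.

```ruby
require "date"

# Hypothetical in-memory stand-ins for database tables. In a Rails app these
# would be ActiveRecord models; arrays of hashes keep the sketch self-contained.

# Simple load: insert new records, overwrite existing ones. No history is kept.
def upsert(rows, record)
  existing = rows.find { |r| r[:partner_id] == record[:partner_id] }
  if existing
    existing.merge!(record)
  else
    rows << record.dup
  end
  rows
end

# History-preserving load: if the incoming record differs from the current
# version, close the current row (set valid_to) and append a new version.
def load_with_history(rows, record, as_of)
  current = rows.find do |r|
    r[:partner_id] == record[:partner_id] && r[:valid_to].nil?
  end
  unchanged = current && record.all? { |key, value| current[key] == value }
  return rows if unchanged

  current[:valid_to] = as_of if current
  rows << record.merge(valid_from: as_of, valid_to: nil)
  rows
end

simple = []
upsert(simple, { partner_id: 1, price: 10 })
upsert(simple, { partner_id: 1, price: 12 })

history = []
load_with_history(history, { partner_id: 1, price: 10 }, Date.new(2024, 1, 1))
load_with_history(history, { partner_id: 1, price: 12 }, Date.new(2024, 2, 1))
```

After these calls, `simple` holds a single row with the latest price, while `history` holds two rows: the old price with a `valid_to` date and the current price with `valid_to` still `nil`. If the first style covers what you need, the Rake tasks are enough; the second style is the kind of bookkeeping where an ETL framework starts to pay off.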