 

Are there any data warehouse frameworks?

I've got a lot of MySQL data that I need to generate reports from. It's mostly historic data, so it won't be changing much, but it weighs in at 20-30 gigabytes easily and is expected to grow. I currently have a collection of PHP scripts that run some complex queries and output CSV and Excel files. I also use phpMyAdmin with bookmarked queries, which I edit by hand to change the parameters. The amount of data is growing and so is the number of people who need access to it, so I'm making the time to improve this situation.

I started reading about data warehousing the other day, and it seems that this is an area that relates to what I need to do. I've read some good articles and am even waiting on a book. I think I'm getting a handle on what these sorts of systems do and what's possible.

Creating a reporting system for my data has always been on a todo list, but until recently I figured it would be a highly niche programming venture. Now that I know data warehousing is a common thing, I figure there must be some sort of reporting/warehousing frameworks available to ease development. I'd gladly skip writing interfaces and scripts to schedule and email reports and the like, and stick to writing queries and setting up relations.

I've mostly been a LAMP guy, but I'm not above switching languages or platforms. I just need a more robust solution, as my one-off scripts don't scale well.

So where's a good place to get started?

reconbot asked Oct 01 '08 17:10




2 Answers

I'll discuss a few points on the {budget, business utility function, time frame} spectrum out there. For convenience, let's follow the architecture conceptualization you linked to at

    WikipediaDataWarehouseArticle

  • Operational database layer
    The source data for the data warehouse, normalized so that each piece of data is maintained in one place only

  • Data access layer
    The transformation of your source data into your informational access layer.
    ETL tools to extract, transform, load data into the warehouse fall into this layer.

  • Informational access layer
      • Report-facilitating Data Structure
          Data is not maintained here; it is merely a reflection of your source data.
          Hence, denormalized structures (containing duplicate, but systematically derived, data)
          are usually most effective here.
      • Reporting tools
          How you actually give your users access to the data:
          • pre-canned reports (simple)
          • more dynamic slice-and-dice access methods

        Both the data accessed for reporting and analysis and the tools used to report on
        and analyze it fall into this layer. The Inmon-Kimball differences in design methodology,
        discussed later in the Wikipedia article, also concern this layer.

  • Metadata layer (facilitates automation, organization, etc)

Roll your own (low-end)
For very little out-of-pocket cost, simply recognizing the need for denormalized reporting structures can buy some efficiencies for those who are not yet using them.
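
As a rough illustration of that low-end approach, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. The table names (customers, orders, sales_report), the columns, and the sample rows are all hypothetical, made up for the example. It rebuilds a denormalized reporting table from normalized source tables, then runs a parameterized pre-canned report against it.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical normalized source tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            order_date TEXT,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
        INSERT INTO orders VALUES
            (1, 1, '2007-03-01', 100.0),
            (2, 1, '2008-05-10', 250.0),
            (3, 2, '2008-06-15', 75.0),
            (4, 2, '2008-07-01', 25.0);
    """)
    return conn


def build_reporting_table(conn: sqlite3.Connection) -> None:
    """Rebuild the denormalized table: one wide row per order, joins done once."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_report;
        CREATE TABLE sales_report AS
        SELECT o.id                         AS order_id,
               o.order_date                 AS order_date,
               strftime('%Y', o.order_date) AS order_year,      -- derived, stored as text
               c.name                       AS customer_name,   -- duplicated from customers
               c.region                     AS customer_region, -- duplicated from customers
               o.amount                     AS amount
        FROM orders o
        JOIN customers c ON c.id = o.customer_id;
    """)


def sales_by_region(conn: sqlite3.Connection, year: str) -> list:
    """Pre-canned report: total sales per region for one year (year passed as text)."""
    return conn.execute(
        "SELECT customer_region, SUM(amount) FROM sales_report "
        "WHERE order_year = ? GROUP BY customer_region ORDER BY customer_region",
        (year,),
    ).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    build_reporting_table(conn)
    print(sales_by_region(conn, "2008"))
```

The point of the design is that the report query never joins anything: the rebuild step pays the join cost once (say, nightly), and the report parameter is bound at run time rather than edited into the SQL by hand.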

Get in the ballgame (some outlays required)
You don't need to use all the functionality of a platform right off the bat.
IMO, however, you want to be on a platform that you know will grow with you, and in the highly competitive and consolidating BI market, that seems to mean one of the four enterprise mega-vendors:

  • Microsoft (the platform of our 110 employee firm)
  • SAP
  • Oracle
  • IBM

    BiMarketStateArticle

My firm is at this stage. In the "Data Access Layer" we use some of the ETL capability offered by SQL Server Integration Services (SSIS), with some alternate use of Talend, which is open source but in practice requires a license. Our denormalized reporting structure is implemented entirely in the basic SQL Server database, and we use SQL Server Reporting Services (SSRS) to largely automate (depending on your skill) the production of pre-specified reports. Note that an SSRS "report" is merely a (scalable) XML configuration/specification that gets rendered at runtime by the SSRS engine. Choices such as exporting to an Excel file are simple options.
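
To make the "report as a specification rendered at runtime" idea concrete, here is a minimal sketch in Python. It is not SSRS and does not use SSRS's actual XML format: the REPORT_SPEC dictionary, the render_csv function, and the sample orders table are all hypothetical, chosen only to show the pattern of keeping the report definition as data and letting a small engine run it with parameters and pick the output format.

```python
import csv
import io
import sqlite3

# Hypothetical report specification: plain data, not SSRS's real report format.
REPORT_SPEC = {
    "title": "Orders at or above a minimum amount",
    "query": (
        "SELECT customer, amount FROM orders "
        "WHERE amount >= :min_amount ORDER BY amount DESC"
    ),
    "columns": ["customer", "amount"],
}


def sample_connection() -> sqlite3.Connection:
    """In-memory stand-in for the reporting database, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("Acme", 250), ("Globex", 75), ("Initech", 100)],
    )
    return conn


def render_csv(conn: sqlite3.Connection, spec: dict, params: dict) -> str:
    """Run the spec's query with the given parameters and render the rows as CSV text."""
    rows = conn.execute(spec["query"], params).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(spec["columns"])  # header row comes from the spec
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_csv(sample_connection(), REPORT_SPEC, {"min_amount": 100}))
```

Adding another report then means adding another specification, not another script, and a scheduler or mailer only has to know how to call the one rendering function.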

Serious Commitment (some significant human commitment required)
Notice above that we have yet to use the data mining and dynamic slicing/dicing capabilities of SQL Server Analysis Services. We are working toward that, but for now we are focused on improving the quality of our data cleansing in the "Data Access Layer".

I hope this helps you to get a sense of where to start looking.

6eorge Jetson answered Oct 14 '22 12:10



Pentaho has put together a pretty comprehensive suite of products. The products are "free", but be prepared for the usual heavy sell once you fork over your identifying information.

I haven't had a chance to really stretch them as we're a Microsoft shop from one sad end to the other.

Nick Ryberg answered Oct 14 '22 11:10
