Data Warehousing arbitrary fields

In our application, we support user-written plugins.

Those plugins generate data of various types (int, float, str, or datetime). Each value is labeled with a set of metadata (user, current directory, etc.) as well as three free-text fields (MetricName, Var1, Var2).

Now we have several years of this data, and I'm trying to design a schema which allows very fast access to those metrics in an analytical fashion (charts and stuff). This is easy as long as there are only a few metrics we're interested in, but we have a large number of different metrics at different granularities, and we'd like to store user-added data to allow for later analysis (possibly after a schema change).

Example data: (please keep in mind this is very simplified)

======================================================================================================
| BaseDir         | User    | TrialNo | Project | ... | MetricValue | MetricName | Var1  | Var2      |
======================================================================================================
| /path/to/me     | me      | 0       | domino  | ... | 20          | Errors     | core  | dumb      |
| /path/to/me     | me      | 0       | domino  | ... | 98.6        | Tempuratur | body  |           |
| /some/other/pwd | oneguy  | 223     | farq    | ... | 443         | ManMonths  | waste | Mythical  |
| /some/other/pwd | oneguy  | 224     | farq    | ... | 0           | Albedo     | nose  | PolarBear |
| /path/to/me     | me      | 0       | domino  | ... | 70.2        | Tempuratur | room  |           |
| /path/to/me2    | me      | 2       | domino  | ... | 2020        | Errors     | misc  | filtered  |
======================================================================================================

Anyone can add a parser plugin to start measuring an AirSpeed metric, and we'd like our analysis tools to "just work" on that new metric.
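To make the "just work" requirement concrete, here is a minimal sketch using Python's sqlite3 module with an in-memory database as a stand-in for the real MySQL setup. The column names are adapted from the example table, the numeric-only `metric_value` column is a simplification, and the AirSpeed row (including its values) is made up for illustration. The point is only that a generic query over a narrow table picks up a new MetricName without any schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Narrow layout: one row per measurement, metric identified by free text.
conn.execute("""
    CREATE TABLE metric (
        base_dir TEXT, user_name TEXT, trial_no INTEGER, project TEXT,
        metric_value REAL, metric_name TEXT, var1 TEXT, var2 TEXT
    )
""")
rows = [
    ("/path/to/me", "me", 0, "domino", 20, "Errors", "core", "dumb"),
    ("/path/to/me", "me", 0, "domino", 98.6, "Tempuratur", "body", ""),
    ("/path/to/me2", "me", 2, "domino", 2020, "Errors", "misc", "filtered"),
    # Hypothetical row from a newly added plugin; values are invented.
    ("/path/to/me", "me", 0, "domino", 11.0, "AirSpeed", "swallow", "unladen"),
]
conn.executemany("INSERT INTO metric VALUES (?,?,?,?,?,?,?,?)", rows)

def summarize(conn):
    """Return {metric_name: (row_count, sum_of_values)} for every metric present."""
    cursor = conn.execute(
        "SELECT metric_name, COUNT(*), SUM(metric_value) "
        "FROM metric GROUP BY metric_name ORDER BY metric_name"
    )
    return {name: (count, total) for name, count, total in cursor}

summary = summarize(conn)
print(summary)
```

The flexibility comes at a cost: every analytical query has to filter or group on the free-text name, which is the part that gets slow over several years of data.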

Update:

Considering that many of the MetricNames are well known beforehand, I can satisfy my requirements if I enable analysis on those metrics and simply store the other user-added metrics. We can accept that new metrics won't be available for heavy-duty analysis without an edit to the schema.

What do you guys think of this solution?

I've divided our metrics into three fact tables: one for facts that don't need a MetricTopic, one for facts that do, and one for all the other metrics, including unexpected ones.
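As a rough illustration of that three-way split, here is a sketch in Python with sqlite3. Everything in it is an assumption made for the example: the table and column names, the choice of which metrics count as "plain" or "topic" metrics, and the idea that the topic is taken from Var1. It is not the actual schema, only the routing idea.

```python
import sqlite3

# Hypothetical classification of the well-known metrics.
PLAIN_METRICS = {"Errors", "ManMonths"}   # no MetricTopic needed
TOPIC_METRICS = {"Tempuratur"}            # MetricTopic required

DDL = """
CREATE TABLE fact_plain (trial_no INTEGER, metric_name TEXT, metric_value REAL);
CREATE TABLE fact_topic (trial_no INTEGER, metric_name TEXT,
                         metric_topic TEXT, metric_value REAL);
CREATE TABLE fact_other (trial_no INTEGER, metric_name TEXT,
                         var1 TEXT, var2 TEXT, metric_value TEXT);
"""

def store(conn, trial_no, name, value, var1=None, var2=None):
    """Insert one measurement into the matching fact table; return the table name."""
    if name in PLAIN_METRICS:
        conn.execute("INSERT INTO fact_plain VALUES (?,?,?)",
                     (trial_no, name, value))
        return "fact_plain"
    if name in TOPIC_METRICS:
        # Assumption: the topic is carried in Var1.
        conn.execute("INSERT INTO fact_topic VALUES (?,?,?,?)",
                     (trial_no, name, var1, value))
        return "fact_topic"
    # Unknown metrics are kept as text so nothing is lost before a schema edit.
    conn.execute("INSERT INTO fact_other VALUES (?,?,?,?,?)",
                 (trial_no, name, var1, var2, str(value)))
    return "fact_other"

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
routed = [
    store(conn, 0, "Errors", 20, "core", "dumb"),
    store(conn, 0, "Tempuratur", 98.6, "body"),
    store(conn, 224, "Albedo", 0, "nose", "PolarBear"),
]
print(routed)
```

Promoting a metric from the catch-all table later would mean adding it to one of the two sets and migrating its rows, which is the "edit to the schema" mentioned above.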

Metrics Schema #3

For the bounty:

I'll accept any critique which shows how to make this system more functional, or brings it into closer alignment with industry best practices. References to literature give added weight.

asked Sep 15 '10 23:09

bukzor

1 Answer

If I understand correctly, you are looking for a schema that supports on-the-fly creation of measures in a DW. In a classical data warehouse each measure is a column, so in a Kimball star schema you would need to add a column for each new measure, which means changing the schema.
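A small sketch of what that means in practice, using Python's sqlite3 module as a stand-in for a real warehouse. The table and column names (`dim_trial`, `fact_trial`, `air_speed`) are hypothetical and chosen only for this example. In a column-per-measure fact table, a new AirSpeed measure does not exist until someone runs DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical star: one dimension table and one fact table whose
# measures are ordinary columns.
conn.execute("""
    CREATE TABLE dim_trial (
        trial_key INTEGER PRIMARY KEY,
        base_dir TEXT, user_name TEXT, trial_no INTEGER, project TEXT
    )
""")
conn.execute("""
    CREATE TABLE fact_trial (
        trial_key INTEGER REFERENCES dim_trial(trial_key),
        errors INTEGER,
        man_months INTEGER
    )
""")

def columns(conn, table):
    """List the column names of a table via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

before = columns(conn, "fact_trial")
# Supporting a new measure is a schema change, not just an insert.
conn.execute("ALTER TABLE fact_trial ADD COLUMN air_speed REAL")
after = columns(conn, "fact_trial")
print(before, after)
```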

What you have is an EAV (entity-attribute-value) model, and analytics on EAV is neither easy nor fast; take a look at this discussion.
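To illustrate why EAV analytics gets awkward, here is a minimal sketch in Python with sqlite3, using a cut-down version of the rows from the question (table and column names are assumptions for the example). Getting two metrics side by side per trial needs one conditional aggregate per metric, so the query text grows with every metric you want to chart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metric (trial_no INTEGER, metric_name TEXT, "
    "var1 TEXT, metric_value REAL)"
)
conn.executemany("INSERT INTO metric VALUES (?,?,?,?)", [
    (0, "Errors", "core", 20),
    (0, "Tempuratur", "body", 98.6),
    (0, "Tempuratur", "room", 70.2),
    (2, "Errors", "misc", 2020),
])

# Pivot rows into columns: one CASE expression per metric of interest.
pivot_sql = """
    SELECT trial_no,
           SUM(CASE WHEN metric_name = 'Errors' THEN metric_value END) AS errors,
           AVG(CASE WHEN metric_name = 'Tempuratur' THEN metric_value END) AS avg_temp
    FROM metric
    GROUP BY trial_no
    ORDER BY trial_no
"""
pivot = conn.execute(pivot_sql).fetchall()
for row in pivot:
    print(row)
```

A trial with no temperature readings comes back with NULL (None in Python) in that column, so the reporting layer also has to handle missing measures.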

I would suggest you look at tools like Splunk, which is suited to this type of problem.

answered Sep 21 '22 23:09

Damir Sudarevic

Tags: database, mysql, database-design, blob, data-warehouse
