Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a data storage pattern similar to mipmaps in graphics?

We've got a bunch of data the users may want to view windows of and do so quickly. They may want to look at a window of the data that is a day, a week, a month, or an arbitrary beginning and ending data. Sorting and summing up all of this stuff in real time is proving to be painful for us so I got the idea of doing something similar to Mipmaps in 3D rendering. You end up storing the same data pre-calculated at a variety of different scales and then interpolate the results using the varying scales. So I would already know what the numbers were for a year, a given month, a given week, and a given day for a store and if they ask for a particular range I use the various scales to quickly add up something that gives the right results but I don't have to necessarily reprocess the full data set, I just retrieve four or five records and add or subtract them.

Is this a real pattern? Does it make any sense and there are places I can read about how to do it best or are there much better ways of dealing with large chunks of data like this where it needs to be viewed in varying slices?

It seems like this should be a well known and solved problem. For example, lots of people have stock portfolios and they need to do this kind of thing every day. Our data isn't stock prices, but the idea is the same.

like image 307
John Munsch Avatar asked Oct 14 '22 12:10

John Munsch


1 Answers

OK, I searched and searched and searched some more. Andy Dent's links made me start describing the data as "time-series" and that helped some. Then I ran across OLAP and realized that what I'm doing is reinventing that. I knew this had to be a well-known, thoroughly dealt-with problem and I was right. OLAP is it.

You build a bunch of aggregate tables which aggregate the data along particular dimensions (time in this case) and you can even get tools like Mondrian which will take queries written in another query language (i.e. not SQL) and a set of fact tables plus aggregates and it will decide how best to perform the query against those tables.

like image 162
John Munsch Avatar answered Oct 18 '22 04:10

John Munsch