We have a table with 17 million rows containing product attributes. Let's say they're:
brandID, sizeID, colorID, price, shapeID
And we need to query for aggregates by brand and size. Currently we query and filter this data by doing something like this:
select brandID, sizeID, count(*)
from table where colorID in (1,2,3) and price=10 and shapeID=17
--"additional complex where clause here"
group by brandID, sizeID
order by brandID, sizeID
And we report this data. The problem is, it takes 10 seconds or so to run this query (and this is a very simple example) in spite of the fact that the actual data returned will be just a few hundred rows.
I think we've reached our capacity for indexing this table so I don't think any amount of indexes will get us to near-instant results.
I know very little about OLAP or other analysis services, but what's out there for SQL Server that can pre-filter or pre-aggregate this table so that queries like the above (or similar ones returning equivalent data) run quickly? Or, more generally, what's the best way to handle arbitrary where clauses on a very large table?
I think this is a perfect candidate for an OLAP cube. I have fact data with hundreds of millions of rows. I was running the kind of queries you describe above, and they were coming back in minutes. I moved the data into an OLAP cube and queries are now almost instantaneous. There is a bit of a learning curve for OLAP, so I'd strongly suggest you find a tutorial on simple cube building just to get your head around it. DBA colleagues had been telling me about cubes for years and I never quite got it. Now I don't know why I went so long without one.
In addition to OLAP, you may also want to research indexed views, which store a pre-aggregated result on disk. If you are slicing the data in several different ways, though, that may not be feasible.
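For reference, here is a rough sketch of what an indexed view for the example query might look like. The names dbo.ProductAttributes, dbo.vProductCounts, IX_vProductCounts and rowCnt are made up for illustration, and the sketch assumes the five columns from the question are the only ones you filter on. It only pays off if the number of distinct attribute combinations is much smaller than the 17 million base rows, and it would not cover the "additional complex where clause" if that touches other columns.

```sql
-- Hypothetical table and view names; adjust to your schema.
-- Indexed views require SCHEMABINDING, two-part object names,
-- COUNT_BIG(*) when GROUP BY is used, and the standard SET options
-- (ANSI_NULLS, QUOTED_IDENTIFIER, etc.) to be ON.
CREATE VIEW dbo.vProductCounts
WITH SCHEMABINDING
AS
SELECT brandID, sizeID, colorID, price, shapeID,
       COUNT_BIG(*) AS rowCnt   -- rows per attribute combination
FROM dbo.ProductAttributes
GROUP BY brandID, sizeID, colorID, price, shapeID;
GO

-- The unique clustered index is what materializes the view.
-- Filter columns lead the key so the WHERE clause can seek.
CREATE UNIQUE CLUSTERED INDEX IX_vProductCounts
    ON dbo.vProductCounts (shapeID, price, colorID, brandID, sizeID);
GO

-- Same report as the original query, summing the pre-aggregated counts.
-- NOEXPAND forces use of the view's index; without it, only some
-- editions of SQL Server consider the indexed view automatically.
SELECT brandID, sizeID, SUM(rowCnt) AS cnt
FROM dbo.vProductCounts WITH (NOEXPAND)
WHERE colorID IN (1, 2, 3) AND price = 10 AND shapeID = 17
GROUP BY brandID, sizeID
ORDER BY brandID, sizeID;
```

Keep in mind that every insert, update or delete on the base table also has to maintain the view, so this trades write cost for read speed.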