What's a good way to structure a 100M record table for fast ad-hoc queries?

The scenario is quite simple: there are about 100M records in a table with 10 columns (a kind of analytics data), and I need to be able to run queries on any combination of those 10 columns. For example something like this:

  • how many records with a = 3 && b > 100 are there in past 3 months?

Basically all of the queries are going to be of the form "how many records with attributes X are there in time interval Y", where X can be any combination of those 10 columns.
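In SQL terms, the bullet above maps to something like the following (the table and column names here are just placeholders, with MySQL-style date arithmetic assumed):

    -- Hypothetical table: events(a INT, b INT, ..., created_at DATETIME)
    SELECT COUNT(*)
    FROM events
    WHERE a = 3
      AND b > 100
      AND created_at >= NOW() - INTERVAL 3 MONTH;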

The data will keep coming in; it is not just a fixed set of 100M records, but a table that keeps growing over time.

Since the column selection can be completely random, creating indexes for popular combinations is most likely not possible.
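Just to put a number on it: with 10 filterable columns there are 2^10 - 1 = 1023 possible column combinations, and covering only the two- and three-column combinations with composite indexes would already mean maintaining 165 indexes (45 + 120) on a table that is constantly being written to.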

The question has two parts:

  • How should I structure this in a SQL database to make the queries as fast as possible, and what are some general steps I can take to improve performance?
  • Is there any kind of NoSQL database that is optimized for this kind of search? The only one I can think of is ElasticSearch, but I'm not sure it would perform very well on such a large data set.
asked Apr 27 '12 by Jakub Arnold


1 Answer

Without indexes, your options for tuning an RDBMS to support this kind of processing are severely limited: basically you need massive parallelism and super-fast kit. But clearly you're not storing relational data, so an RDBMS is the wrong fit.

If you pursue the parallel route, the industry standard is Hadoop. You can still use SQL-style queries through Hive.
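As a very rough sketch of what that looks like in practice (all names here are invented for illustration, and exact syntax depends on your Hive version), you could land the events in a table partitioned by day, so the time-interval predicate prunes partitions instead of scanning the whole history:

    -- Hive sketch (hypothetical names)
    CREATE TABLE events (
      a INT,
      b INT,
      -- ... remaining attribute columns ...
      created_at TIMESTAMP
    )
    PARTITIONED BY (event_date STRING);

    -- The ad-hoc counts stay plain SQL; only the matching
    -- daily partitions are scanned.
    SELECT COUNT(*)
    FROM events
    WHERE event_date >= '2012-01-27'   -- i.e. the past 3 months
      AND a = 3
      AND b > 100;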

Another NoSQL option would be to consider a columnar database. These are an alternative way of organising data for analytics without using cubes, and they are good at loading data fast. Vectorwise is the latest player in the arena; I haven't used it personally, but somebody at last night's LondonData meetup was raving to me about it. Check it out.

Of course, moving away from SQL databases - in whatever direction you go - will incur a steep learning curve.

answered Oct 12 '22 by APC