We have a 300 GB+ data array we'd like to query as fast as possible. Traditional SQL databases (specifically, SQL Server) cannot handle this volume as effectively as we need (for example, performing a select with 10-20 conditions in the where clause in less than 10 sec), so I'm investigating other solutions for this problem.
I've been reading about NoSQL and this whole thing looks promising, but I'd prefer to hear from those who have used it in real life.
What can you suggest here?
EDIT to clarify what we're after.
We're a company developing an app whereby users can search for tours and perform bookings of said tours, paying for them with their plastic cards. This whole thing can surely be Russia-specific, so bear with me.
When a user logs on to the site, she is presented with a form similar to this:
(Image: search form, http://queenbee.alponline.ru/searchform.png)
Here, user selects where she leaves from and where she goes to, dates, duration and all that.
After hitting "Search", a request goes to our DB server, which cannot handle such load: queries include various kinds of parameters. Sharding doesn't work well either.
So what I'm after is some kind of pseudo-database that can do lightning-fast queries.
They report that Couchbase and MongoDB are the fastest two overall for read, write, and delete operations.
While more recent benchmark tests show that other RDBMSs like PostgreSQL can match or at least come close to MySQL in terms of speed, MySQL still holds a reputation as an exceedingly fast database solution.
1. Check indexes.
2. There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement.
3. Limit the size of your working data set.
4. Select only the fields you need.
5. Remove unnecessary tables and indexes.
6. Remove OUTER JOINs.
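To make the indexing advice concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The `tours` table, its columns, and the index name `idx_tours_search` are all hypothetical, chosen only to resemble the search form described in the question. It creates a composite index over the columns used in the WHERE clause, selects only the fields it needs, and asks the engine for its query plan.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tours (
        id INTEGER PRIMARY KEY,
        departure_city TEXT,
        destination TEXT,
        start_date TEXT,
        nights INTEGER,
        price REAL
    )
    """
)

# Composite index over the columns that appear in the WHERE clause.
conn.execute(
    "CREATE INDEX idx_tours_search "
    "ON tours (departure_city, destination, start_date, nights)"
)

# Select only the fields that are actually needed.
query = """
    SELECT id, price
    FROM tours
    WHERE departure_city = ?
      AND destination = ?
      AND start_date BETWEEN ? AND ?
"""
params = ("Moscow", "Antalya", "2010-06-01", "2010-06-15")

# Ask SQLite how it intends to execute the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

The printed plan should mention the index by name if the engine decides to use it. SQL Server exposes the same kind of information through its execution plans, though the syntax and the optimizer's choices differ.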
If you want to do ad-hoc queries for reporting or analysis you're probably better off using something that will play nicely with off-the-shelf reporting tools. Otherwise you are likely to find yourself getting dragged off all the time to write little report programs to query the data. This is a strike against NoSQL type databases, but it may or may not be an issue depending on your circumstances.
300GB should not be beyond the capabilities of modern RDBMS platforms, even MS SQL Server. Some other options for large database queries of this type are:
See if you can use an SSAS cube and aggregations to mitigate your query performance issues. Usage-based optimisation might get you adequate performance without having to get another database system. SSAS can also be used in shared-nothing configurations, allowing you to stripe your queries across a cluster of relatively cheap servers with direct-attach disks. Look at ProClarity for a front-end if you do go this way.
Sybase IQ is a RDBMS platform that uses an underlying data structure optimised for reporting queries. It has the advantage that it plays nicely with a reasonable variety of conventional reporting tools. Several other systems of this type exist, such as Red Brick, Teradata or Greenplum (which uses a modified version of PostgreSQL). The principal strike against these systems is that they are not exactly mass market items and can be quite expensive.
Microsoft has a shared-nothing version of SQL Server in the pipeline, which you might be able to use. However, they've tied it to third-party hardware manufacturers, so you can only get it with dedicated (and therefore expensive) hardware.
Look for opportunities to build data marts with aggregated data to reduce the volumes for some of the queries.
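As a rough illustration of the data mart idea, the sketch below pre-aggregates a detail table into a much smaller summary table that search queries can hit instead. It uses Python's sqlite3 module for portability; the `offers` and `offers_summary` tables, their columns, and the sample rows are hypothetical and not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detail table holding one row per individual offer.
conn.execute(
    "CREATE TABLE offers "
    "(destination TEXT, start_date TEXT, nights INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO offers VALUES (?, ?, ?, ?)",
    [
        ("Antalya", "2010-06-01", 7, 500.0),
        ("Antalya", "2010-06-01", 7, 450.0),
        ("Antalya", "2010-06-02", 10, 700.0),
        ("Hurghada", "2010-06-01", 7, 400.0),
    ],
)

# Aggregated "data mart": one row per (destination, date, duration)
# with the cheapest price and the number of underlying offers.
conn.execute(
    """
    CREATE TABLE offers_summary AS
    SELECT destination, start_date, nights,
           MIN(price) AS min_price,
           COUNT(*)   AS offer_count
    FROM offers
    GROUP BY destination, start_date, nights
    """
)

summary = conn.execute(
    "SELECT destination, start_date, nights, min_price, offer_count "
    "FROM offers_summary "
    "ORDER BY destination, start_date, nights"
).fetchall()
for row in summary:
    print(row)
```

Whether this helps depends on how many searches can be answered from the aggregate alone; queries that need per-offer detail still have to go back to the full table, and the summary has to be refreshed when the detail data changes.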
Look at tuning your hardware. Direct-attach SAS arrays and RAID controllers can put through streaming I/O of the sort used in table scans pretty quickly. If you partition your tables over a large number of mirrored pairs you can get very fast streaming performance - easily capable of saturating the SAS channels.
Practically, you're looking at getting 10-20GB/sec from your I/O subsystem if you want the performance targets you describe, and it is certainly possible to do this without resorting to really exotic hardware.
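The throughput figure is simple arithmetic: data scanned divided by the time allowed. The sketch below works it through for a few scenarios. The function names are made up for illustration, and the per-disk streaming rate of 100 MB/sec is purely an assumption, not a measurement of any particular drive. Note that a full scan of all 300 GB in 10 seconds would need 30 GB/sec, so the 10-20 GB/sec range corresponds to either a slightly longer time budget or queries that touch only part of the data.

```python
import math


def required_throughput_gb_per_s(scanned_gb: float, seconds: float) -> float:
    """Sustained read rate needed to scan `scanned_gb` within `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return scanned_gb / seconds


def disks_needed(target_gb_per_s: float, per_disk_mb_per_s: float) -> int:
    """Disks required to reach the target rate, assuming ideal striping.

    Uses 1 GB = 1000 MB and ignores controller and channel limits.
    """
    return math.ceil(target_gb_per_s * 1000 / per_disk_mb_per_s)


# Full 300 GB scan under different time budgets.
for budget_s in (10, 20, 30):
    rate = required_throughput_gb_per_s(300, budget_s)
    print(f"300 GB in {budget_s} s needs {rate:.0f} GB/sec")

# Assumed 100 MB/sec streaming rate per disk (hypothetical figure).
print(disks_needed(10.0, 100.0), "disks for 10 GB/sec")
```

Real arrays will not scale perfectly with disk count, so treat the disk estimate as a lower bound for sizing discussions rather than a configuration recommendation.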