Which DB to choose for finding best matching records?

Tags:

I'm storing an object in a database described by a lot of integer attributes. The real object is a little bit more complex, but for now let's assume that I'm storing cars in my database. Each car has a lot of integer attributes to describe the car (ie. maximum speed, wheelbase, maximum power etc.) and these are searchable by the user. The user defines a preferred range for each of the objects and since there are a lot of attributes there most likely won't be any car matching all the attribute ranges. Therefore the query has to return a number of cars sorted by the best match.

At the moment I implemented this in MySQL using the following query:

SELECT *, SQRT( POW((a < min_a)*(min_a - a) + (a > max_a)*(a - max_a), 2) +
                POW((b < min_b)*(min_b - b) + (b > max_b)*(b - max_b), 2) +
                ... ) AS match
WHERE a < (min_a - max_allowable_deviation) AND a > (max_a + max_allowable_deviation) AND ...
ORDER BY match ASC

where a and b are attributes of the object and min_a, max_a, min_b and max_b are user defined values. Basically the match is the square root of the sum of the squared differences between the desired range and the real value of the attribute. A value of 0 meaning a perfect match.

The table contains a couple of million records and the WHERE clausule is only introduced to limit the number of records the calculation is performed on. An index is placed on all of the queryable records and the query takes like 500ms. I'd like to improve this number and I'm looking into ways to improve this query.

Furthermore I am wondering whether there would be a different database better suited to perform this job. Moreover I'd very much like to change to a NoSQL database, because of its more flexible data scheme options. I've been looking into MongoDB, but couldn't find a way to solve this problem efficiently (fast).

Is there any database better suited for this job than MySQL?

877

asked Jul 24 '11 00:07

Ewout Kleinsmann

1 Answers

Take a look at R-trees. (The pages on specific variants go in to a lot more detail and present pseudo code). These data structures allow you to query by a bounding rectangle, which is what your problem of searching by ranges on each attribute is.

Consider your cars as points in n-dimensional space, where n is the number of attributes that describe your car. Then given a n ranges, each describing an attribute, the problem is the find all the points contained in that n-dimensional hyperrectangle. R-trees support this query efficiently. MySQL implements R-trees for their spatial data types, but MySQL only supports two-dimensional space, which is insufficient for you. I'm not aware of any common databases that support n-dimensional R-trees off the shelf, but you can take some database with good support for user-defined tree data structures and implement R-trees yourself on top of that. For example, you can define a structure for an R-tree node in MongoDB, with child pointers. You will then implement the R-tree algorithms in your own code while letting MongoDB take care of storing the data.

Also, there's this C++ header file implementing of an R-tree, but currently it's only an in-memory structure. Though if your data set is only a few million rows, it seems feasible to just load this memory structure upon startup and update it whenever new cars are added (which I assume is infrequent).

168

answered Nov 15 '22 03:11

Kui Tang

Related questions
                            
                                How to return multi-count of specific values
                            
                                How do I build a UNION query with ORDER BY and GROUP BY in Kohana's query builder?
                            
                                How to "close" database when the query happens?
                            
                                Using CASE, WHEN, THEN, END in a select query with MySQL
                            
                                PHP and MySql with date problem...?
                            
                                SQL query return duplicate with most recent date and more
                            
                                MySQL connection using ODBC (5.1) with SSL
                            
                                SQL: How to return an non-existing row?
                            
                                Error in encoding mysql -> How can I reconvert it to something else?
                            
                                How to get the results from nodejs using mysql package?
                            
                                Check cookie and redirect with Apache
                            
                                how to return row count of mysql sub query
                            
                                How do I rank array values with duplicate values and skipping some positions if there is a tie?
                            
                                mySQL AND memcached for PHP sessions?
                            
                                Best way to update MySQL table with random values from another table
                            
                                Is there an AJAX Library for Codeigniter
                            
                                Can't create databases with a new user
                            
                                MYSQL Database offline use
                            
                                drupal views module mysql error
                            
                                Amazon RDS MYSQL - unresponsive every 15 min's while GB's of space get used and then reclaimed

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Which DB to choose for finding best matching records?

Tags:

sql

mysql

Ewout Kleinsmann

People also ask

1 Answers

Kui Tang

Recent Activity

Donate For Us