
How to translate from SQL to NoSQL/MapReduce?

I have a background working with relational databases but recently started to dabble in CouchDB and was surprised by how some non-relational operations, which would be simple in SQL, were not first-class functions in CouchDB.

I would appreciate you taking a moment to map each SQL statement below to its MapReduce equivalent.

SELECT COUNT(*) FROM products WHERE price < 20.00;
SELECT category, SUM(price) FROM products GROUP BY category;
UPDATE products SET price = 19.99 WHERE price = 20.00;
DELETE FROM products WHERE expires_at <= NOW();
asked Jun 25 '11 at 18:06 by sferik




1 Answer

The SELECT statements are fairly easy. Bulk writes are a bit more complicated: generally, you use a view to retrieve the documents that need to change, then use the _bulk_docs API to send all the changes at once.

Also, consult the CouchDB documentation on views for details on how to issue queries, including ordering, grouping, and key ranges.
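Views are queried over HTTP, and view query parameters such as key, startkey, endkey, group, include_docs and inclusive_end take JSON values, so string keys must be sent quoted. The sketch below builds such a URL. The server address, the shop database, and the products design document with its by_price view are hypothetical names chosen for illustration, and viewUrl is a helper written for this example, not part of any CouchDB client.

```javascript
// Minimal sketch: build a CouchDB view query URL.
// Every parameter value is JSON-encoded, so strings become "quoted"
// while numbers and booleans stay bare (endkey=20, group=true).
// All database, design document and view names used here are hypothetical.
function viewUrl(baseUrl, db, ddoc, view, params = {}) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    // JSON-encode the value; URLSearchParams then percent-encodes it.
    query.set(name, JSON.stringify(value));
  }
  const path =
    `/${encodeURIComponent(db)}/_design/${encodeURIComponent(ddoc)}` +
    `/_view/${encodeURIComponent(view)}`;
  const qs = query.toString();
  return baseUrl + path + (qs ? `?${qs}` : "");
}

// Example: rows with a key strictly below 20 from a hypothetical by_price view.
const url = viewUrl("http://localhost:5984", "shop", "products", "by_price", {
  endkey: 20,
  inclusive_end: false,
});
```

Encoding every value with JSON.stringify keeps numeric, boolean and string keys uniform, which matters once keys are strings (a category name) or arrays.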


SELECT COUNT(*) FROM products WHERE price < 20.00;

Map

function (doc) {
  // Emit one row per product under 20; the _count reduce counts the rows.
  if (doc.price < 20) {
    emit(doc.price);
  }
}

Reduce

_count

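If the threshold must be arbitrary rather than fixed at 20, emit the price for every document instead, and narrow the result set at query time with the startkey and endkey parameters (for price < 20, use endkey=20 with inclusive_end=false, since endkey is inclusive by default).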
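To illustrate the key-range approach without a running server, the sketch below imitates locally what such a view does: a generalized map function emits every price, the rows are sorted by key, and a count is taken over a key range. runMap, countBelow and the local emit are hypothetical helpers for illustration, not CouchDB APIs; against a real database the server builds the index and the _count reduce does the counting.

```javascript
// Local stand-in for CouchDB's global emit(); a real view never defines this.
let emitted = [];
function emit(key, value = null) {
  emitted.push({ key, value });
}

// Hypothetical harness: run a map function over documents and sort the
// rows by key, roughly as a view index would (numeric keys only here).
function runMap(docs, mapFn) {
  emitted = [];
  for (const doc of docs) {
    mapFn(doc);
  }
  const rows = emitted;
  emitted = [];
  return rows.sort((a, b) => a.key - b.key);
}

// Generalized map function: emit every numeric price, not just those < 20.
function byPrice(doc) {
  if (typeof doc.price === "number") {
    emit(doc.price);
  }
}

// Mimics endkey=<limit>&inclusive_end=false combined with _count:
// count the rows whose key is strictly below the limit.
function countBelow(rows, limit) {
  return rows.filter((row) => row.key < limit).length;
}

const products = [
  { _id: "a", price: 9.5 },
  { _id: "b", price: 19.99 },
  { _id: "c", price: 20 },
  { _id: "d", price: 35 },
  { _id: "e", name: "gift card" }, // no price, so no row is emitted
];
const priceRows = runMap(products, byPrice);
countBelow(priceRows, 20); // 2
countBelow(priceRows, 50); // 4
```

Because the threshold only appears at query time, the same index answers "under 20", "under 50" or any startkey/endkey band without changing the view.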


SELECT category, SUM(price) FROM products GROUP BY category;

Map

function (doc) {
  // Key = category, value = price, so _sum can total prices per category.
  emit(doc.category, doc.price);
}

Reduce

_sum

This map function uses the category as the key and the price as the value of each key/value pair. When you query the view with group=true, the _sum reduce adds up the prices for each distinct key; without group=true you get a single row holding the total across all categories.
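The arithmetic of the grouped query can be reproduced locally, which may help when checking what to expect from the view. This is a hypothetical simulation: CouchDB runs the map and the built-in _sum reduce on the server, and sumByCategory is a name invented for this sketch.

```javascript
// Hypothetical local simulation of the category view queried with
// group=true. It reproduces the results, not CouchDB's implementation.
function sumByCategory(docs) {
  const rows = [];
  // Local stand-in for the view's emit(): collect key/value rows.
  const emit = (key, value) => rows.push({ key, value });

  // Map step, same logic as the view: key = category, value = price.
  for (const doc of docs) {
    emit(doc.category, doc.price);
  }

  // Reduce step with group=true: add up values that share a key, like _sum.
  const totals = new Map();
  for (const { key, value } of rows) {
    totals.set(key, (totals.get(key) || 0) + value);
  }
  return totals; // first-seen key order; CouchDB returns rows sorted by key
}

const totals = sumByCategory([
  { category: "book", price: 10 },
  { category: "toy", price: 2.5 },
  { category: "book", price: 5 },
  { category: "toy", price: 4 },
]);
totals.get("book"); // 15
totals.get("toy"); // 6.5
// Without group=true the view returns one row: the overall sum.
[...totals.values()].reduce((a, b) => a + b, 0); // 21.5
```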


UPDATE products SET price = 19.99 WHERE price = 20.00;

Map

function (doc) {
  // Select only the products whose price is exactly 20.
  if (doc.price === 20) {
    emit(doc.price);
  }
}

Query this view with include_docs=true so that each row carries the full document, including its _id and _rev. Then change the price in your application code and send the modified documents back to the database in a single request via the _bulk_docs API.
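The application-side step might look like the sketch below, which turns view rows into a _bulk_docs request body of the form {"docs": [...]}. It assumes the view was queried with include_docs=true, so each row has a doc field; buildPriceUpdate is a hypothetical helper, and the ids and revisions in the sample rows are invented. The body is then sent with POST /<db>/_bulk_docs (Content-Type: application/json). CouchDB reports a result per document, so a document whose _rev changed since you read it comes back as a conflict instead of failing the whole batch.

```javascript
// Minimal sketch: build the _bulk_docs request body for the price update.
// Assumes rows come from the price === 20 view queried with
// include_docs=true, so row.doc holds the full document with _id and _rev.
function buildPriceUpdate(viewRows, newPrice) {
  return {
    // Copy each document and overwrite only the price; keeping _rev
    // tells CouchDB which revision this update is based on.
    docs: viewRows.map((row) => Object.assign({}, row.doc, { price: newPrice })),
  };
}

// Made-up rows in the shape of a view response (ids and revs are invented).
const viewRows = [
  { id: "p1", key: 20, value: null, doc: { _id: "p1", _rev: "1-aaa", category: "book", price: 20 } },
  { id: "p2", key: 20, value: null, doc: { _id: "p2", _rev: "3-bbb", category: "toy", price: 20 } },
];
const body = buildPriceUpdate(viewRows, 19.99);
// Send JSON.stringify(body) as the body of POST /<db>/_bulk_docs.
```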


DELETE FROM products WHERE expires_at <= NOW();

Map

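function (doc) {
  // Skip docs with no expiry: a null key sorts before every timestamp.
  if (doc.expires_at != null) {
    emit(doc.expires_at);
  }
}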

Depending on how you store your date-time values, you may need to adjust both the map function and your query to the view. Storing a numeric timestamp is probably the easiest option, since numbers sort correctly as view keys (note that JavaScript timestamps are in milliseconds, not seconds). Query the view with endkey set to the current time to get every document whose expires_at is at or before now, then add a new field, _deleted: true, to each of these documents. Once you send this list back to the database (again with _bulk_docs), all the specified documents will be deleted.
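A sketch of the deletion step, assuming expires_at holds a JavaScript millisecond timestamp (for example from Date.now()) and the view was queried with endkey set to the current time and include_docs=true. buildExpiryDeletes is a hypothetical helper name. Instead of resending whole documents with _deleted: true added, it sends minimal stubs carrying only _id, _rev and _deleted, which is enough for CouchDB to record a deletion.

```javascript
// Minimal sketch, assuming numeric millisecond timestamps in expires_at
// and rows from the expires_at view queried with include_docs=true.
function buildExpiryDeletes(viewRows, now = Date.now()) {
  return {
    docs: viewRows
      // Re-check the condition so stray rows (e.g. null keys) are never deleted.
      .filter((row) => typeof row.key === "number" && row.key <= now)
      // A deletion stub only needs _id, _rev and _deleted: true.
      .map((row) => ({ _id: row.doc._id, _rev: row.doc._rev, _deleted: true })),
  };
}

// Made-up rows; a fixed "now" of 1000 keeps the example deterministic.
const rows = [
  { id: "x", key: 500, value: null, doc: { _id: "x", _rev: "2-ccc", expires_at: 500 } },
  { id: "y", key: 1000, value: null, doc: { _id: "y", _rev: "1-ddd", expires_at: 1000 } },
  { id: "z", key: 1500, value: null, doc: { _id: "z", _rev: "1-eee", expires_at: 1500 } },
];
const deleteBody = buildExpiryDeletes(rows, 1000);
// deleteBody.docs holds stubs for "x" and "y"; POST it to /<db>/_bulk_docs.
```

The <= comparison mirrors the SQL expires_at <= NOW(), and it matches the view query because endkey is inclusive by default.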

answered Sep 18 '22 at 06:09 by Dominic Barnes
