I have a background working with relational databases but recently started to dabble in CouchDB, and I was surprised that some operations that would be simple in SQL are not first-class functions in CouchDB.
I would appreciate you taking a moment to map each SQL statement below to its MapReduce equivalent.
SELECT COUNT(*) FROM products WHERE price < 20.00;
SELECT category, SUM(price) FROM products GROUP BY category;
UPDATE products SET price = 19.99 WHERE price = 20.00;
DELETE FROM products WHERE expires_at <= NOW();
The SELECT commands are pretty easy. Bulk writes are a bit more complicated. Generally, you'll use some view to retrieve the documents that need to be changed, then you'll use the _bulk_docs API to send all the changes at once.
Also, consult the documentation on views for details on how to issue queries, including ordering, grouping, and so on.
SELECT COUNT(*) FROM products WHERE price < 20.00;
Map function:
function (doc) {
    if (doc.price < 20) {
        emit(doc.price, null);
    }
}
Reduce function:
_count
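For context, the map function and its reduce live together in a design document. The following is a minimal sketch of what that document might look like; the design document name "products" and the view name "cheap_count" are assumptions made up for illustration, not names from the question.

```javascript
// Hypothetical design document. "_design/products" and "cheap_count" are
// illustrative names, not taken from the original question.
const designDoc = {
  _id: "_design/products",
  views: {
    cheap_count: {
      // CouchDB stores the map function as a string. `emit` is provided by
      // CouchDB when the view is built; it is never called in this sketch.
      map: function (doc) {
        if (doc.price < 20) {
          emit(doc.price, null);
        }
      }.toString(),
      // Built-in reduce that counts the emitted rows.
      reduce: "_count",
    },
  },
};

// This JSON is the body you would PUT to /<db>/_design/products.
console.log(JSON.stringify(designDoc, null, 2));
```

Querying this view with its reduce applied returns a single row whose value is the count.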
If you need this to work with an arbitrary amount, not just 20, then you'll need to emit the price in all cases, and use startkey and endkey to narrow down your result set.
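As a rough sketch of the arbitrary-threshold variant: the map function emits every price, and the query supplies the cutoff. The function names and sample documents below are assumptions for illustration. Inside CouchDB, emit is a global; it is passed in as a parameter here only so the sketch runs on its own.

```javascript
// Map function that emits every price, so the threshold can be chosen at query time.
function mapByPrice(doc, emit) {
  if (typeof doc.price === "number") {
    emit(doc.price, null);
  }
}

// Build the query string for "count of products with price < threshold".
// View keys are JSON-encoded; inclusive_end=false makes endkey exclusive,
// which matches the strict "<" in the SQL.
function countBelowQuery(threshold) {
  const params = new URLSearchParams({
    endkey: JSON.stringify(threshold),
    inclusive_end: "false",
  });
  return params.toString();
}

// Local stand-in for what CouchDB computes with this view plus _count.
function countBelow(docs, threshold) {
  const keys = [];
  docs.forEach((doc) => mapByPrice(doc, (key) => keys.push(key)));
  return keys.filter((key) => key < threshold).length;
}

// Made-up sample data.
const sampleDocs = [{ price: 5 }, { price: 19.99 }, { price: 20 }, { price: 42 }];
console.log(countBelowQuery(20)); // endkey=20&inclusive_end=false
console.log(countBelow(sampleDocs, 20)); // 2
```

No startkey is needed here because the range starts at the lowest key; add one if you also want a lower bound.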
SELECT category, SUM(price) FROM products GROUP BY category;
Map function:
function (doc) {
    emit(doc.category, doc.price);
}
Reduce function:
_sum
This map function essentially uses the category as the key, with the price as the value in your key/value pair. The reduce function will add up the prices for each distinct key when you query the view with group=true.
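To make the grouping concrete, here is a small local simulation of that map function combined with a grouped _sum. It is only a stand-in for what CouchDB does internally; the helper name sumByKey and the sample products are made up for illustration, and emit is passed in as a parameter so the code runs outside CouchDB.

```javascript
// Map function from the answer, with `emit` passed in explicitly.
function mapByCategory(doc, emit) {
  emit(doc.category, doc.price);
}

// Local stand-in for the built-in _sum reduce queried with group=true:
// one running total per distinct key.
function sumByKey(docs) {
  const totals = new Map();
  for (const doc of docs) {
    mapByCategory(doc, (key, value) => {
      totals.set(key, (totals.get(key) || 0) + value);
    });
  }
  return totals;
}

// Made-up sample data.
const sampleProducts = [
  { category: "books", price: 10 },
  { category: "books", price: 5 },
  { category: "toys", price: 7 },
];
const categoryTotals = sumByKey(sampleProducts);
console.log(Object.fromEntries(categoryTotals)); // { books: 15, toys: 7 }
```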
UPDATE products SET price = 19.99 WHERE price = 20.00;
Map function:
function (doc) {
    if (doc.price === 20) {
        emit(doc.price, null);
    }
}
Once your application pulls down the contents of this view, you'll perform all the manipulations in your application code, then send the results back to the database via the _bulk_docs API.
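The application-side step might look something like the sketch below. It assumes the view was queried with include_docs=true so each row carries the full document; the helper name buildPriceUpdate and the sample rows (ids, revisions, names) are invented for illustration.

```javascript
// Turn view rows (fetched with include_docs=true) into a _bulk_docs body
// that sets a new price on every matching document. Spreading row.doc keeps
// _id and _rev, which CouchDB needs to accept the write as an update.
function buildPriceUpdate(rows, newPrice) {
  return {
    docs: rows.map((row) => ({ ...row.doc, price: newPrice })),
  };
}

// Hypothetical view response rows; ids and revs are made up.
const viewRows = [
  { id: "a", key: 20, value: null, doc: { _id: "a", _rev: "1-abc", name: "Mug", price: 20 } },
  { id: "b", key: 20, value: null, doc: { _id: "b", _rev: "3-def", name: "Pen", price: 20 } },
];

const updateBody = buildPriceUpdate(viewRows, 19.99);
// POST this JSON to /<db>/_bulk_docs with Content-Type: application/json.
console.log(JSON.stringify(updateBody, null, 2));
```

Unlike the SQL UPDATE, this is not atomic across documents: each document in the response succeeds or conflicts on its own, so the application should check the per-document results.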
DELETE FROM products WHERE expires_at <= NOW();
Map function:
function (doc) {
    emit(doc.expires_at, null);
}
Depending on how you store your date-time values, you may need to adjust the map function as well as your query to the view. Using a timestamp (JS uses milliseconds instead of seconds) is probably the fastest way to accomplish this. Once you've retrieved the matching documents, you'll add a new field to each of them: _deleted: true. Once you send this list back into the database (again with _bulk_docs), all the specified documents will be deleted.
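A minimal sketch of the delete flow, assuming expires_at is stored as a millisecond timestamp and the view is queried with include_docs=true. The helper names expiredQuery and buildBulkDelete and the sample row are assumptions for illustration.

```javascript
// Query string selecting every row with key <= now. endkey is inclusive by
// default, which matches the "<=" in the SQL.
function expiredQuery(now) {
  return new URLSearchParams({
    endkey: JSON.stringify(now),
    include_docs: "true",
  }).toString();
}

// Add _deleted: true to each returned document, keeping _id and _rev,
// and wrap the list in the body shape that _bulk_docs expects.
function buildBulkDelete(rows) {
  return {
    docs: rows.map((row) => ({ ...row.doc, _deleted: true })),
  };
}

// Hypothetical view rows; ids, revs and timestamps are made up.
const expiredRows = [
  { id: "x", key: 1000, value: null, doc: { _id: "x", _rev: "2-aaa", expires_at: 1000 } },
];

console.log(expiredQuery(Date.now()));
const deleteBody = buildBulkDelete(expiredRows);
// POST this JSON to /<db>/_bulk_docs with Content-Type: application/json.
console.log(JSON.stringify(deleteBody));
```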