Which, if any, of the NoSQL databases can provide stream of *changes* to a query result set?

Could anyone point me at some examples?

Firstly, I believe that none of the SQL databases provides this functionality. Am I correct?

I need to be able to specify arbitrary, simple queries, whose equivalent in SQL might be written:

SELECT * FROM accounts WHERE balance < 0 and balance > -1000;

I want an initial result set:

id: 100, name: Fred, balance: -10
id: 103, name: Mary, balance: -200

but then I want a stream of changes to follow, forever, until I stop them:

meta: remove, id: 100
meta: add,    id: 104, name: Alice, balance: -300
meta: remove, id: 103
meta: modify, id: 104, name: Alice, balance: -400
meta: modify, id: 104, name: Alison, balance: -400
meta: add,    id: 101, name: Clive, balance: -200
meta: modify, id: 104, name: Alison, balance: -100
...
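To pin down the semantics being asked for, here is a minimal, database-agnostic Python sketch. It assumes each change arrives as an (id, new document) pair, with `None` meaning the document was deleted. `ResultSetWatcher` and `apply` are hypothetical names, and the predicate plays the role of the SQL WHERE clause.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Doc = Dict[str, object]
Message = Tuple[str, Doc]


class ResultSetWatcher:
    """Keeps the current result set of one query and turns raw document
    changes into add / modify / remove messages (hypothetical helper)."""

    def __init__(self, predicate: Callable[[Doc], bool], initial: Iterable[Doc]) -> None:
        self.predicate = predicate
        # Initial result set, keyed by document id.
        self.members: Dict[object, Doc] = {d["id"]: d for d in initial if predicate(d)}

    def apply(self, doc_id: object, new_doc: Optional[Doc]) -> Optional[Message]:
        """Feed one change; new_doc is None when the document was deleted."""
        was_in = doc_id in self.members
        now_in = new_doc is not None and self.predicate(new_doc)
        if now_in and not was_in:
            self.members[doc_id] = new_doc
            return ("add", new_doc)
        if was_in and not now_in:
            del self.members[doc_id]
            return ("remove", {"id": doc_id})
        if now_in and new_doc != self.members[doc_id]:
            self.members[doc_id] = new_doc
            return ("modify", new_doc)
        return None  # the change does not affect the result set
```

Only the source of the (id, document) pairs depends on the database. The answers below describe several feeds that could supply them.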

Note: I'm not talking about streaming large result sets. I'm looking for a soft-realtime stream of changes.

Also, it needs to scale out, if possible.

Thanks,

Chris.

fadedbee asked Mar 10 '11 08:03


6 Answers

CouchDB has a changes feed. Basically it's an ordered list of the changes made to the documents in the database since its creation, in the order they were applied (only the most recent change to each document is guaranteed to appear). You can get the feed as plain JSON, JSONP, via long polling or as a continuous stream, and write applications that respond to changes in the database.
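Here is a minimal Python sketch of following the continuous feed using only the standard library. It assumes a CouchDB server at a base URL such as http://localhost:5984 and a database named accounts; both are assumptions. `feed=continuous`, `include_docs`, `since` and `heartbeat` are `_changes` query parameters, while `changes_url`, `parse_changes` and `follow` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, Iterator, Optional, Tuple


def changes_url(base: str, db: str, since: str = "now") -> str:
    """Build a continuous _changes URL that also returns the full documents."""
    query = urllib.parse.urlencode({
        "feed": "continuous",
        "include_docs": "true",
        "since": since,          # "now" = only changes made after connecting
        "heartbeat": "10000",    # ms; the server sends blank lines to keep the connection alive
    })
    return f"{base.rstrip('/')}/{urllib.parse.quote(db, safe='')}/_changes?{query}"


def parse_changes(lines: Iterable[bytes]) -> Iterator[Tuple[str, Optional[dict], str]]:
    """Yield (doc_id, doc or None if deleted, seq) from continuous-feed lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # heartbeat
        change = json.loads(line)
        if "last_seq" in change:
            continue  # final line sent when the feed closes
        doc = None if change.get("deleted") else change.get("doc")
        yield change["id"], doc, str(change["seq"])


def follow(base: str, db: str, since: str = "now") -> Iterator[Tuple[str, Optional[dict], str]]:
    """Stream changes until the connection drops (no reconnect logic)."""
    with urllib.request.urlopen(changes_url(base, db, since)) as resp:
        yield from parse_changes(resp)
```

CouchDB 2.x and later can also filter the feed on the server, for example with `filter=_selector` and a Mango selector in the request body. As far as I understand, though, a filtered feed only reports documents that match after the change. A document leaving the balance range would therefore not show up as a removal, and filtering on the client avoids that.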

Here's the changes feed from my blog

To learn more check out this section of the CouchDB guide

Max Ogden answered Nov 03 '22 20:11


Although an answer has been accepted, there is another answer that gets to the heart of the assumptions underneath your question.

What is the business concern that you have related to getting a list of changes to the data? What if, instead of merely getting the list of changes to the data, you received a set of events that told you why and how the data changed?

This concept is one of the fundamental ideas behind "CQRS" as an architecture (usually combined with event sourcing). Basically, you store every event that caused a change to your data, e.g. FundsDeposited, FundsWithdrawn, etc., and you gain the ability to "replay" those events and discover not just how your data changed over time, but why.
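Here is a minimal event-sourcing sketch in Python. It assumes integer amounts and an in-memory list as the event log. Only the event names FundsDeposited and FundsWithdrawn come from the answer; `replay` and `overdrawn_between` are hypothetical helpers. The sketch shows how current state, and therefore the question's query, is derived by replaying the log rather than stored directly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Union


@dataclass(frozen=True)
class FundsDeposited:
    account_id: int
    amount: int


@dataclass(frozen=True)
class FundsWithdrawn:
    account_id: int
    amount: int


Event = Union[FundsDeposited, FundsWithdrawn]


def replay(events: Iterable[Event]) -> Dict[int, int]:
    """Rebuild current balances by folding over the event log in order."""
    balances: Dict[int, int] = {}
    for e in events:
        delta = e.amount if isinstance(e, FundsDeposited) else -e.amount
        balances[e.account_id] = balances.get(e.account_id, 0) + delta
    return balances


def overdrawn_between(balances: Dict[int, int], low: int = -1000, high: int = 0) -> Dict[int, int]:
    """The question's query (low < balance < high) evaluated against the projection."""
    return {acct: bal for acct, bal in balances.items() if low < bal < high}
```

Because the log is append-only, each new event can be pushed to subscribers as it is written, and projections like this one can be updated incrementally instead of being recomputed.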

Once you go down that road, you gain the ability to store events as a stream, and you are no longer limited to a small handful of storage engines. Instead, you could use practically any storage engine and it would get the job done.

Jonathan Oliver answered Nov 03 '22 21:11


Not sure if this is exactly the kind of thing you are looking for, but I thought it possibly relevant enough to warrant a mention!

If you use replication in MongoDB, all write operations are stored in an oplog (operation log). Every insert/update/delete is recorded there so that it can be replayed on the secondary nodes. It's a capped collection, so it cycles round and overwrites itself (you can set its size). In theory, this oplog could be used as a way to retrieve a stream of changes. I haven't tried it myself, but you could possibly poll that oplog.
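Here is a sketch of how polled oplog entries might be normalised into (id, document) pairs. It assumes the entry fields `op` (`"i"`, `"u"`, `"d"`), `ns`, `o` and `o2` as they appear in MongoDB's `local.oplog.rs`. The oplog is an internal format, so treat these field names as assumptions to check against your server version. Update specs differ between versions, so rather than interpreting them, the sketch re-reads the document through a caller-supplied `fetch_doc`. `oplog_changes`, `fetch_doc` and the namespace `bank.accounts` are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple


def oplog_changes(
    entries: Iterable[dict],
    namespace: str,
    fetch_doc: Callable[[Any], Optional[dict]],
) -> Iterator[Tuple[Any, Optional[dict]]]:
    """Turn raw oplog entries for one collection into (_id, doc or None) pairs."""
    for entry in entries:
        if entry.get("ns") != namespace:
            continue  # other collections, no-op ("n") and command ("c") entries
        op = entry.get("op")
        if op == "i":
            doc = entry["o"]  # inserts carry the whole document
            yield doc["_id"], doc
        elif op == "u":
            # The update spec format varies by version; re-read the current document instead.
            doc_id = entry["o2"]["_id"]
            yield doc_id, fetch_doc(doc_id)
        elif op == "d":
            yield entry["o"]["_id"], None  # deletes carry only the _id
```

With pymongo, the entries might come from something like `client.local["oplog.rs"].find({"ts": {"$gt": last_ts}}, cursor_type=CursorType.TAILABLE_AWAIT)`, and `fetch_doc` could be `lambda i: accounts.find_one({"_id": i})`.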

AdaTheDev answered Nov 03 '22 21:11


Only a brainstorming answer:

Let's take MongoDB as an example, and assume we do not want to use a changes feed like the ones described above. Yes, it sounds crappy compared to the other answers, but it was my first idea before these answers popped up while I was writing...

Current features related to this question are Capped Collections (http://www.mongodb.org/display/DOCS/Capped+Collections) and maybe Server-side Code Execution (http://www.mongodb.org/display/DOCS/Server-side+Code+Execution).

Capped collections suit workloads that write a lot of data but read little of it (like log files); this collection type is made for such cases. Server-side scripts can be used to move a lot of processing into the database (less app code), but you can skip this point if you want to keep the logic entirely in your app.
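Here is a small in-memory Python model of capped-collection behavior (a fixed number of entries, oldest overwritten first). It assumes a reader that polls by sequence number. It is not MongoDB's implementation, and `CappedLog` / `read_after` are hypothetical names. It illustrates why a consumer polling a capped collection has to keep up: once its position has been overwritten, the missed changes are gone.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class CappedLog:
    """Fixed-size, insertion-ordered log: when full, each append evicts the oldest entry."""

    def __init__(self, max_entries: int) -> None:
        self.entries: Deque[Tuple[int, Any]] = deque(maxlen=max_entries)
        self.next_seq = 0

    def append(self, item: Any) -> int:
        seq = self.next_seq
        self.entries.append((seq, item))  # deque(maxlen=...) drops the oldest automatically
        self.next_seq += 1
        return seq

    def read_after(self, last_seen: int) -> List[Tuple[int, Any]]:
        """Return entries newer than last_seen; fail if some were already overwritten."""
        oldest = self.entries[0][0] if self.entries else self.next_seq
        if last_seen + 1 < oldest:
            raise LookupError(f"reader fell behind: entries {last_seen + 1}..{oldest - 1} were overwritten")
        return [(seq, item) for seq, item in self.entries if seq > last_seen]
```

A poller would remember the last sequence number it processed and call `read_after` with it on each poll.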

I don't know if there are NoSQL DBs with "hooks". I know that's possible in Postgres (SQL).

Currently the streaming logic has to be implemented in the app code AFAIK.

In CouchDB it could be possible with "Views", which are not implemented in MongoDB (if this isn't correct, please give me a link; this is an interesting topic, too!).

I don't know if this is helpful. It's my first attempt at an answer here on SO.

asaaki answered Nov 03 '22 21:11


This type of thing should be done in the app, not the database.

That is, every time you make a change, it should be recorded as a new record, not as a modification to the existing record. There's a whole lot more intelligence you can add to your app if you do it this way.
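Here is a minimal append-only sketch in Python of what this answer describes. It assumes an in-memory list stands in for the table and that a delete is written as a record with no document. `AppendOnlyStore`, `write`, `current` and `changes_since` are hypothetical names. With append-only writes, "the changes since X" is just the tail of the log, and the current state is derived from it.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Tuple[int, Any, Optional[dict]]  # (seq, id, doc or None for a delete)


class AppendOnlyStore:
    """Every write appends a new record; nothing is modified in place."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def write(self, doc_id: Any, doc: Optional[dict]) -> int:
        seq = len(self.records)
        self.records.append((seq, doc_id, doc))
        return seq

    def current(self) -> Dict[Any, dict]:
        """Latest version of each record that has not been deleted."""
        state: Dict[Any, dict] = {}
        for _, doc_id, doc in self.records:
            if doc is None:
                state.pop(doc_id, None)
            else:
                state[doc_id] = doc
        return state

    def changes_since(self, seq: int) -> List[Record]:
        """The change stream is simply every record written after seq."""
        return self.records[seq + 1:]
```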

BenG answered Nov 03 '22 21:11


As of v3.6, MongoDB provides Change Streams, which allow applications to subscribe to a real-time stream of changes:

Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a collection and immediately react to them.

Change streams can benefit architectures with reliant business systems, informing downstream systems once data changes are durable. For example, change streams can save time for developers when implementing Extract, Transform, and Load (ETL) services, cross-platform synchronization, collaboration functionality, and notification services.

By default, a stream returns changes to all documents in a collection, but you can add an aggregation pipeline to filter it down to only the documents that match your query result set.
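Here is a Python sketch of that filtering. It assumes pymongo and a replica set or sharded cluster, which change streams require. `Collection.watch()` with `full_document="updateLookup"` and the change-event fields `operationType`, `documentKey`, `fullDocument` and `updateDescription` come from MongoDB. `LOW`/`HIGH`, `build_pipeline`, `to_message` and `watch_accounts` are hypothetical. The pipeline keeps deletes and any update touching `balance`, not just in-range documents. An update that moves a document out of the range no longer matches a filter on `fullDocument.balance`, but the subscriber still needs a remove message for it.

```python
from typing import Any, Iterator, List, Optional, Set, Tuple

LOW, HIGH = -1000, 0  # the question's range: LOW < balance < HIGH
WATCHED_OPS = {"insert", "update", "replace", "delete"}


def build_pipeline() -> List[dict]:
    """Pipeline for Collection.watch(): in-range documents, deletes,
    and updates that touched 'balance' (they may have left the range)."""
    return [{"$match": {"$or": [
        {"fullDocument.balance": {"$gt": LOW, "$lt": HIGH}},
        {"operationType": "delete"},
        {"updateDescription.updatedFields.balance": {"$exists": True}},
    ]}}]


def to_message(change: dict, members: Set[Any]) -> Optional[Tuple[str, Any]]:
    """Map one change event to add / modify / remove, updating the tracked ids."""
    if change.get("operationType") not in WATCHED_OPS:
        return None  # e.g. "invalidate" or "drop"
    doc_id = change["documentKey"]["_id"]
    doc = change.get("fullDocument")  # absent for deletes; for updates only with updateLookup
    balance = doc.get("balance") if doc else None
    in_range = isinstance(balance, (int, float)) and LOW < balance < HIGH
    if in_range:
        kind = "modify" if doc_id in members else "add"
        members.add(doc_id)
        return kind, doc
    if doc_id in members:
        members.discard(doc_id)
        return "remove", doc_id
    return None


def watch_accounts(collection: Any, initial_ids: Set[Any]) -> Iterator[Tuple[str, Any]]:
    """Follow changes until stopped; `collection` is assumed to be a pymongo Collection."""
    members = set(initial_ids)
    with collection.watch(build_pipeline(), full_document="updateLookup") as stream:
        for change in stream:
            message = to_message(change, members)
            if message is not None:
                yield message
```

Changes made between the initial query and opening the stream could be missed. To avoid that, a real implementation would probably open the stream first (or pass `start_at_operation_time`) and then run the initial `find()`; that ordering is left out here.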

Vince Bowdren answered Nov 03 '22 21:11