Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best NoSql for querying date ranges?

Given a store which is a collection of JSON documents in the (approximate) form of:

{
PeriodStart: 18/04/2011 17:10:49 
PeriodEnd: 18/04/2011 17:15:54
Count: 12902
Max: 23041 Min: 0
Mean: 102.86 StdDev: 560.97
},
{
PeriodStart: 18/04/2011 17:15:49 
PeriodEnd: 18/04/2011 17:20:54
Count: 10000
Max: 23041 Min: 0
Mean: 102.86 StdDev: 560.97
}... etc

If I want to query the collection for given date range (say all documents from last 24 hours), which would give me the easiest querying operations to do this?

To further elaborate on requirements:

  • Its for an application monitoring service, so strict CAP/ACID isn't necessarily required
  • Performance isn't a primary consideration either. Read/writes would be at most 10s per second which could be handled by an RDBMS anyway
  • Ability to handle changing document schema's would be desirable
  • Ease of querying ability of lists/sets is important (ad-hoc queries an advantage)
like image 957
mwjackson Avatar asked Apr 18 '11 16:04

mwjackson


1 Answers

I may not have your query requirements down exactly, as you didn't specify. However, if you need to find any documents that start or end in a particular range, then you can apply most of what is written below. If that isn't quite what you're after, I can be more helpful with a bit more direction. :)

If you use CouchDB, you can create your indexes by splitting up the parts of your date into an array. ([year, month, day, hour, minute, second, ...])

Your map function would probably look similar to:

function (doc) {
    var date = new Date(doc.PeriodStart);
    emit([ date.getFullYear(), date.getMonth(), date.getDate(), date.getHours(), date.getMinutes() ] , null]);
}

To perform any sort of range query, you'd need to convert your start and end times into this same array structure. From there, your view query would have params called startkey and endkey. They would would receive the array parameters for start and end respectively.

So, to find the documents that started in the past 24 hours, you would send a querystring like this in addition to the full URI for the view itself:

// start: Apr 17, 2011 12:30pm ("24 hours ago")
// end:   Apr 18, 2011 12:30pm ("today")
startkey=[2011,04,17,12,30]&endkey=[2011,04,18,12,30]

Or if you want everything from this current year:

startkey=[2011]&endkey=[2011,{}]

Note the {}. When used as an endkey: [2011,{}] is identical to [2012] when the view is collated. (either format will work)

The extra components of the array will simply be ignored, but the further specificity you add to your arrays, the more specific your range can be. Adding reduce functions can be really powerful here, if you add in the group_level parameter, but that's beyond the scope of your question.

like image 137
Dominic Barnes Avatar answered Oct 08 '22 18:10

Dominic Barnes