Proper Way to Retrieve More than 128 Documents with RavenDB

Tags:

ravendb

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...

It was my understanding that one could retrieve more documents than the 128 default setting by doing this:

session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;

And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:

public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
    using (IDocumentSession session = GetRavenSession())
    {
        return session.Query<T>().Where(whereClause).ToList();                
    }
}

However, that only returns 128 documents. Why?

Note, here is the code that calls the above method:

RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);

If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:

return session.Query<T>().Where(whereClause).Take(200).ToList();

Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?

For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.

** EDIT **

I tried using int.MaxValue in my Take():

return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();

And that returns 1024. Argh. How do I get more than 1024?

** EDIT 2 - Sample document showing data **

{
  "Header_ID": 3525880,
  "Sub_ID": "120403261139",
  "TimeStamp": "2012-04-05T15:14:13.9870000",
  "Equipment_ID": "PBG11A-CCM",
  "AverageAbsorber1": "284.451",
  "AverageAbsorber2": "108.442",
  "AverageAbsorber3": "886.523",
  "AverageAbsorber4": "176.773"
}
Bob Horn asked Apr 06 '12 20:04


4 Answers

It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:

var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        User activeUser = enumerator.Current.Document;
    }
}

There is support for standard RavenDB queries and Lucene queries, and there is also async support.

The documentation can be found here. Ayende's introductory blog article can be found here.

Sean Kearon answered Oct 15 '22 13:10


The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:

<add key="Raven/MaxPageSize" value="5000"/>

For more info, see: http://ravendb.net/docs/intro/safe-by-default

Mike Christensen answered Oct 15 '22 11:10


The Take(n) function will only give you up to 1024 by default. However, you can pair it with Skip(n) to page through all of the results:

        var points = new List<T>();
        List<T> nextGroupOfPoints;
        RavenQueryStatistics stats;   // filled in by Statistics(out stats)
        const int ElementTakeCount = 1024;
        int i = 0;
        int skipResults = 0;

        do
        {
            // Each iteration issues one request to the server.
            nextGroupOfPoints = session.Query<T>()
                .Statistics(out stats)
                .Where(whereClause)
                .Skip(i * ElementTakeCount + skipResults)
                .Take(ElementTakeCount)
                .ToList();
            i++;
            skipResults += stats.SkippedResults;

            points.AddRange(nextGroupOfPoints);
        }
        while (nextGroupOfPoints.Count == ElementTakeCount);

        return points;

RavenDB Paging
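
To see why the loop stops when a page comes back short, here is a minimal in-memory sketch of the same Skip/Take pattern. It does not use the RavenDB client at all: CappedStore, Page, PagingSketch and FetchAll are hypothetical names made up for illustration, and the store only imitates a server that silently clamps every page to a maximum size (1024 here, matching the default mentioned above). It also ignores SkippedResults, which matters for real RavenDB queries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a server that caps every page at MaxPageSize.
// This is NOT a RavenDB type; it only imitates the clamping behaviour.
public sealed class CappedStore<T>
{
    private readonly List<T> _items;

    public int MaxPageSize { get; }
    public int RequestCount { get; private set; }

    public CappedStore(IEnumerable<T> items, int maxPageSize = 1024)
    {
        _items = items.ToList();
        MaxPageSize = maxPageSize;
    }

    // One call models one round trip; the requested size is clamped to MaxPageSize.
    public List<T> Page(Func<T, bool> predicate, int skip, int take)
    {
        RequestCount++;
        return _items.Where(predicate)
                     .Skip(skip)
                     .Take(Math.Min(take, MaxPageSize))
                     .ToList();
    }
}

public static class PagingSketch
{
    // Keeps requesting pages until one comes back shorter than the page size.
    public static List<T> FetchAll<T>(CappedStore<T> store, Func<T, bool> predicate, int pageSize = 1024)
    {
        // Never ask for more than the store will return, or the
        // "short page" stop condition would trigger after the first page.
        int size = Math.Min(pageSize, store.MaxPageSize);
        var all = new List<T>();
        List<T> page;

        do
        {
            page = store.Page(predicate, all.Count, size);
            all.AddRange(page);
        }
        while (page.Count == size);

        return all;
    }

    public static void Main()
    {
        var store = new CappedStore<int>(Enumerable.Range(0, 3000));
        var evens = FetchAll(store, x => x % 2 == 0);
        Console.WriteLine($"{evens.Count} items in {store.RequestCount} requests");
    }
}
```

With 3000 integers and a filter that keeps the 1500 even ones, the sketch needs two round trips: one full page of 1024 and one short page of 476. Against a real session, each of those round trips would also count toward the session's request limit.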

Aleksey Cherenkov answered Oct 15 '22 12:10


The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued over them.

If you are fetching more than 10 of anything from the store (even fewer than the default 128) for human consumption, then something is wrong, or your problem calls for a different approach than a truckload of documents coming from the data store.

RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.

If you need to perform data aggregation, create a map/reduce index that produces the aggregated data, e.g.:

Index:

    // Map: emit one entry per post
    from post in docs.Posts
    select new { post.Author, Count = 1 }

    // Reduce: sum the entries per author
    from result in results
    group result by result.Author into g
    select new
    {
       Author = g.Key,
       Count = g.Sum(x=>x.Count)
    }

Query:

// authorName is a placeholder for the author you are looking up
session.Query<AuthorPostStats>("Posts/ByUser/Count").Where(x => x.Author == authorName).ToList();
Petar Vučetin answered Oct 15 '22 11:10