 

For Google App Engine (Java), how do I set and use chunk size in FetchOptions?

I'm running a query that currently returns 1400 results, and because of this I'm getting the following warning in the log file:

com.google.appengine.api.datastore.QueryResultsSourceImpl logChunkSizeWarning: This query does not have a chunk size set in FetchOptions and has returned over 1000 results. If result sets of this size are common for this query, consider setting a chunk size to improve performance.

I can't find any examples of how to actually implement this. There is a question on here about Python, but as I'm using Java and don't understand Python, I'm struggling to translate it.

Also, this query (below) takes 17226 cpu_ms to execute, which feels far too long. I can't even imagine what would happen if I had, say, 5000 contacts and needed to search through them on the client side (like you do with Google Mail contacts)!

The code I have is:

    int index=0;
    int numcontacts=0;
    String[][] DetailList;

    PersistenceManager pm = PMF.get().getPersistenceManager();


    try {
        Query query = pm.newQuery(Contact.class, "AdminID == AID");
        query.declareParameters("Long AID");
        query.setOrdering("Name asc");
        List<Contact> Contacts = (List<Contact>) query.execute(AdminID);
        numcontacts=Contacts.size();
        DetailList=new String[numcontacts][5];

        for (Contact contact : Contacts) 
        {
            DetailList[index][0]=contact.getID().toString();
            DetailList[index][1]=Encode.EncodeString(contact.getName());
            index++;
        }
    } finally {
        pm.close();
    }
    return (DetailList);

I found the following two entries on here:

  • google app engine chunkSize & prefetchSize - where can I read details on it?
  • GAE/J Low-level API: FetchOptions usage

but neither actually goes into any detail about how to implement or use these options. I'm guessing it's a server-side process, and that you are meant to set up some kind of loop to grab the chunks one at a time, but how do I actually do that?

  • Do I call the query inside a loop?
  • How do I know how many times to loop?
  • Do I just check for the first chunk that comes back with less than the chunk size number of entries?

How am I meant to find out things like this without an actual example to follow? It seems to me that other people on here "just know" how to do it!

Sorry if I'm not asking the questions in the right way or I'm just being a dim newbie about this, but I don't know where else to turn to figure this out!

johnvdenley asked Aug 26 '11 00:08



2 Answers

I ran into the same problem, and since the last comment was from a month ago, here is what I found out about querying large datasets.

I think I'm going to use the "query cursor" technique after reading these lines in the Google docs article (the Python one already mentioned):

This article was written for SDK version 1.1.7. As of release 1.3.1, query cursors (Java | Python) have superseded the techniques described below and are now the recommended method for paging through large datasets.

The first line of the Google docs on query cursors explains exactly why cursors are needed:

Query cursors allow an app to perform a query and retrieve a batch of results, then fetch additional results for the same query in a subsequent web request without the overhead of a query offset.

The documentation also provides a Java example of a servlet using the cursor technique, a tip on how to generate a safe cursor for the client, and a list of cursor limitations.
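To make the idea concrete, here is a minimal plain-Java sketch of cursor-style paging. It is only a simulation over an in-memory list, not the App Engine API: the class `CursorPagingSketch`, the `Page` type, and the integer "cursor" are hypothetical stand-ins for the opaque cursor the datastore returns with each batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java simulation of cursor paging; no App Engine classes are used. */
public class CursorPagingSketch {

    /** One batch of results plus the position to resume from (hypothetical type). */
    static final class Page {
        final List<String> items;
        final int nextCursor; // -1 means there is nothing left to fetch

        Page(List<String> items, int nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    /** Returns up to pageSize items starting at cursor, without re-reading earlier items. */
    static Page fetchPage(List<String> sortedData, int cursor, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int end = Math.min(cursor + pageSize, sortedData.size());
        List<String> items = new ArrayList<>(sortedData.subList(cursor, end));
        int next = end < sortedData.size() ? end : -1;
        return new Page(items, next);
    }

    /** Walks every page in turn, as a series of successive requests would. */
    static List<String> fetchAll(List<String> sortedData, int pageSize) {
        List<String> all = new ArrayList<>();
        int cursor = 0;
        while (cursor != -1) {
            Page page = fetchPage(sortedData, cursor, pageSize);
            all.addAll(page.items);
            cursor = page.nextCursor;
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> contacts = new ArrayList<>();
        for (int i = 0; i < 1400; i++) {
            contacts.add("contact-" + i);
        }
        Page first = fetchPage(contacts, 0, 500);
        System.out.println(first.items.size() + " items, resume at " + first.nextCursor);
        System.out.println(fetchAll(contacts, 500).size() + " items in total");
    }
}
```

In this sketch the loop stops when a page reports that nothing is left, so the caller never needs to know the total number of pages in advance.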

Hope this gives you a lead to resolve your problem.

A small reminder about range and offset, which have a big impact on performance if forgotten (and I did forget ^^):

The starting offset has implications for performance: the Datastore must retrieve and then discard all results prior to the starting offset. For example, a query with a range of 5, 10 fetches ten results from the Datastore, then discards the first five and returns the remaining five to the application.
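The quoted behaviour can be illustrated with a small plain-Java simulation. This is only a sketch of the cost model described above, not datastore code: `OffsetCostSketch`, `fetchRange`, and the `rowsRead` counter are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java simulation of the cost of a starting offset; no App Engine classes are used. */
public class OffsetCostSketch {

    /** Counts how many rows the simulated store had to read (hypothetical counter). */
    static int rowsRead = 0;

    /** Simulates a range query [from, to): reads rows 0..to-1 and discards the first `from`. */
    static List<Integer> fetchRange(List<Integer> store, int from, int to) {
        int end = Math.min(to, store.size());
        List<Integer> returned = new ArrayList<>();
        for (int i = 0; i < end; i++) {
            rowsRead++;                     // every row before `to` is retrieved
            if (i >= from) {
                returned.add(store.get(i)); // only rows at or past the offset are kept
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        List<Integer> store = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            store.add(i);
        }
        List<Integer> result = fetchRange(store, 5, 10);
        // prints "returned 5, read 10"
        System.out.println("returned " + result.size() + ", read " + rowsRead);
    }
}
```

The gap between rows read and rows returned grows with the offset, which is why paging deep into a result set by offset gets steadily more expensive.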


Edit: Since I'm working with JDO, I kept looking for a way to let my existing code load more than 1000 results in a single query. If you're using JDO too, I found this in an old issue:

    Query query = pm.newQuery(...);
    // I would use a value below 1000 (the GAE limit)
    query.getFetchPlan().setFetchSize(numberOfRecordByFetch);
elkaonline answered Oct 21 '22 06:10



This is how I apply FetchOptions with the low-level datastore API; you might need to tweak it a bit to fit your example code:

    // ..... build the Query object
    FetchOptions fetch_options =
        FetchOptions.Builder.withPrefetchSize(100).chunkSize(100);
    QueryResultList<Entity> returned_entities =
        datastore_service_instance.prepare(query).asQueryResultList(fetch_options);

Of course, the figure (100) can be changed.
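For intuition about what a chunk size does, here is a plain-Java simulation of an iterator that pulls results from a backing store in fixed-size chunks. It rests on the assumption that chunking happens behind the iterator rather than in the caller's loop; `ChunkedIteratorSketch` and its `roundTrips` counter are hypothetical and are not part of the App Engine API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Plain-Java simulation of chunked fetching; no App Engine classes are used. */
public class ChunkedIteratorSketch implements Iterator<String> {

    private final List<String> store;   // stands in for the remote datastore
    private final int chunkSize;
    private List<String> buffer = new ArrayList<>();
    private int bufferPos = 0;
    private int nextOffset = 0;
    int roundTrips = 0;                 // number of simulated fetches

    ChunkedIteratorSketch(List<String> store, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.store = store;
        this.chunkSize = chunkSize;
    }

    /** Loads the next chunk once the current buffer has been consumed. */
    private void refillIfNeeded() {
        if (bufferPos < buffer.size() || nextOffset >= store.size()) {
            return;
        }
        int end = Math.min(nextOffset + chunkSize, store.size());
        buffer = new ArrayList<>(store.subList(nextOffset, end));
        bufferPos = 0;
        nextOffset = end;
        roundTrips++;
    }

    @Override
    public boolean hasNext() {
        refillIfNeeded();
        return bufferPos < buffer.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.get(bufferPos++);
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        for (int i = 0; i < 1400; i++) {
            store.add("contact-" + i);
        }
        ChunkedIteratorSketch it = new ChunkedIteratorSketch(store, 100);
        int seen = 0;
        while (it.hasNext()) {          // the caller's loop never mentions chunks
            it.next();
            seen++;
        }
        // prints "1400 results in 14 fetches"
        System.out.println(seen + " results in " + it.roundTrips + " fetches");
    }
}
```

If the real result iterator behaves the way this simulation assumes, you would not call the query inside a loop or count chunks yourself; you would iterate over the results once, and a larger chunk size would simply mean fewer fetches behind the scenes.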

If my answer isn't what you're looking for, you're welcome to edit your question and rephrase it.

By the way, I'm the one who wrote the first linked question.

Poni answered Oct 21 '22 05:10
