 

Google App-Engine Datastore is extremely slow

I need help in understanding why the below code is taking 3 to 4 seconds.

UPDATE: The use case for my application is to get the activity feed of a person since their last login. This feed could contain updates from friends or new items from outside their network that they may find interesting. The Activity table stores all such activities, and when a user logs in, I run a query on the GAE Datastore to return them. My application supports infinite scrolling too, hence I need the cursor feature of GAE. At a given time I fetch around 32 items, but the Activity table could have millions of rows (as it contains data from all users).

Currently the Activity table is small, containing only 25 records, and the Java code below reads only 3 records from it.

Each record in the Activity table has 4 UUID fields.

I cannot imagine how the query would behave if the table contained millions of rows and the result contained hundreds of rows.

Is there something wrong with the code below?

(I am using Objectify and app-engine cursors)

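import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.api.datastore.Query.Filter;
import com.google.appengine.api.datastore.Query.FilterOperator;
import com.google.appengine.api.datastore.Query.FilterPredicate;
import com.google.appengine.api.datastore.QueryResultIterator;
import com.googlecode.objectify.cmd.Query;

// Select the activities created by this user.
Filter filter = new FilterPredicate("creatorID", FilterOperator.EQUAL, userId);
Query<Activity> query = ofy().load().type(Activity.class).filter(filter);
// Resume from the position where the previous page ended.
query = query.startAt(Cursor.fromWebSafeString(previousCursorString));
QueryResultIterator<Activity> itr = query.iterator();
while (itr.hasNext()) {
    Activity a = itr.next();
    System.out.println(a);
}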

I have gone through "Google App Engine Application Extremely slow" and verified that the response time improves if I keep refreshing my page (which calls the above code). However, the improvement is only ~30%.

Compare this with any other database, where the response time for such tiny data is a few milliseconds, not even hundreds of milliseconds.

Am I wrong in expecting a regular database kind of performance from the GAE DataStore?

I do not want to turn on memcache just yet as I want to improve this layer without caching first.

user2250246 asked May 04 '15 05:05



1 Answer

I'm not exactly sure what your query is supposed to do, but it doesn't look like it requires a cursor query. In my humble opinion, the only valid use case for cursor queries is a paginated query for data with a limited count of result rows. Since your query does not have a limit, I don't see why you would want to use a cursor at all.

When you need millions of results, you're probably doing ad-hoc analysis of data (no human could ever interpret millions of raw data rows), and you might be better off using BigQuery instead of the App Engine datastore. I'm just guessing here, but normal front-end apps rarely need millions of rows in a result; they need only a few (maybe hundreds at times), which you filter from the total available rows.

Another thing:

Are you sure that it is the query that takes long? It might as well be the wrapper around the query. Since you are using cursors, you would have to re-run the query until there are no more results. The handling of this could be costly.

Lastly:

Are you testing on App Engine itself or on the local development server? The devserver obviously cannot simulate a cloud and thus could be slower (or faster) than the real thing at times. The devserver also does not know about instance warm-up times when your query spawns new instances.

Speaking of cloud: the point of cloud databases is not that they have the best performance for very little data, but that they scale and perform consistently whether a table holds a few hundred rows or a few billion.

Edit:

After performing a retrieval operation, the application can obtain a cursor, which is an opaque base64-encoded string marking the index position of the last result retrieved.

[...]

The cursor's position is defined as the location in the result list after the last result returned. A cursor is not a relative position in the list (it's not an offset); it's a marker to which the Datastore can jump when starting an index scan for results. If the results for a query change between uses of a cursor, the query notices only changes that occur in results after the cursor. If a new result appears before the cursor's position for the query, it will not be returned when the results after the cursor are fetched. (Datastore Queries)

These two statements make me believe that the query performance should be consistent with or without cursor queries.
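To make the "marker, not an offset" point concrete, here is a minimal in-memory sketch. It is not the App Engine or Objectify API: `CursorSketch`, `Page`, `fetchAfter` and `fetchOffset` are hypothetical names, and a `TreeMap` merely stands in for a sorted Datastore index. It only illustrates that a cursor-style fetch jumps straight to a position, while an offset-style fetch has to skip entries and shifts when new rows appear earlier in the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a sorted index; NOT the Datastore API.
final class CursorSketch {
    private final NavigableMap<Integer, String> index = new TreeMap<>();

    /** One page of results plus the cursor to resume from. */
    static final class Page {
        final List<String> results;
        final Integer cursor; // key of the last entry returned (unchanged if the page is empty)

        Page(List<String> results, Integer cursor) {
            this.results = results;
            this.cursor = cursor;
        }
    }

    void put(int key, String value) {
        index.put(key, value);
    }

    // Cursor-style fetch: jump to the position just after the cursor, read up to `limit` entries.
    Page fetchAfter(Integer cursor, int limit) {
        NavigableMap<Integer, String> tail =
                (cursor == null) ? index : index.tailMap(cursor, false);
        List<String> results = new ArrayList<>();
        Integer last = cursor;
        for (Map.Entry<Integer, String> e : tail.entrySet()) {
            if (results.size() == limit) break;
            results.add(e.getValue());
            last = e.getKey();
        }
        return new Page(results, last);
    }

    // Offset-style fetch, for comparison: walk from the start and skip `offset` entries.
    List<String> fetchOffset(int offset, int limit) {
        List<String> results = new ArrayList<>();
        int seen = 0;
        for (String value : index.values()) {
            if (seen++ < offset) continue;
            if (results.size() == limit) break;
            results.add(value);
        }
        return results;
    }

    public static void main(String[] args) {
        CursorSketch store = new CursorSketch();
        store.put(10, "a");
        store.put(20, "b");
        store.put(30, "c");
        store.put(40, "d");

        Page first = store.fetchAfter(null, 2);           // [a, b]
        store.put(5, "new");                              // lands before the cursor position
        Page second = store.fetchAfter(first.cursor, 2);  // [c, d], "new" is not seen

        System.out.println(first.results + " then " + second.results);
        System.out.println("offset page 2: " + store.fetchOffset(2, 2)); // [b, c], shifted by the insert
    }
}
```

In this sketch the second cursor page is unaffected by the row inserted before the cursor, whereas the offset-based second page repeats "b". The page size passed as `limit` plays the role of the query limit mentioned earlier.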

Here are some more things you might want to check:

  • How do you register your entity classes with objectify?
  • What does your actual test code look like? I'd like to see how and where you measure.
  • Can you share a comparison between cursor query and query without cursors?
  • Improvement over multiple requests could be the result of Objectify's integrated caching. You might want to disable caching for datastore performance tests.
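As a rough illustration of the "how and where you measure" question, the sketch below times only the work passed to it and keeps warm-up runs out of the measurement. `TimingSketch`, `measure` and `median` are hypothetical helper names, and the loop in `main` is a placeholder workload rather than a real datastore call. In a real test, the supplier would wrap just the query and the iteration over its results.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical timing helper; the workload in main() is a placeholder, not a datastore query.
final class TimingSketch {

    /**
     * Runs `work` `warmups` times without measuring, then `runs` times measured.
     * Returns the elapsed nanoseconds of each measured run.
     */
    static <T> long[] measure(Supplier<T> work, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            work.get(); // warm-up: class loading, JIT, connection setup, caches
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            work.get();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Median of the samples (the upper middle element when the count is even). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for "run the query and iterate the results".
        Supplier<Long> work = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) sum += i;
            return sum;
        };
        long[] samples = measure(work, 3, 10);
        System.out.println("median ms: " + median(samples) / 1_000_000.0);
    }
}
```

Reporting the first (cold) run separately from the median of the warm runs makes it easier to tell instance or cache warm-up apart from the steady-state query time.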
konqi answered Sep 21 '22 05:09
