Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow deserialization when fetching low level datastore Entity objects from Memcache

It turns out that retrieving low level datastore entities that are stored memcache is painfully slow. Since objectify caches entities as the low level datastore Entity type, this results in poor performance when fetching many entities from memcache using objectify.

The real question is why is it slow to deserialize the Entity type from memcache? I put together a sample project to demonstrate the disparity in retrieving Entities from memcache vs plain strings or a simple Map.

Here's the code:

https://github.com/aleemstreak/perftest or the relevant file: https://github.com/aleemstreak/perftest/blob/master/src/com/rewardly/perftest/PerftestServlet.java

Also, I deployed it so you can see how big a difference this is in production: aleemsandbox.appspot.com/perftest . Its a naive profiler but it does show a huge disparity in performance. Refresh the page a few times to see the difference. Here is some sample output:

Storing String Data Test
-------------------------
generateData: 0ms
storeData: 10ms
fetchData: 9ms


Storing Map Data Test
-------------------------
generateData: 0ms
storeData: 21ms
fetchData: 92ms


Storing Entity Data Test
-------------------------
generateData: 69ms
storeData: 24ms
fetchData: 792ms

The first section shows the time it takes to store 1000 strings in memcache and then fetch it right back. The next example does the same for 1000 Map objects and finally the last example stores and retrieves 1000 low level Entity types. You can see the huge time increase to retrieve Entity types.

Any ideas why entities could be slow to deserialize from memcache?

UPDATE 1

Following a suggestion in one of the answers, I also logged the cumulative size of the objects being stored in memcache and the results no print them out. I also added one other test case - instead of storing an Entity directly, instead, I serialize the Entity to a byte[] myself first, then store it in memcache. Here are the results:

StringBenchmark
----------------
Average Fetch Time: 40.16ms
Fetch Size: 24.41KB


MapBenchmark
----------------
Average Fetch Time: 27.36ms
Fetch Size: 102.54KB


EntityBenchmark
----------------
Average Fetch Time: 1029.88ms
Fetch Size: 463.87KB


EntityPreSerializedBenchmark
----------------
Average Fetch Time: 218.82ms
Fetch Size: 490.23KB

Whats interesting here are the last two results. Although they are roughly the same size, it takes about 1/5 the time to fetch and deserialize the byte[] manually.

The code in the github repo has been updated and the deployed sample app also has the latest code so feel free to run this test there and see the results.

like image 704
aloo Avatar asked Nov 02 '22 02:11

aloo


2 Answers

Perhaps I'm nit-picking your wording, but all of the higher level API (JDO, JPA, and Ofy) use the low-level API, so all entities are really LL API entities. So, you note that "Objectify caches entities as the low level datastore Entity type", but wouldn't all higher level datastore API's do the same (assuming they've been instructed to use such caching)? So I don't think this has anything to do with ofy.

Moving on, it seems natural that your 3rd test takes much longer then the others - the Entity type adds considerable overhead relative to the simple String and even Map types. I am a bit surprised that it takes that much longer, but taking a step back, you are fetching 1000 entities, so that is still < 1 ms per entity.

I think you should add 1 more test. There is a java API to get the size in memory of an instance (can't remember it off hand). Determine the memory size of the entities you are using in your test, and then change the string test to use an object of the same size. That way we can isolate whether this has anything to do with the Entity type itself, or is just a result of the considerably greater size of the objects being cached in the 3rd test.


Update in response to new test results... Interesting. That does seem to confirm your theory that it is the serializing done by the memcache code that is causing the slowdown. That is odd, though - wouldn't memcache simply serialize the object in a manner very similar to what you are doing?

Perhaps this post helps:
manual serialization / deserialization of AppEngine Datastore objects


The memcache interface that you are using for fetching Entity objects is:

java.lang.Object get(java.lang.Object key)

https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/memcache/MemcacheService#get(java.lang.Object)

So it's not taking a seralizable, so rather then serializing the way that you are, it is probably using introspection. That would explain why 'EntityPreSerializedBenchmark' is much faster then 'EntityBenchmark'.

like image 198
Tom Avatar answered Nov 09 '22 15:11

Tom


Turns out to be a bug in the appengine sdk. They are aware of the problem and fixing presumably.

like image 26
aloo Avatar answered Nov 09 '22 15:11

aloo