
How can I optimize an App Engine Java/JDO datastore put() to use fewer writes?

I'm tuning an app we run on App Engine, and one of the largest costs is datastore reads and writes. One of the biggest sources of writes is persisting an order.

The basic data model is that an Order has many Items. We store both separately and relate them like this:

@PersistenceCapable
public class Order implements Serializable {

     @Persistent(mappedBy="order")
     @Element(dependent = "true")
     private List<Item> orderItems;

     // other fields too obviously
}

@PersistenceCapable
public class Item implements Serializable {

     @Persistent(dependent = "true")
     @JsonIgnore
     private Order order;

     // more fields...

}

AppStats shows two datastore puts for an order with a single item, but each one uses a massive number of writes. I'd like to hear the best way to optimize this from anyone who has experience with it.

AppStats data:

real=34ms api=1695ms cost=6400 billed_ops=[DATASTORE_WRITE:64]

real=42ms api=995ms cost=3600 billed_ops=[DATASTORE_WRITE:36]


Some of the areas I know of that would probably help:

  1. Fewer indexes: there are implicit indexes on a number of Order and Item properties that I could tell App Engine not to index. For example, item.quantity is not something I need to query by. But is that what all these writes are for?
  2. De-relate Item and Order, so that I just have a single entity, OrderItem, removing the need for a relationship at all (but paying for it with extra storage).
  3. In terms of explicit indexes, I only have one on the order table, by order date, and one on the order items, by SKU/date, plus the implicit one for the relationship.
  4. If the items were a Collection, not a List, would that remove the need for an index on the children's _IDX entirely?

So my question is: are any of the above likely to yield big wins, or are there other options I've missed that would be better to focus on first?

Bonus points: Is there a good 'guide to less datastore writes' article somewhere?

Asked by Ashley Schroder, Nov 03 '22 08:11



1 Answer

Billing docs clearly state:

  • New Entity Put (per entity, regardless of entity size): 2 writes + 2 writes per indexed property value + 1 write per composite index value

  • Existing Entity Put (per entity): 1 write + 4 writes per modified indexed property value + 2 writes per modified composite index value

  • Also relevant: App Engine predefines a simple index on each property of an entity.
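To make the billing rules above concrete, here is a small arithmetic sketch. The class and method names are made up for illustration. The counts 31 and 17 are only values that happen to reproduce the 64 and 36 writes reported by AppStats when no composite index values are assumed; they are not known facts about the actual entities.

```java
public class DatastoreWriteCost {

    // New entity put: 2 writes + 2 per indexed property value
    // + 1 per composite index value.
    static int newEntityPut(int indexedPropertyValues, int compositeIndexValues) {
        return 2 + 2 * indexedPropertyValues + compositeIndexValues;
    }

    // Existing entity put: 1 write + 4 per modified indexed property value
    // + 2 per modified composite index value.
    static int existingEntityPut(int modifiedIndexedValues, int modifiedCompositeValues) {
        return 1 + 4 * modifiedIndexedValues + 2 * modifiedCompositeValues;
    }

    public static void main(String[] args) {
        // Illustrative counts only (assumption: no composite index values).
        System.out.println(newEntityPut(31, 0)); // prints 64
        System.out.println(newEntityPut(17, 0)); // prints 36

        // Making one property unindexed saves 2 writes on a new entity put.
        System.out.println(newEntityPut(30, 0)); // prints 62

        // Updating an existing entity and changing one indexed value.
        System.out.println(existingEntityPut(1, 0)); // prints 5
    }
}
```

Under these rules the entity itself costs only 2 writes; almost everything else comes from indexed property values.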

On to questions:

  1. Yes, the number of write ops depends on the number of indexed properties. Make properties unindexed to save write ops.
  2. Combining the two entities into one would save you 1 write (or 2 in the case of new entities).
  3. You don't need "explicit" indexes for a single property. Those are generated automatically by App Engine. You only need to explicitly configure composite indexes, which span multiple properties.
  4. No. Collection versus List (a Collection with ordering) is only the Java representation. The Datastore API always uses a list internally, so added items retain their order.
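For point 1, a minimal sketch of how a JDO field can be marked unindexed on App Engine, using the DataNucleus `gae.unindexed` extension. This is a mapping fragment rather than a standalone program: it needs the App Engine SDK's JDO jars on the classpath. The `int` type for `quantity` and the key field are assumptions, and the class shows only the fields relevant here.

```java
import java.io.Serializable;

import javax.jdo.annotations.Extension;
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

import com.google.appengine.api.datastore.Key;

@PersistenceCapable
public class Item implements Serializable {

    // Assumed key field; the original snippet does not show one.
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;

    // Never used in a query filter or sort order, so it is excluded from
    // the built-in single-property index. Queries on this field will no
    // longer return this entity.
    @Persistent
    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    private int quantity;

    // more fields...
}
```

Only apply this to properties you are sure you will never filter or sort by, since unindexed values are invisible to queries.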

Update:

The number of indexes affects the cost of a write but not its speed. Writes are done in two phases: a commit phase, where the entity data is saved, and an apply phase, where the indexes are built. The put operation returns after the commit phase, so its latency is not affected by the number of indexes.

In your case you are calling two puts, one after another. As you can see from the AppStats graph, they happen consecutively. You might want to execute them in parallel as async operations (I'm not sure whether that is available in JDO).
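Outside JDO, the low-level datastore API does offer async puts through `AsyncDatastoreService`. A rough sketch follows, assuming both entities already have complete keys or do not depend on each other's generated key. With an owned parent/child relationship the child's key contains the parent's key, so this may not apply directly. The class and method names are made up, and the code needs the App Engine SDK on the classpath.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import com.google.appengine.api.datastore.AsyncDatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;

public class ParallelPuts {

    // Starts both puts before waiting on either, so the two RPCs overlap
    // instead of running one after another.
    public static void putBoth(Entity order, Entity item)
            throws InterruptedException, ExecutionException {
        AsyncDatastoreService datastore =
                DatastoreServiceFactory.getAsyncDatastoreService();

        Future<Key> orderPut = datastore.put(order); // returns immediately
        Future<Key> itemPut = datastore.put(item);   // returns immediately

        // Block until both writes have committed.
        orderPut.get();
        itemPut.get();
    }
}
```

This changes latency only. The number of billed write ops stays the same.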

Answered by Peter Knego, Nov 10 '22 01:11
