
Inconsistent Fetch From Google App Engine Datastore

I have an application deployed on Google App Engine. I get inconsistent data when I fetch an entity by id immediately after updating that entity. I'm using JDO 3.0 to access the App Engine datastore.

I have an entity, Employee:

@PersistenceCapable(detachable = "true")
public class Employee implements Serializable {

    /**
     * 
     */
    private static final long serialVersionUID = -8319851654750418424L;
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY, defaultFetchGroup = "true")
    @Extension(vendorName = "datanucleus", key = "gae.encoded-pk", value = "true")
    private String id;
    @Persistent(defaultFetchGroup = "true")
    private String name;
    @Persistent(defaultFetchGroup = "true")
    private String designation;    
    @Persistent(defaultFetchGroup = "true")
    private Date dateOfJoin;    
    @Persistent(defaultFetchGroup = "true")
    private String email;
    @Persistent(defaultFetchGroup = "true")
    private Integer age;
    @Persistent(defaultFetchGroup = "true")
    private Double salary;
    @Persistent(defaultFetchGroup = "true")
    private HashMap<String, String> experience;
    @Persistent(defaultFetchGroup = "true")
    private List<Address> address;


    /**
      * Setters and getters, toString() * */

}

Initially, when I create an employee I do not set the fields salary and email.

I update the Employee entity later to add salary and email. The update works fine and the data is persisted to the App Engine datastore. However, when I immediately try to fetch the same Employee entity by id, I sometimes get stale data in which salary and email are null. The code I use to create, fetch, and update the Employee entity is given below.

    public Employee create(Employee object) {
        Employee persObj = null;
        PersistenceManager pm = PMF.get().getPersistenceManager();
        Transaction tx = null;
        try {
            tx = pm.currentTransaction();
            tx.begin();

            persObj = pm.makePersistent(object);

            tx.commit();
        } finally {

            if ((tx != null) && tx.isActive()) {
                tx.rollback();
            }

            pm.close();
        }

        return persObj;
    }


    public Employee findById(Serializable id) {

        PersistenceManager pm = PMF.get().getPersistenceManager();

        try {
            Employee e = pm.getObjectById(Employee.class, id);

            System.out.println("INSIDE EMPLOYEE DAO : " + e.toString());
            return e;

        } finally {

            pm.close();

        }
    }


    public void update(Employee object) {
        PersistenceManager pm = PMF.get().getPersistenceManager();
        Transaction tx = null;
        try {
            tx = pm.currentTransaction();
            tx.begin();
            Employee e = pm.getObjectById(object.getClass(), object.getId());
            e.setName(object.getName());
            e.setDesignation(object.getDesignation());
            e.setDateOfJoin(object.getDateOfJoin());
            e.setEmail(object.getEmail());
            e.setAge(object.getAge());
            e.setSalary(object.getSalary());
            tx.commit();
        } finally {
            if (tx != null && tx.isActive()) {
                tx.rollback();
            }

            pm.close();
        }
    }

I have set the number of idle instances to 5, and around 8 instances are running at a time. When I checked the logs of the various instances, this is what I found: [screenshot of instance logs]

Why do I get stale data when the request is served by certain instances? I can confirm that if the fetch request is handled by the instance that handled the update request, I always get the updated data. But when other instances handle the fetch request, stale data may be returned. I have explicitly set the datastore read consistency to strong in my jdoconfig.xml.

<?xml version="1.0" encoding="utf-8"?>
<jdoconfig xmlns="http://java.sun.com/xml/ns/jdo/jdoconfig_3_0.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/jdo/jdoconfig http://java.sun.com/xml/ns/jdo/jdoconfig_3_0.xsd">

   <persistence-manager-factory name="transactions-optional">
       <property name="javax.jdo.PersistenceManagerFactoryClass"
           value="org.datanucleus.api.jdo.JDOPersistenceManagerFactory"/>
       <property name="javax.jdo.option.ConnectionURL" value="appengine"/>
       <property name="javax.jdo.option.NontransactionalRead" value="true"/>
       <property name="javax.jdo.option.NontransactionalWrite" value="true"/>
       <property name="javax.jdo.option.RetainValues" value="true"/>
       <property name="datanucleus.appengine.autoCreateDatastoreTxns" value="true"/>
       <property name="datanucleus.appengine.singletonPMFForName" value="true"/>
       <property name="datanucleus.appengine.datastoreEnableXGTransactions" value="true"/>
       <property name="datanucleus.query.jdoql.allowAll" value="true"/>      
       <property name="datanucleus.appengine.datastoreReadConsistency" value="STRONG" />

   </persistence-manager-factory>
</jdoconfig>
HariShankar asked Oct 15 '14 07:10


2 Answers

If you are using the High Replication Datastore, setting the read policy does not ensure that all reads are strongly consistent; for queries, the policy only has an effect on ancestor queries. From the documentation:

The API also allows you to explicitly set a strong consistency policy, but this setting will have no practical effect, since non-ancestor queries are always eventually consistent regardless of policy.

https://cloud.google.com/appengine/docs/java/datastore/queries#Java_Data_consistency
https://cloud.google.com/appengine/docs/java/datastore/jdo/overview-dn2#Setting_the_Datastore_Read_Policy_and_Call_Deadline

Please have a look at the document about Structuring Data for Strong Consistency; the preferred approach is to use a caching layer to serve the data.

I noticed that you are using get by ID. I'm not sure, but "get by key" is supposed to be strongly consistent even on the HR datastore (reference). Can you try changing this to a query based on the key? The key is built from the id, the entity kind, and the ancestry.

asp answered Sep 18 '22 15:09


I have a suggestion, though you're not going to like it: use the low-level API exclusively and forget about JDO/JPA when using GAE.

Just like @asp said, get by ID is supposed to be strongly consistent; however, the GAE JDO plugin seems buggy to me. Unfortunately, migrating to JPA did not help in my case either (more here: JDO transactions + many GAE instances = overriding data). Also, if I annotate any class as @PersistenceAware, Eclipse goes crazy and enhances the classes in an infinite loop. I also had a lot of problems when using a @PersistenceCapable class with an embedded class and caching (without caching it worked fine).

Well, the point is, I think it will be way faster with the low-level API: you know exactly what is happening, and it seems to work as intended. You can treat an Entity like a Map, and with a little self-written wrapper code it becomes quite an interesting alternative. I ran some tests, and with the low-level API they passed without problems, while passing them with JDO/JPA was not possible. I am in the middle of migrating my whole application from JDO to the low-level API. It is time-consuming, but less so than waiting indefinitely for some magical solution or bugfix from the GAE team.
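The "treat Entity like a Map" idea might look roughly like the sketch below. This is not the answerer's actual code: a plain HashMap stands in for com.google.appengine.api.datastore.Entity so that the example runs without the App Engine SDK, and the EmployeeRecord class and its property names are hypothetical. With the real low-level API, the Map calls would presumably become Entity.getProperty(String) and Entity.setProperty(String, Object).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed wrapper around a Map-like property bag.
// Assumption: the HashMap is a stand-in for a datastore Entity.
class EmployeeRecord {
    private final Map<String, Object> props;

    // Starts with no properties, like a freshly created entity.
    EmployeeRecord() {
        this(new HashMap<String, Object>());
    }

    // Wraps an existing property bag (for example, one just loaded).
    EmployeeRecord(Map<String, Object> props) {
        this.props = props;
    }

    // Unset properties come back as null, as in the question.
    String getEmail() {
        return (String) props.get("email");
    }

    void setEmail(String email) {
        props.put("email", email);
    }

    Double getSalary() {
        return (Double) props.get("salary");
    }

    void setSalary(Double salary) {
        props.put("salary", salary);
    }

    // Exposes the underlying bag so it can be handed back to the store.
    Map<String, Object> raw() {
        return props;
    }
}
```

The wrapper keeps casts and property names in one place, so the rest of the application works with typed getters and setters while every read and write remains an explicit call on the underlying property bag.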

Also, while writing GAE JDO code I felt... alone. If you have a problem with Java, or even Android, a thousand other people have already had that problem, asked about it on Stack Overflow, and got tons of valid solutions. Here you are all by yourself, so use as low-level an API as possible and you'll be sure what's happening. Even though migration seems scary as hell and time-consuming, I think you'll waste less time migrating to the low-level API than dealing with GAE JDO/JPA. I'm not writing this to needle the team that develops GAE JDO/JPA or to offend them; I'm sure they do their best. But:

  1. There are not that many people using GAE compared to, let's say, Android or Java in general.

  2. Using GAE JDO/JPA with multiple server instances is not as simple and straightforward as you would think. A developer like me wants to get the job done ASAP: see an example, read a bit of documentation and a short tutorial rather than study it all in detail, and, when a problem comes up, share it on Stack Overflow and get quick help. It's easy to get help if you do something wrong on Android, no matter whether it's complicated or an easy mistake. It's not that easy with GAE JDO/JPA. I've spent much more time on GAE JDO articles, tutorials and documentation than I would like to, and I failed to do what I wanted even though it seemed pretty basic. If I had just used the low-level API and not tried to take a shortcut with JDO (yeah, I thought JDO would save me time), it would have been much, much quicker.

  3. Google is focused on Python GAE much more than on Java. Many articles that are targeted at all audiences contain Python code and hints exclusively; quick examples here: http://googlecloudplatform.blogspot.com/2013/12/best-practices-for-app-engine-memcache.html or here: https://cloud.google.com/developers/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/ . I noticed that even before starting development, but I wanted to share some code with my Android client, so I chose Java. Even though I have a solid Java background, and even though I do share some code now, if I could go back in time and choose again, I'd choose Python.

That's why I think it's best to use only the most basic methods to access and manipulate data.

Good luck, I wish you all the best.

user2855896 answered Sep 20 '22 15:09