Is there a way to assure FIFO (first in, first out) behavior with Task Queues on GAE?

The GAE documentation says that FIFO is one of the factors that affect task execution order, but the same documentation says that "the system's scheduling may 'jump' new tasks to the head of the queue", and I have confirmed this behavior with a test. The effect: my events are being processed out of order.

The docs say:

https://developers.google.com/appengine/docs/java/taskqueue/overview-push

The order in which tasks are executed depends on several factors:

The position of the task in the queue. App Engine attempts to process tasks based on FIFO (first in, first out) order. In general, tasks are inserted into the end of a queue, and executed from the head of the queue.

The backlog of tasks in the queue. The system attempts to deliver the lowest latency possible for any given task via specially optimized notifications to the scheduler. Thus, in the case that a queue has a large backlog of tasks, the system's scheduling may "jump" new tasks to the head of the queue.

The value of the task's etaMillis property. This property specifies the earliest time that a task can execute. App Engine always waits until after the specified ETA to process push tasks.

The value of the task's countdownMillis property. This property specifies the minimum number of seconds to wait before executing a task. Countdown and eta are mutually exclusive; if you specify one, do not specify the other.
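
To make those two scheduling knobs concrete, here is a minimal sketch using the GAE Java Task Queue API; the "/worker" URL and the one-minute delay are placeholders, not values from my application:

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    public class DelayedTasks {
        public static void enqueueExamples() {
            Queue queue = QueueFactory.getDefaultQueue();

            // etaMillis: absolute earliest time (epoch milliseconds) at which the task may run.
            queue.add(TaskOptions.Builder.withUrl("/worker")
                    .etaMillis(System.currentTimeMillis() + 60000L));

            // countdownMillis: relative delay before the task may run (mutually exclusive with eta).
            queue.add(TaskOptions.Builder.withUrl("/worker")
                    .countdownMillis(60000L));
        }
    }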

What do I need to do? In my use case, I'll process 1-2 million events per day coming from vehicles. These events can be sent at any interval (1 second, 1 minute or 1 hour). The order of event processing has to be assured: I need to process by timestamp order, and the timestamp is generated on an embedded device inside the vehicle.

What do I have now?

  1. A REST servlet that is called by the consumer and creates a Task (the Event data is in the payload).

  2. After this, a worker servlet gets this Task and:

    • Deserializes the Event data;

    • Puts the Event in the Datastore;

    • Updates the Vehicle in the Datastore.
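
For context, the enqueue step in the REST servlet looks roughly like the sketch below, assuming the GAE Java Task Queue API; the "/worker" URL and the raw-byte payload are placeholders rather than my exact code:

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    public class EventEnqueuer {
        // Called by the REST servlet: push the serialized event onto the default push queue.
        public static void enqueue(byte[] eventPayload) {
            Queue queue = QueueFactory.getDefaultQueue();
            queue.add(TaskOptions.Builder.withUrl("/worker")
                    .payload(eventPayload, "application/octet-stream"));
        }
    }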

So, again, is there any way to assure strict FIFO behavior? Or how can I improve this solution to achieve it?

André Salvati asked Apr 02 '12




3 Answers

You need to approach this with three separate steps:

  1. Implement a Sharding Counter to generate a monotonically increasing ID. As much as I would like to use the timestamp from Google's servers to indicate task ordering, it appears that timestamps between GAE servers might vary more than your requirement allows.

  2. Add your tasks to a Pull Queue instead of a Push Queue. When constructing your TaskOptions, add the ID obtained from step #1 as a tag. After adding the task, store the ID somewhere in your datastore.

  3. Have your worker servlet lease tasks by a certain tag from the Pull Queue. Query the datastore to get the earliest ID that you need to fetch, and use that ID as the lease tag. In this way, you can simulate FIFO behavior for your task queue.

After you finish your processing, delete the ID from your datastore, and don't forget to delete the Task from your Pull Queue too. Also, I would recommend you run your task consumption on a Backend.
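
A rough sketch of steps 2 and 3, assuming the GAE Java pull-queue API; the queue name "event-pull-queue", the lease duration, and the processing hook are placeholders:

    import java.util.List;
    import java.util.concurrent.TimeUnit;

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskHandle;
    import com.google.appengine.api.taskqueue.TaskOptions;

    public class OrderedTaskQueue {
        private static final Queue PULL_QUEUE = QueueFactory.getQueue("event-pull-queue");

        // Step 2: enqueue the event into the pull queue, tagged with its ordering ID.
        // Persist the ID to the datastore as well, so the worker can find the earliest one.
        public static void enqueue(long orderingId, byte[] eventPayload) {
            PULL_QUEUE.add(TaskOptions.Builder
                    .withMethod(TaskOptions.Method.PULL)
                    .tag(Long.toString(orderingId))
                    .payload(eventPayload));
        }

        // Step 3: lease the task carrying the earliest outstanding ID and process it.
        public static void processNext(long earliestId) {
            List<TaskHandle> leased =
                    PULL_QUEUE.leaseTasksByTag(30, TimeUnit.SECONDS, 1, Long.toString(earliestId));
            for (TaskHandle task : leased) {
                // process(task.getPayload());  // application-specific handling
                PULL_QUEUE.deleteTask(task);    // remove the task once processing succeeds
            }
        }
    }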

UPDATE: As noted by Nick Johnson and mjaggard, sharding in step #1 doesn't seem to be viable for generating a monotonically increasing ID, so another source of IDs would be needed. I seem to recall you were using timestamps generated by your vehicles; would it be possible to use those in lieu of a monotonically increasing ID?

Regardless of how the IDs are generated, the basic idea is to use the datastore's query mechanism to produce a FIFO ordering of tasks, and to use the task tag to pull a specific task from the task queue.

There is a caveat, though. Due to the eventual-consistency read policy of the High Replication Datastore, if you choose HRD as your datastore (and you should, since M/S is deprecated as of April 4th, 2012), the query in step #3 might return stale data for the IDs stored in step #2.

Ibrahim Arief


I think the simple answer is "no"; however, partly in order to improve the situation, I am using a pull queue, pulling 1000 tasks at a time and then sorting them. If timing isn't important, you could sort them, put them into the datastore, and then complete a batch at a time. You still have to work out what to do with the tasks at the beginning and end of each batch, because they might be out of order relative to interleaving tasks in other batches.
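
A sketch of this batching approach, assuming a GAE Java pull queue; the queue name, lease parameters, and timestamp extraction are placeholders:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskHandle;

    public class BatchSorter {
        public static void processBatch() {
            Queue queue = QueueFactory.getQueue("event-pull-queue");

            // Lease a large batch; ordering within the lease is not guaranteed.
            List<TaskHandle> batch =
                    new ArrayList<TaskHandle>(queue.leaseTasks(60, TimeUnit.SECONDS, 1000));

            // Restore ordering locally using the timestamp carried in each payload.
            Collections.sort(batch, new Comparator<TaskHandle>() {
                @Override
                public int compare(TaskHandle a, TaskHandle b) {
                    return Long.compare(extractTimestamp(a.getPayload()),
                                        extractTimestamp(b.getPayload()));
                }
            });

            for (TaskHandle task : batch) {
                // process(task.getPayload());  // application-specific handling
                queue.deleteTask(task);         // delete only after successful processing
            }
        }

        // Placeholder: parse the embedded-device timestamp out of the serialized event.
        private static long extractTimestamp(byte[] payload) {
            return 0L;
        }
    }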

mjaggard


OK. This is how I've done it:

1) REST servlet that is called by the consumer:

    If the Event sequence doesn't match the Vehicle sequence (from the datastore)

        Create a task on a "wait" queue to call me again

    else

        Validate state

        Create a task on the "regular" queue (Event data is in the payload)


2) A worker servlet gets the task from the "regular" queue, and so on (same pseudocode).

This way I can pause the "regular" queue in order to do data maintenance without losing events.
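
A hedged sketch of the REST servlet's check, with the Event/Vehicle accessors, the datastore lookup, and the endpoint URLs as placeholders rather than my exact code:

    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    public class EventIntake {
        public static void handle(Event event, byte[] rawPayload) {
            Vehicle vehicle = loadVehicle(event.getVehicleId());  // datastore read (placeholder)

            if (event.getSequence() != vehicle.getExpectedSequence()) {
                // Out of order: park the event on the "wait" queue, which calls this endpoint again.
                QueueFactory.getQueue("wait").add(
                        TaskOptions.Builder.withUrl("/events").payload(rawPayload));
            } else {
                validateState(event, vehicle);  // placeholder for the state validation step
                // In order: hand the event to the worker servlet via the "regular" push queue.
                QueueFactory.getQueue("regular").add(
                        TaskOptions.Builder.withUrl("/worker").payload(rawPayload));
            }
        }

        // Placeholders so the sketch is self-contained; the real types come from the application.
        static class Event {
            long getVehicleId() { return 0L; }
            long getSequence() { return 0L; }
        }
        static class Vehicle {
            long getExpectedSequence() { return 0L; }
        }
        private static Vehicle loadVehicle(long vehicleId) { return new Vehicle(); }
        private static void validateState(Event event, Vehicle vehicle) { }
    }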

Thank you for your answers. My solution is a mix of them.

André Salvati