State machine transitions at specific times

Question

Simplified example:

I have a to-do. It can be future, current, or late based on what time it is.

  Time       State
  8:00 am    Future
  9:00 am    Current
  10:00 am   Late

So, in this example, the to-do is "current" from 9 am to 10 am.

Originally, I thought about adding fields for "current_at" and "late_at" and then using an instance method to return the state. I can query for all "current" todos with now > current and now < late.

In short, I'd calculate the state each time or use SQL to pull the set of states I need.

If I wanted to use a state machine, I'd have a set of states and would store that state name on the to-do. But, how would I trigger the transition between states at a specific time for each to-do?

Run a cron job every minute to pull anything in a state but past the transition time and update it
Use background processing to queue transition jobs at the appropriate times in the future, so in the above example I would have two jobs: "transition to current at 9 am" and "transition to late at 10 am" that would presumably have logic to guard against deleted todos and "don't mark late if done" and such.

Does anyone have experience with managing either of these options when trying to handle a lot of state transitions at specific times?

It feels like a state machine, I'm just not sure of the best way to manage all of these transitions.

Update after responses:

Yes, I need to query for "current" or "future" todos
Yes, I need to trigger notifications on state change ("your todo wasn't to-done")

Hence, my desire to more of a state-machine-like idea so that I can encapsulate the transitions.

Michael Deardeuff · Accepted Answer

I have designed and maintained several systems that manage huge numbers of these little state machines. (Some systems, up to 100K/day, some 100K/minute)

I have found that the more state you explicitly fiddle with, the more likely it is to break somewhere. Or to put it a different way, the more state you infer, the more robust the solution.

That being said, you must keep some state. But try to keep it as minimal as possible.

Additionally, keeping the state-machine logic in one place makes the system more robust and easier to maintain. That is, don't put your state machine logic in both code and the database. I prefer my logic in the code.

Preferred solution. (Simple pictures are best).

For your example I would have a very simple table:

task_id, current_at, current_duration, is_done, is_deleted, description...

and infer the state based on now in relation to current_at and current_duration. This works surprisingly well. Make sure you index/partition your table on current_at.

Handling logic on transition change

Things are different when you need to fire an event on the transition change.

Change your table to look like this:

task_id, current_at, current_duration, state, locked_by, locked_until, description...

Keep your index on current_at, and add one on state if you like. You are now mangling state, so things are a little more fragile due to concurrency or failure, so we'll have to shore it up a little bit using locked_by and locked_until for optimistic locking which I'll describe below.

I assume your program will fail in the middle of processing on occassion—even if only for a deployment.

You need a mechanism to transition a task from one state to another. To simplify the discussion, I'll concern myself with moving from FUTURE to CURRENT, but the logic is the same no matter the transition.

If your dataset is large enough, you constantly poll the database to discover to discover tasks requiring transition (of course, with linear or exponential back-off when there's nothing to do); otherwise you use or your favorite scheduler whether it is cron or ruby-based, or Quartz if you subscribe to Java/Scala/C#.

Select all entries that need to be moved from FUTURE to CURRENT and are not currently locked.

(updated:)

-- move from pending to current
select task_id
  from tasks
 where now >= current_at
   and (locked_until is null OR locked_until < now)
   and state == 'PENDING'
   and current_at >= (now - 3 days)         -- optimization
 limit :LIMIT                               -- optimization

Throw all these task_ids into your reliable queue. Or, if you must, just process them in your script.

When you start to work on an item, you must first lock it using our optimistic locking scheme:

update tasks
   set locked_by = :worker_id     -- unique identifier for host + process + thread
     , locked_until = now + 5 minutes -- however this looks in your SQL langage
 where task_id = :task_id         -- you can lock multiple tasks here if necessary
   and (locked_until is null OR locked_until < now) -- only if it's not locked!

Now, if you actually updated the record, you own the lock. You may now fire your special on-transition logic. (Applause. This is what makes you different from all the other task managers, right?)

When that is successful, update the task state, make sure you still use the optimistic locking:

update tasks
   set state = :new_state
     , locked_until = null -- explicitly release the lock (an optimization, really)
 where task_id = :task_id
   and locked_by = :worker_id -- make sure we still own the lock
                              -- no-one really cares if we overstep our time-bounds

Multi-thread/process optimization

Only do this when you have multiple threads or processes updating tasks in batch (such as in a cron job, or polling the database)! The problem is they'll each get the similar results from the database and will then contend to lock each row. This is inefficient both because it will slow down the database, and because you have threads basically doing nothing but slowing down the others.

So, add a limit to how many results the query returns and follow this algorithm:

results = database.tasks_to_move_to_current_state :limit => BATCH_SIZE
while !results.empty
    results.shuffle! # make sure we're not in lock step with another worker
    contention_count = 0
    results.each do |task_id|
        if database.lock_task :task_id => task_id
           on_transition_to_current task_id
        else
           contention_count += 1
        end
        break if contention_count > MAX_CONTENTION_COUNT # too much contention!
    done
    results = database.tasks_to_move_to_current_state :limit => BATCH_SIZE
end

Fiddle around with BATCH_SIZE and MAX_CONTENTION_COUNT until the program is super-fast.

Update:

The optimistic locking allows for multiple processors in parallel.

By have the lock timeout (via the locked_until field) it allows for failure while processing a transition. If the processor fails, another processor is able to pick up the task after a timeout (5 minutes in the above code). It is important, then, to a) only lock the task when you are about to work on it; and b) lock the task for how long it will take to do the task plus a generous leeway.

The locked_by field is mostly for debugging purposes, (which process/machine was this on?) It is enough to have the locked_until field if your database driver returns the number of rows updated, but only if you update one row at a time.

Capncavedan · Answer

Managing all those transitions at specific times does seem tricky. Perhaps you could use something like DelayedJob to schedule the transitions, so that a cron job every minute wouldn't be necessary, and recovering from a failure would be more automated?

Otherwise - if this is Ruby, is using Enumerable an option?

Like so (in untested pseudo-code, with simplistic methods)

ToDo class

def state
  if to_do.future?
    return "Future"
  elsif to_do.current?
    return "Current"
  elsif to_do.late?
    return "Late"
  else 
    return "must not have been important"
  end
end

def future?
    Time.now.hour <= 8
end

def current?
    Time.now.hour == 9
end

def late?
    Time.now.hour >= 10
end

def self.find_current_to_dos
    self.find(:all, :conditions => " 1=1 /* or whatever */ ").select(&:state == 'Current')
end

State machine transitions at specific times

Tags:

ruby-on-rails

queue

state-machine

wesgarrison

2 Answers

Preferred solution. (Simple pictures are best).

Handling logic on transition change

Multi-thread/process optimization

Michael Deardeuff

ToDo class

Capncavedan

Recent Activity

Donate For Us

State machine transitions at specific times

Tags:

ruby-on-rails

queue

state-machine

wesgarrison

2 Answers

Preferred solution. (Simple pictures are best).

Handling logic on transition change

Multi-thread/process optimization

Michael Deardeuff

ToDo class

Capncavedan

Related questions

Recent Activity

Donate For Us