Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing efficient audit trail of record changes in Google App Engine - design patterns

I have a quite common design problem: I need to implement a history log (audit trail) for records in Google App Engine. The history log has to be structured, i.e I cannot join all changes into some free-form text and store in string field.

I've considered the following options for the history model and, after noticing performance issues in option #1, I've chosen to implement option #3. But have stil some doubts if this solution is efficient and scalable. For instance: is there a risk that performance will degrade significantly with increased number of dynamic properties in option #3?

Do you have some deeper knowledge on the pros/cons for each option or could suggest other audit trail design patterns applicable for Google App Engine DB characteristics?

  1. Use classic SQL "master-detail" relation
    • Pros
      • simple to understand for database developers with SQL background
      • clean: direct definition for history record and its properties
      • search performance: easy searching through history (can use indices)
      • troubleshooting: easy access by administration tools (_ah/admin)
    • Cons
      • one-to-many relations are often not recommended to be implemented this way in GAE DB
      • read performance: excessive number of record read operations to show long audit trail e.g. in details pane of a big records list.
  2. Store history in a BLOB field (pickled python structures)
    • Pros
      • simple to implement and flexible
      • read performance: very efficient
    • Cons
      • query performance: cannot search using indices
      • troubleshooting: cannot inspect data by admin db viewer (_ah/admin)
      • unclean: not so easy to understand/accept for SQL developers (they consider this ugly)
  3. Store history in Expando's dynamic properties. E.g. for each field fieldName create history_fieldName_n fields (where n=<0..N> is a number of history record)
    • Pros:
      • simple: simple to implement and understand
      • troubleshooting: can read all the history properties through admin interface
      • read performance: one read operation to get the record
    • Cons:
      • search performance: cannot simply search through history records (they have different name)
      • not too clean: number of properties may be confusing at first look
  4. Store history in some set of list fields in the main record. Eg. for each fieldName create a fieldName_history list field
    • Pros:
      • clean: direct definition of history properties
      • simple: easy to understand for SQL developers
      • read performance: one read operation to get the record
    • Cons:
      • search performance: can search using indices only for records which whenever had some value and cannot search for records having combination of values at some particular time;
      • troubleshooting: inspecting lists is difficult in admin db viewer
like image 604
Alek Kowalczyk Avatar asked Dec 11 '10 14:12

Alek Kowalczyk


1 Answers

If I would have to choose I would go for option 1. The reads are as (if not more) performant for the other options. And all other options only have speed advantages under specific circumstances (small or very large sets of changes). It will also get you lots of flexibility (with more ease) like purging history after x days or query history across different model types. Make sure you create the history entities as a child of the changed entity in the same transaction to guarantee consistency. You could end up with one of these:

class HistoryEventFieldLevel(db.Model):
    # parent, you don't have to define this
    date = db.DateTime()
    model = db.StringProperty()
    property = db.StringProperty() # Name of changed property
    action = db.EnumProperty(['insert', 'update', 'delete'])
    old = db.PickleProperty() # Old value for field, empty on insert
    new = db.PickleProperty() # New value for field, empty on delete

class HistoryEventModelLevel(db.Model):
    # parent, you don't have to define this
    date = db.DateTime()
    model = db.StringProperty()
    action = db.EnumProperty(['insert', 'update', 'delete'])
    change = db.PickleProperty() # Dictionary with changed fields as keys and tuples (old value, new value) as values
like image 78
Koen Bok Avatar answered Oct 20 '22 22:10

Koen Bok