 

Separating code logic from the actual data structures. Best practices? [closed]

Tags:

oop

I have an application that loads lots of data into memory (because it needs to perform mathematical simulations on big data sets). This data comes from several database tables that all refer to each other.

The consistency rules on the data are rather complex, and looking up all the relevant data requires quite some hashes and other additional data structures on the data.

Problem is that this data may also be changed interactively by the user in a dialog. When the user presses the OK button, I want to perform all the checks to see that he didn't introduce inconsistencies in the data. In practice all the data needs to be checked at once, so I cannot update my data set incrementally and perform the checks one by one.

However, all the checking code works on the actual data set loaded in memory, and uses the hashes and other data structures. This means I have to do the following:

  • Take the user's changes from the dialog
  • Apply them to the big data set
  • Perform the checks on the big data set
  • Undo all the changes if the checks fail

I don't like this solution since other threads are also continuously using the data set, and I don't want to halt them while performing the checks. Also, the undo means that the old situation needs to be put aside, which is also not possible.

An alternative is to separate the checking code from the data set (and let it work on explicitly given data, e.g. coming from the dialog) but this means that the checking code cannot use hashing and other additional data structures, because they only work on the big data set, making the checks much slower.

What is a good practice for checking a user's changes to complex data before applying them to the application's data set?

Patrick asked May 18 '10 08:05



1 Answer

This is probably not much help now, since your app is built, and you probably don't want to reimplement, but I'll mention it for reference.

Using an ORM framework would help you here. Not only does it handle getting the data from the database into an object-oriented representation, it also provides the tools to implement isolated temporary changes and views:

  • Using the ORM framework with transactions, you can allow the user to change the objects in the model without affecting other users, and without committing the data "for real" until it has been checked. The ACID guarantees of transactions ensure that your changes are not persisted to the database but held in your transaction, visible only to you (assuming the isolation level prevents dirty reads). You can then run the checks on the data: if the data validates, you commit the transaction and the changes become permanent; if it doesn't, you roll back the transaction and the changes are discarded.

  • Alternatively, you can create views which provide your data for validation. The views combine the base data and temporary tables (local to your current connection). This avoids locking tables, at the expense of having to write and maintain the views.
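The transaction option can be sketched without an ORM, using Python's built-in sqlite3 module. This is only an illustration under assumptions: the `items` table, the `apply_if_valid` helper and the `no_negative_qty` rule are made up for the example, and a real ORM would wrap the same commit/rollback calls in its own session API.

```python
import sqlite3

def apply_if_valid(conn, changes, is_consistent):
    """Apply (item_id, new_qty) changes in one transaction.

    The changes are committed only if is_consistent(conn) returns True;
    otherwise they are rolled back. All names here are hypothetical.
    """
    try:
        # The first UPDATE implicitly opens a transaction on this connection.
        conn.executemany(
            "UPDATE items SET qty = ? WHERE id = ?",
            [(qty, item_id) for item_id, qty in changes],
        )
        # The check runs on the same connection, so it sees the pending changes.
        if is_consistent(conn):
            conn.commit()
            return True
        conn.rollback()
        return False
    except Exception:
        conn.rollback()
        raise

def no_negative_qty(conn):
    """Example consistency rule (an assumption): no item may have qty < 0."""
    (bad,) = conn.execute("SELECT COUNT(*) FROM items WHERE qty < 0").fetchone()
    return bad == 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 5), (2, 3)])
conn.commit()

rejected = apply_if_valid(conn, [(1, -4)], no_negative_qty)  # fails the check
accepted = apply_if_valid(conn, [(2, 7)], no_negative_qty)   # passes the check
```

The point of the design is that the checking code does not need an "undo" of its own: the database discards the pending changes on rollback.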

EDIT: If you already have a rich object model in memory, the hardest part of making it support incremental, local and isolated changes is the direct references between objects. When you want to replace object A with A', which contains a change, you don't want to do a deep copy with all references, since you mention that your object model is large. You also don't want to have to update all the objects that were pointing to A to reference A'. As an example, consider a very large doubly linked list: it's not possible to create a new list that is the same as the old one with just one element changed without duplicating the entire list.

You can achieve isolation by storing the identifiers of related objects rather than the objects themselves. For example, instead of referencing A directly, your collaborators store the unique key that identifies A, key(A). This key is used to fetch the actual object at the time it is needed (e.g. during verification). Your model then becomes a large map of keys to objects, which can be decorated with local changes. When looking up an object by key, first check the local map for a value and, if it is not found, check the universal map. To change A to A', you add an entry to the local map that maps key(A) to A'. (Note that A and A' have the same key, since logically they are the same item.) When you run your verification code, the local changes are incorporated, since objects referring to key(A) will get A', while other users using key(A) will get the original A.
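A minimal sketch of the local-map-over-universal-map lookup, using Python's `collections.ChainMap`, which searches its first mapping before the later ones and sends writes to the first mapping only. The `Node` class, the `total` check and the sample data are assumptions made for the illustration.

```python
from collections import ChainMap
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: str
    value: int
    next_key: Optional[str]  # stores the key of the next node, not the node itself

# The shared "universal" model: a map of keys to objects.
universal = {
    "a": Node("a", 1, "b"),
    "b": Node("b", 2, None),
}

# The user's view: an empty local map layered over the universal one.
local_view = ChainMap({}, universal)

# Change B to B'. Same key, new object; only the local map is written to.
local_view["b"] = Node("b", 20, None)

def total(model, start_key):
    """Walk the chain by key, resolving each reference at the time of use."""
    result = 0
    key = start_key
    while key is not None:
        node = model[key]
        result += node.value
        key = node.next_key
    return result

shared_total = total(universal, "a")   # other users still see the original B
local_total = total(local_view, "a")   # the dialog's checks see B'
```

If the checks pass, the pending entries in `local_view.maps[0]` can be copied into the universal map (with whatever locking the application needs, which this sketch leaves out); if they fail, the local map is simply thrown away.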

This may sound complex written down, but removing explicit references and computing them on demand is the only way of supporting isolated updates without having to do a deep copy of the data.

An alternative but equivalent approach is for your validator to use a map to look up replacements for objects before it uses them. For example, your user modifies A, so you put A->A' into the map. The validator iterates over the model and comes across A. Before using A, it checks the map and finds A', which it then uses. The difficulty of this approach is that you have to make sure you check the map every time before an object is used. If you miss one check, your view of the model will be inconsistent.
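The replacement-map variant might look roughly like this. The `Item` class, the `resolve` helper and the "child amount may not exceed parent amount" rule are hypothetical; the sketch keys the map on `id(obj)`, which assumes the original objects stay alive while the map is in use.

```python
class Item:
    """Model object that holds a direct reference to its parent."""
    def __init__(self, name, amount, parent=None):
        self.name = name
        self.amount = amount
        self.parent = parent

def resolve(obj, replacements):
    """Return the pending replacement for obj if there is one, else obj itself."""
    return replacements.get(id(obj), obj)

def validate(items, replacements):
    """Example rule (an assumption): an item's amount may not exceed its parent's.

    Every object is passed through resolve() before it is used; skipping
    that step anywhere would mix old and new state.
    """
    for item in items:
        current = resolve(item, replacements)
        if current.parent is None:
            continue
        parent = resolve(current.parent, replacements)
        if current.amount > parent.amount:
            return False
    return True

root = Item("root", 10)
child = Item("child", 8, parent=root)
model = [root, child]

# The user lowers root's amount in the dialog: record root -> root'.
pending = {id(root): Item("root", 5)}

ok_before = validate(model, {})       # original model is consistent
ok_after = validate(model, pending)   # child (8) would now exceed root' (5)
```

Note that `child.parent` still points at the original `root`; the change is only visible because the validator resolves the parent through the map as well.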

mdma answered Oct 13 '22 19:10
