Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to design a generic database whose layout may change over time?

Here's a tricky one - how do I programatically create and interrogate a database whose contents I can't really foresee?

I am implementing a generic input form system. The user can create PHP forms with a WYSIWYG layout and use them for any purpose he wishes. He can also query the input.

So, we have three stages:

  1. a form is designed and generated. This is a one-off procedure, although the form can be edited later. This designs the database.
  2. someone or several people make use of the form - say for daily sales reports, stock keeping, payroll, etc. Their input to the forms is written to the database.
  3. others, maybe management, can query the database and generate reports.

Since these forms are generic, I can't predict the database structure - other than to say that it will reflect HTML form fields and consist of a the data input from collection of edit boxes, memos, radio buttons and the like.

Questions and remarks:

A) how can I best structure the database, in terms of tables and columns? What about primary keys? My first thought was to use the control name to identify each column, then I realized that the user can edit the form and rename, so that maybe "name" becomes "employee" or "wages" becomes ":salary". I am leaning towards a unique number for each.

B) how best to key the rows? I was thinking of a timestamp to allow me to query and a column for the row Id from A)

C) I have to handle column rename/insert/delete. Foe deletion, I am unsure whether to delete the data from the database. Even if the user is not inputting it from the form any more he may wish to query what was previously entered. Or there may be some legal requirements to retain the data. Any gotchas in column rename/insert/delete?

D) For the querying, I can have my PHP interrogate the database to get column names and generate a form with a list where each entry has a database column name, a checkbox to say if it should be used in the query and, based on column type, some selection criteria. That ought to be enough to build searches like "position = 'senior salesman' and salary > 50k".

E) I probably have to generate some fancy charts - graphs, histograms, pie charts, etc for query results of numerical data over time. I need to find some good FOSS PHP for this.

F) What else have I forgotten?

This all seems very tricky to me, but I am database n00b - maybe it is simple to you gurus?


Edit: please don't tell me not to do it. I don't have any choice :-(

Edit: in real life I don't expect column rename/insert/delete to be frequent. However it is possible that after running for a few months a change to the database might be required. I am sure this happens regularly. I fear that I have worded this question badly and that people think that changes will be made willy-nilly every 10 minutes or so.

Realistically, my users will define a database when they lay out the form. They might get it right first time and never change it - especially if they are converting from paper forms. Even if they do decide to change, this might only happen once or twice ever, after months or years - and that can happen in any database.

I don't think that I have a special case here, nor that we should be concentrating on change. Perhaps better to concentrate on linkage - what's a good primary key scheme? Say, perhaps, for one text input, one numerical and a memo?

like image 266
Mawg says reinstate Monica Avatar asked Jun 10 '10 02:06

Mawg says reinstate Monica


2 Answers

"This all seems very tricky to me, but I am database n00b - maybe it is simple to you gurus?"

Nope, it really is tricky. Fundamentally what you're describing is not a database application, it is a database application builder. In fact, it sounds as if you want to code something like Google App Engine or a web version of MS Access. Writing such a tool will take a lot of time and expertise.

Google has implemented flexible schemas by using its BigTable platform. It allows you to flex the schema pretty much at will. The catch is, this flexibility makes it very hard to write queries like "position = 'senior salesman' and salary > 50k".

So I don't think the NoSQL approach is what you need. You want to build an application which generates and maintains RDBMS schemas. This means you need to design a metadata repository from which you can generate dynamic SQL to build and change the users' schemas and also generate the front end.

Things your metadata schema needs to store

For schema generation:

  • foreign key relationships (an EMPLOYEE works in a DEPARTMENT)
  • unique business keys (there can be only one DEPARTMENT called "Sales")
  • reference data (permitted values of EMPLOYEE.POSITION)
  • column data type, size, etc
  • whether column is optional (i.e NULL or NOT NULL)
  • complex business rules (employee bonuses cannot exceed 15% of their salary)
  • default value for columns

For front-end generation

  • display names or labels ("Wages", "Salary")
  • widget (drop down list, pop-up calendar)
  • hidden fields
  • derived fields
  • help text, tips
  • client-side validation (associated JavaScript, etc)

That last points to the potential complexity in your proposal: a regular form designer like Joe Soap is not going to be able to formulate the JS to (say) validate that an input value is between X and Y, so you're going to have to derive it using templated rules.

These are by no means exhaustive lists, it's just off the top of my head.

For primary keys I suggest you use a column of GUID datatype. Timestamps aren't guaranteed to be unique, although if you run your database on an OS which goes to six places (i.e. not Windows) it's unlikely you'll get clashes.

last word

'My first thought was to use the control name to identify each column, then I realized that the user can edit the form and rename, so that maybe "name" becomes "employee" or "wages" becomes ":salary". I am leaning towards a unique number for each.'

I have built database schema generators before. They are hard going. One thing which can be tough is debugging the dynamic SQL. So make it easier on yourself: use real names for tables and columns. Just because the app user now wants to see a form titled HEADCOUNT it doesn't mean you have to rename the EMPLOYEES table. Hence the need to separate the displayed label from the schema object name. Otherwise you'll find yourself trying to figure out why this generated SQL statement failed:

update table_11123
set col_55542 = 'HERRING'
where col_55569 = 'Bootle'
/

That way madness lies.

like image 163
APC Avatar answered Sep 24 '22 19:09

APC


In essence, you are asking how to build an application without specifications. Relational databases were not designed so that you can do this effectively. The common approach to this problem is an Entity-Attribute-Value design and for the type of system in which you want to use it, the odds of failure are nearly 100%.

It makes no sense for example, that the column called "Name" could become "Salary". How would a report where you want the total salary work if the salary values could have "Fred", "Bob", 100K, 1000, "a lot"? Databases were not designed to let anyone put anything anywhere. Successful database schemas require structure which means effort with respect to specifications on what needs to be stored and why.

Therefore, to answer your question, I would rethink the problem. The entire approach of trying to make an app that can store anything in the universe is not a recipe for success.

like image 28
Thomas Avatar answered Sep 21 '22 19:09

Thomas