Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL architectural advice regarding (overcomplex?) schema

Tags:

sql

schema

I'm currently in the database design stage of developing a new section of our product. For which I need to have a "sanity check" or some advice because I do not feel overly confident about some parts of the setup.

A bit of background info

The product we are developing is a so called "marketing ROI maximization system". It handles big data and processes/enhances/enriches vast amount of information before sending it to different marketing channels. That's basicly what it does in a nutshell.

The problem domain

The system currently does not fully feature good validation of data and is being "abused" on a daily basis by "marketing" people and what we call "self-service" customers. With the new google Product Listing Ads network in mind of our CEO, I was given the task to come up with a good solution on how to handle {information/data} for use in the shopping channel of google (call it PLA; product listing ads).

This is the problem:
Our product does not offer any form of validation(read: adhering to the network specific requirements) and PLA basically completely revolves around data-integrity by the means of Categorization of items (each category has defined required/optional fields) and each field can or should be in a specific format (might even depend on linked category; I don't know yet :P).

You guess it, we're kinda screwed with the current setup. It's just not possible to enforce these kinds of "strict" product feeds. By letting our marketing people and self-service customers create and send data to PLA will mean bughunting/problem solving 99% of the time. And since it's just a small company, i'd rather look at the real problem. That means; trying to create a real validation system which can be used for PLA marketing campaigns.

What needs to be done

I have been talking to our marketing people and customers to know what the use cases are and what would be the requirements. These can be summed up in the following list items:

  • Each item in the input feed needs to be mapped "categorisized" to a google PLA category (See "links" section to see which categories can be mapped to.
  • Validation needs to be setup per field per "category"
  • Each field per item needs to allocated/mapped to a field defined in the selected category.

Additional side info

Now, I don't want to worry about things like "How are we gonna link "items" to a "category" or "fields" to "category field definitions" or anything like that. These "dynamic things" will be handled by a ECA rule system, which will be developed some other time. (why you ask? The system is handling/processing data on a scheduled basis so each operation needs to be defined and stored for later use), don't worry about implementation details for now.

Also, the concrete specific implementation is often realized by the use of dynamic attributes (for example, the attributes on a field as defined by the datatype etc). An EAV system is also not my main focus right now. (the use case given above will make more sense if you take a look at the database design).

My current design

Firstly, let me explain my SQL structure using the main entities:

  • schemas; an abstract way of defining "categories", think of a PLA category
  • fields; field definitions (in a schema)
  • datatypes; a bag of types. (mainly used for giving above fields some data integrity)
  • valueConstraints; a bag of constraint definitions (not the implementation!).

Now. It's all good and dandy so far. Here's the thing I'm kinda worried about:

valueConstraints are bound to a datatype by the means of a N:M table (datatype_valueConstraints) but almost every user generated datatype is being composed of only a subset of the available valueconstraints, it doesn't make sense to have a "Price" datatype which can have a "Email" constraint.. however, it does make sense to have a "Min" and "Max" constraint seeing a price is always a number. For clarity: datatype_valueConstraints is holding the "possible" valueConstraints per datatype.

The same problem occurs with primitiveType -> constraintValue relationships. Basicly an datatype must include a "primitiveType" (in my case a foreign key to the primitive type table). A primitive type manages the valueConstraints to select from. primitiveTypes and valueConstraints are not considered user generated, so it's fixture data for now.

Don't get it? Here's an example workflow ((partial) setup for "PLA/clothing" schema):

  • Add datatype "image", set {primitive type to TEXT}
    • Select the following ValueConstraints to use (TEXT specific)
      • "URL" (make sure it's http|https or something like that, dunno)
      • "MinLength" (make sure it's there)
      • "Regex" (allow for certain image extensions.. or something like that)
  • Add field definition "imageURL", set {datatype to "image"}
    • datatype specific configuration, i.e. filling in constraint assertion data (EAV pattern related). "MinLength" = 14, "Regex" = "*(gif|jpg|png)" etcetera.

Because the datatype was of primitive "TEXT" the user could select from only "TEXT"-concerning (and inherited due to tree like datatyping system) valueConstraints.

Once the datatype was properly setup, we can use datatype "image" for multiple fields in the schema (if we would want). For example; a "PLA/CLOTHING" schema might need an "additional image" field. This is now perfectly possible by reusing the "image" datatype with maybe a different constraint configuration.

A visual SQL table layout showing relationships (brainwave regarding above wall of text):

My DB schema: (click to enlarge)

TLDR;

See "My current design" and give me your opinion. I think it might be overly complex/not well thought out and looking for improvements. Note: i'm not a DBA, just a developer. (Also, I'm not sure if the schema design will make sense if you haven't read "The problem domain" section :P)

links

  • Product Listing Ads
  • Summary of attribute requirements

I'm really looking forward to see what you guys think. Thanks in advance!

like image 928
Sebastiaan Hilbers Avatar asked Jun 04 '13 11:06

Sebastiaan Hilbers


1 Answers

Only a matter of personal preference: im not overly fond of parent relationships inside a table if they are not really necessary. I see them for the schema table, but in this case I feel the primitive types could benefit from a more rigorous schema, deleting the BASIC type and adding the basic constraints of lenght to everyone of the primitives (no such feat, in terms of space and speed). If you really have the need to make an extra level of primitive types: do it, but add only a parent table for only one parent level and conform everything to this.

Sure: it lacks flexibility but in my experience is easier to add data conformed to this schema and not need to change the code to retrieve the conditions (and the code will be a plain query) than allow for unlimited levels of nesting primitives that you are not gonna use or worse: you are gonna abuse :)

like image 188
Zelloss Avatar answered Oct 18 '22 20:10

Zelloss