I'm currently in the database design stage of developing a new section of our product. For which I need to have a "sanity check" or some advice because I do not feel overly confident about some parts of the setup.
The product we are developing is a so called "marketing ROI maximization system". It handles big data and processes/enhances/enriches vast amount of information before sending it to different marketing channels. That's basicly what it does in a nutshell.
The system currently does not fully feature good validation of data and is being "abused" on a daily basis by "marketing" people and what we call "self-service" customers. With the new google Product Listing Ads network in mind of our CEO, I was given the task to come up with a good solution on how to handle {information/data} for use in the shopping channel of google (call it PLA; product listing ads).
This is the problem:
Our product does not offer any form of validation(read: adhering to the network specific requirements) and PLA basically completely revolves around data-integrity by the means of Categorization of items (each category has defined required/optional fields) and each field can or should be in a specific format (might even depend on linked category; I don't know yet :P).
You guess it, we're kinda screwed with the current setup. It's just not possible to enforce these kinds of "strict" product feeds. By letting our marketing people and self-service customers create and send data to PLA will mean bughunting/problem solving 99% of the time. And since it's just a small company, i'd rather look at the real problem. That means; trying to create a real validation system which can be used for PLA marketing campaigns.
I have been talking to our marketing people and customers to know what the use cases are and what would be the requirements. These can be summed up in the following list items:
Now, I don't want to worry about things like "How are we gonna link "items" to a "category" or "fields" to "category field definitions" or anything like that. These "dynamic things" will be handled by a ECA rule system, which will be developed some other time. (why you ask? The system is handling/processing data on a scheduled basis so each operation needs to be defined and stored for later use), don't worry about implementation details for now.
Also, the concrete specific implementation is often realized by the use of dynamic attributes (for example, the attributes on a field as defined by the datatype etc). An EAV system is also not my main focus right now. (the use case given above will make more sense if you take a look at the database design).
Firstly, let me explain my SQL structure using the main entities:
schemas
; an abstract way of defining "categories", think of a PLA categoryfields
; field definitions (in a schema
)datatypes
; a bag of types. (mainly used for giving above fields some data integrity)valueConstraints
; a bag of constraint definitions (not the implementation!). Now. It's all good and dandy so far. Here's the thing I'm kinda worried about:
valueConstraints
are bound to a datatype by the means of a N:M table (datatype_valueConstraints
) but almost every user generated datatype is being composed of only a subset of the available valueconstraints, it doesn't make sense to have a "Price" datatype which can have a "Email" constraint.. however, it does make sense to have a "Min" and "Max" constraint seeing a price is always a number. For clarity: datatype_valueConstraints
is holding the "possible" valueConstraints
per datatype.
The same problem occurs with primitiveType -> constraintValue relationships. Basicly an datatype must include a "primitiveType" (in my case a foreign key to the primitive type table). A primitive type manages the valueConstraints
to select from. primitiveTypes
and valueConstraints
are not considered user generated, so it's fixture data for now.
Don't get it? Here's an example workflow ((partial) setup for "PLA/clothing" schema):
ValueConstraints
to use (TEXT specific)
Because the datatype was of primitive "TEXT" the user could select from only "TEXT"-concerning (and inherited due to tree like datatyping system) valueConstraints
.
Once the datatype was properly setup, we can use datatype "image" for multiple fields in the schema (if we would want). For example; a "PLA/CLOTHING" schema might need an "additional image" field. This is now perfectly possible by reusing the "image" datatype with maybe a different constraint configuration.
A visual SQL table layout showing relationships (brainwave regarding above wall of text):
My DB schema: (click to enlarge)
See "My current design" and give me your opinion. I think it might be overly complex/not well thought out and looking for improvements. Note: i'm not a DBA, just a developer. (Also, I'm not sure if the schema design will make sense if you haven't read "The problem domain" section :P)
I'm really looking forward to see what you guys think. Thanks in advance!
Only a matter of personal preference: im not overly fond of parent relationships inside a table if they are not really necessary. I see them for the schema table, but in this case I feel the primitive types could benefit from a more rigorous schema, deleting the BASIC type and adding the basic constraints of lenght to everyone of the primitives (no such feat, in terms of space and speed). If you really have the need to make an extra level of primitive types: do it, but add only a parent table for only one parent level and conform everything to this.
Sure: it lacks flexibility but in my experience is easier to add data conformed to this schema and not need to change the code to retrieve the conditions (and the code will be a plain query) than allow for unlimited levels of nesting primitives that you are not gonna use or worse: you are gonna abuse :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With