
Database storage design of large amounts of heterogeneous data

Here is something I've wondered about for quite some time and have not seen a good solution for yet. It's a problem I imagine many games have, and one I can't easily think of how to solve well. Ideas are welcome, but since this is not a concrete problem, don't bother asking for more details - just make them up (and explain what you made up).

OK, so, many games have the concept of (inventory) items, and often there are hundreds of different kinds of items, each with a very different data structure - some items are very simple ("a rock"), while others can have insane complexity or data behind them ("a book", "a programmed computer chip", "a container with more items"), etc.

Now, programming something like that is easy - just have everything implement an interface, or maybe extend an abstract root item. Since objects in the programming world don't have to look the same on the inside as on the outside, there is really no issue with how many and what kind of private fields any type of item has.

But when it comes to database serialization (binary serialization is of course no problem), you face a dilemma: how would you represent that in, say, a typical SQL database?

Some attempts at a solution that I have seen, none of which I find satisfying:

  1. Binary serialization of the items; the database just holds an ID and a blob.

    • Pros: takes like 10 seconds to implement.
    • Cons: sacrifices basically every database feature, is hard to maintain, and is near impossible to refactor.
  2. A table per item type.

    • Pros: clean, flexible.
    • Cons: with a wide variety of items you end up with hundreds of tables, and every search for an item has to query all of them, since SQL has no concept of a table/type 'reference' (see the sketch after this list).
  3. One table with a lot of fields that aren't used by every item.

    • Pros: takes like 10 seconds to implement, still searchable.
    • Cons: wastes space, hurts performance, and it is hard to tell from the database alone which fields are actually in use.
  4. A few tables acting as 'base profiles' for storage, where similar items get thrown together and reuse the same fields for different data.

    • Pros: I've got nothing.
    • Cons: wastes space, hurts performance, and it is even harder to tell from the database what a given field means for a given item.
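
To make the drawback of option 2 concrete, here is a rough sketch (the table and column names are invented for illustration) of what looking up a single item by id turns into when every type has its own table:

CREATE TABLE rock      (id INT PRIMARY KEY, weight_kg NUMERIC);
CREATE TABLE book      (id INT PRIMARY KEY, title TEXT, page_count INT);
CREATE TABLE container (id INT PRIMARY KEY, capacity INT);

-- Nothing in SQL lets a foreign key point at "one of these N tables",
-- so finding item 42 means asking every table in turn:
SELECT id, 'rock'      AS kind FROM rock      WHERE id = 42
UNION ALL
SELECT id, 'book'      AS kind FROM book      WHERE id = 42
UNION ALL
SELECT id, 'container' AS kind FROM container WHERE id = 42;
-- ...and so on for every one of the hundreds of item tables.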

What ideas do you have? Have you seen another design that works better or worse?

Torque, asked Mar 01 '13



2 Answers

It depends on whether you need to sort, filter, count, or otherwise analyze those attributes.

If you use EAV (entity-attribute-value), you will screw yourself nicely. Try doing reports on an EAV schema.
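
To illustrate why (table and attribute names here are made up): in an EAV layout every attribute becomes its own row, so even a trivial report needs one self-join per attribute plus casts.

-- Generic EAV layout: one row per attribute of every item.
CREATE TABLE item_attribute (
    item_id   INT  NOT NULL,
    attribute TEXT NOT NULL,   -- e.g. 'title', 'page_count', 'weight'
    value     TEXT NOT NULL,   -- everything ends up stored as text
    PRIMARY KEY (item_id, attribute)
);

-- "Title and page count of every book over 100 pages" already needs a
-- self-join and casts, because the database no longer knows any types:
SELECT t.item_id,
       t.value              AS title,
       CAST(p.value AS INT) AS page_count
FROM item_attribute t
JOIN item_attribute p
  ON p.item_id = t.item_id AND p.attribute = 'page_count'
WHERE t.attribute = 'title'
  AND CAST(p.value AS INT) > 100;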

The best option is to use Table Inheritance:

PRODUCT
id pk
type
att1

PRODUCT_X
id pk fk PRODUCT
att2
att3

PRODUCT_Y
id pk fk PRODUCT
att4
att5

For attributes that you don't need to search, sort, or analyze, use a blob or XML column.
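
For context, here is one way the sketch above could be rendered in plain SQL (the attribute names and types are placeholders):

CREATE TABLE product (
    id   INT  PRIMARY KEY,
    type TEXT NOT NULL,   -- discriminator: says which child table to join
    att1 TEXT
);

CREATE TABLE product_x (
    id   INT PRIMARY KEY REFERENCES product (id),
    att2 TEXT,
    att3 TEXT
);

CREATE TABLE product_y (
    id   INT PRIMARY KEY REFERENCES product (id),
    att4 TEXT,
    att5 TEXT
);

-- All attributes of one product_x item come back with a single join:
SELECT p.id, p.att1, x.att2, x.att3
FROM product p
JOIN product_x x ON x.id = p.id;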

Neil McGuigan, answered Oct 10 '22


I have two alternatives for you:

  1. One table for the base type and supplemental tables for each “class” of specialized types.

    In this schema, properties common to all “objects” are stored in one table, so you have a unique record for every object in the game. For special types like books, containers, usable items, etc, you have another table for each unique set of properties or relationships those items need. Every special type will therefore be represented by two records: the base object record and the supplemental record in a particular special type table.

    PROS: You can use column-based features of your database like custom domains, checks, and XML processing; you can have simpler triggers on certain types; and your queries differ exactly at the point of diverging concerns.

    CONS: You need two inserts for many objects.

  2. Use a “kind” enum field and a JSONB-like field for the special type data.

    This is kind of like your #1 or #3, except with some database help. Postgres added JSONB, giving you an improvement over the old EAV pattern, and other databases have similar complex field types. In this strategy you roll your own mini schema and stash it in the JSONB field, while the kind field declares what you expect to find in that JSONB field (a sketch follows after this list).

    PROS: You can extract special type data in your queries; you can add check constraints and have a simple schema to deal with; you benefit from indexing even though your data is heterogeneous; and your queries and inserts are simple.

    CONS: Your data types within JSONB-like fields are pretty limited and you have to roll your own validation.
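
As a minimal sketch of the second alternative in Postgres (the enum values, column names, and example data are invented for illustration):

CREATE TYPE item_kind AS ENUM ('rock', 'book', 'container');

CREATE TABLE item (
    id    SERIAL PRIMARY KEY,
    kind  item_kind NOT NULL,
    name  TEXT      NOT NULL,
    extra JSONB     NOT NULL DEFAULT '{}'::jsonb   -- per-kind attributes live here
);

-- A GIN index lets you search inside the heterogeneous part:
CREATE INDEX item_extra_idx ON item USING GIN (extra);

INSERT INTO item (kind, name, extra)
VALUES ('book', 'Necronomicon', '{"pages": 666, "language": "Latin"}');

-- Containment query, served by the GIN index:
SELECT id, name
FROM item
WHERE kind = 'book'
  AND extra @> '{"language": "Latin"}';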

John Christopher Jones, answered Oct 10 '22