
Database storage design of large amounts of heterogeneous data

Here is something I've wondered about for quite some time and have not seen a good solution for yet. It's a problem I imagine many games have, and one I can't easily think of how to solve well. Ideas are welcome, but since this is not a concrete problem, don't bother asking for more details - just make them up (and explain what you made up).

OK, so, many games have the concept of (inventory) items, and often there are hundreds of different kinds of items, each with a very different data structure - some items are very simple ("a rock"), while others can have insane complexity or data behind them ("a book", "a programmed computer chip", "a container with more items"), etc.

Now, programming something like that is easy - just have everything implement an interface, or maybe extend an abstract root item. Since objects in the programming world don't have to look the same on the inside as on the outside, there is really no issue with how many and what kind of private fields any type of item has.

But when it comes to database serialization (binary serialization is of course no problem), you face a dilemma: how would you represent that in, say, a typical SQL database?

Some attempts at a solution that I have seen, none of which I find satisfying:

  1. Binary serialization of the items; the database just holds an ID and a blob.

    • Pros: takes like 10 seconds to implement.
    • Cons: sacrifices basically every database feature, is hard to maintain, and is near impossible to refactor.
  2. A table per item type.

    • Pros: clean, flexible.
    • Cons: with a wide variety of items you end up with hundreds of tables, and every search for an item has to query all of them, since SQL has no concept of a table/type 'reference' (see the sketch after this list).
  3. One table with a lot of fields that aren't used by every item.

    • Pros: takes like 10 seconds to implement, still searchable.
    • Cons: wastes space, hurts performance, and it is hard to tell from the database alone which fields are actually in use.
  4. A few tables acting as 'base profiles' for storage, where similar items get thrown together and reuse the same fields for different data.

    • Pros: I've got nothing.
    • Cons: wastes space, hurts performance, and it is even harder to tell from the database what a given field means for a given item.
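
To make the drawback of option 2 concrete, here is a rough sketch (the table and column names are invented for illustration) of what looking up a single item by id turns into when every type has its own table:

CREATE TABLE rock      (id INT PRIMARY KEY, weight_kg NUMERIC);
CREATE TABLE book      (id INT PRIMARY KEY, title TEXT, page_count INT);
CREATE TABLE container (id INT PRIMARY KEY, capacity INT);

-- Nothing in SQL lets a foreign key point at "one of these N tables",
-- so finding item 42 means asking every table in turn:
SELECT id, 'rock'      AS kind FROM rock      WHERE id = 42
UNION ALL
SELECT id, 'book'      AS kind FROM book      WHERE id = 42
UNION ALL
SELECT id, 'container' AS kind FROM container WHERE id = 42;
-- ...and so on for every one of the hundreds of item tables.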

What ideas do you have? Have you seen another design that works better or worse?

Torque, asked Mar 01 '13



2 Answers

It depends on whether you need to sort, filter, count, or otherwise analyze those attributes.

If you use EAV (entity-attribute-value), you will screw yourself nicely. Try doing reports on an EAV schema.
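
To illustrate why (table and attribute names here are made up): in an EAV layout every attribute becomes its own row, so even a trivial report needs one self-join per attribute plus casts.

-- Generic EAV layout: one row per attribute of every item.
CREATE TABLE item_attribute (
    item_id   INT  NOT NULL,
    attribute TEXT NOT NULL,   -- e.g. 'title', 'page_count', 'weight'
    value     TEXT NOT NULL,   -- everything ends up stored as text
    PRIMARY KEY (item_id, attribute)
);

-- "Title and page count of every book over 100 pages" already needs a
-- self-join and casts, because the database no longer knows any types:
SELECT t.item_id,
       t.value              AS title,
       CAST(p.value AS INT) AS page_count
FROM item_attribute t
JOIN item_attribute p
  ON p.item_id = t.item_id AND p.attribute = 'page_count'
WHERE t.attribute = 'title'
  AND CAST(p.value AS INT) > 100;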

The best option is to use Table Inheritance:

PRODUCT
id pk
type
att1

PRODUCT_X
id pk fk PRODUCT
att2
att3

PRODUCT_Y
id pk fk PRODUCT
att4
att5

For attributes that you don't need to search, sort, or analyze, use a blob or XML column.
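
For context, here is one way the sketch above could be rendered in plain SQL (the attribute names and types are placeholders):

CREATE TABLE product (
    id   INT  PRIMARY KEY,
    type TEXT NOT NULL,   -- discriminator: says which child table to join
    att1 TEXT
);

CREATE TABLE product_x (
    id   INT PRIMARY KEY REFERENCES product (id),
    att2 TEXT,
    att3 TEXT
);

CREATE TABLE product_y (
    id   INT PRIMARY KEY REFERENCES product (id),
    att4 TEXT,
    att5 TEXT
);

-- All attributes of one product_x item come back with a single join:
SELECT p.id, p.att1, x.att2, x.att3
FROM product p
JOIN product_x x ON x.id = p.id;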

Neil McGuigan, answered Oct 10 '22


I have two alternatives for you:

  1. One table for the base type and supplemental tables for each “class” of specialized types.

    In this schema, properties common to all “objects” are stored in one table, so you have a unique record for every object in the game. For special types like books, containers, usable items, etc, you have another table for each unique set of properties or relationships those items need. Every special type will therefore be represented by two records: the base object record and the supplemental record in a particular special type table.

    PROS: You can use column-based features of your database like custom domains, checks, and XML processing; you can have simpler triggers on certain types; and your queries differ exactly at the point of diverging concerns.

    CONS: You need two inserts for many objects.

  2. Use a “kind” enum field and a JSONB-like field for the special type data.

    This is kind of like your #1 or #3, except with some database help. Postgres added JSONB, giving you an improvement over the old EAV pattern, and other databases have similar complex field types. In this strategy you roll your own mini schema and stash it in the JSONB field, while the kind field declares what you expect to find in that JSONB field (a sketch follows after this list).

    PROS: You can extract special type data in your queries; you can add check constraints and have a simple schema to deal with; you benefit from indexing even though your data is heterogeneous; and your queries and inserts are simple.

    CONS: Your data types within JSONB-like fields are pretty limited and you have to roll your own validation.
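
As a minimal sketch of the second alternative in Postgres (the enum values, column names, and example data are invented for illustration):

CREATE TYPE item_kind AS ENUM ('rock', 'book', 'container');

CREATE TABLE item (
    id    SERIAL PRIMARY KEY,
    kind  item_kind NOT NULL,
    name  TEXT      NOT NULL,
    extra JSONB     NOT NULL DEFAULT '{}'::jsonb   -- per-kind attributes live here
);

-- A GIN index lets you search inside the heterogeneous part:
CREATE INDEX item_extra_idx ON item USING GIN (extra);

INSERT INTO item (kind, name, extra)
VALUES ('book', 'Necronomicon', '{"pages": 666, "language": "Latin"}');

-- Containment query, served by the GIN index:
SELECT id, name
FROM item
WHERE kind = 'book'
  AND extra @> '{"language": "Latin"}';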

John Christopher Jones, answered Oct 10 '22