 

What is the best practice to store a huge number (10,000+) of DIFFERENT object types in a database?

When designing a new relational database, each object type is normally represented by a corresponding table. What is the best practice for designing a database that stores a huge number of DIFFERENT object types, so as to avoid creating and maintaining thousands of database tables? What better alternatives to a relational database exist for this case?

asked Feb 18 '17 by Anne Droid



1 Answer

The answer depends largely on the nature of the distinctions between the thousands of object types, and on the degree to which, and the ways in which, they can be classified and possibly generalized further. Discovery is the key to a maintainable design in scenarios such as this.

Here are some potential persistence options that may work for your set of object types. It will take some thought to consider the pros and cons of each.

  1. Discover a hidden structure or pattern in the object types, allowing them to be decomposed [1][2][3].
  2. Discover categories of object types to which (1) can then be applied.
  3. Map multiple objects to a single or smaller set of tables or document types (see the sketch after this list).
  4. Map the objects one to one and determine a meta scheme to maintain them affordably.
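
As a concrete illustration of option (3), here is a minimal sketch that collapses arbitrary object types into a single table with a type discriminator and a JSON payload. It assumes SQLite via Python's sqlite3 module purely for demonstration; the table and column names are hypothetical, not prescribed by the answer.

    import json
    import sqlite3

    # One table for all object types: a type tag for dispatch and a JSON
    # payload for the type-specific fields.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE objects (
            id INTEGER PRIMARY KEY,
            object_type TEXT NOT NULL,  -- discriminator instead of one table per type
            payload TEXT NOT NULL       -- type-specific fields as JSON
        )
    """)
    conn.execute("CREATE INDEX idx_objects_type ON objects(object_type)")

    def save(obj_type, fields):
        conn.execute(
            "INSERT INTO objects (object_type, payload) VALUES (?, ?)",
            (obj_type, json.dumps(fields)),
        )

    def load_all(obj_type):
        rows = conn.execute(
            "SELECT payload FROM objects WHERE object_type = ?", (obj_type,)
        )
        return [json.loads(row[0]) for row in rows]

    save("invoice", {"number": "INV-1", "total": 99.5})
    save("sensor_reading", {"unit": "C", "value": 21.3})
    print(load_all("invoice"))  # [{'number': 'INV-1', 'total': 99.5}]

The trade-off is typical of this approach: far fewer tables to maintain, at the cost of weaker schema enforcement and less efficient queries on fields buried inside the payload.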

Whether the database is relational or not, how it is structured, what type of search features are available, and how keys are implemented are decisions that should be made subsequent to the above discovery. That is the best practice.

Determining the structure of the data so that storage, maintenance, and retrieval have the desired characteristics is a topic that could not be covered adequately in a 500-page book, let alone in a short answer.

Learning the pros and cons of these potential choices would be a good start. You can web search these persistence philosophies by their names and the words "database" or "persistence" to see descriptions and vendor products that correspond.

  • Relational table
  • Relational object
  • Tabular non-relational
  • Mapping (key and value)
  • Mapping (key and fixed record payload)
  • Document (free text)
  • Hierarchical
  • Graph (network of edges connecting vertices)
  • Multidimensional (OLAP and others)
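
To make the differences concrete, here is a toy, plain-Python sketch of how the same record might be represented under three of these models; all names are hypothetical and no particular vendor product is implied.

    import json

    record = {"type": "invoice", "number": "INV-1", "total": 99.5}

    # Mapping (key and value): one opaque value per key.
    kv_store = {}
    kv_store["invoice:INV-1"] = json.dumps(record)

    # Document: the record itself is the unit of storage, queried by its
    # fields rather than through a fixed schema.
    doc_store = [record]
    invoices = [d for d in doc_store if d["type"] == "invoice"]

    # Graph: entities become vertices, relationships become edges.
    vertices = {"INV-1": record, "CUST-7": {"type": "customer", "name": "Acme"}}
    edges = [("CUST-7", "owes", "INV-1")]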

You may find that the reason you have thousands of data types is that they correspond to document types, and the only thing they have in common is the human language they are written in, or possibly not even that. Perhaps they are in arbitrary locales, in which case internationalized document storage systems are the options to examine first.

You may find that there is a set of semantic rules that 9,800 of your 10,000+ object types conform to, in which case characterizing and specifying those rules may lead to a more granular storage scheme [4][5][6]. Formalizing the semantic structure in conjunction with a structural design pattern (such as composite or decorator) may permit a gross reduction in the number of object types, as in the sketch below.
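
As a hedged sketch of that idea, the composite pattern lets a handful of composable node classes stand in for many distinct document-like types; the class names below are hypothetical.

    # Instead of one class (and one table) per document type, a few
    # composable node types express many structures.
    class Node:
        def render(self):
            raise NotImplementedError

    class Text(Node):
        def __init__(self, content):
            self.content = content

        def render(self):
            return self.content

    class Section(Node):
        def __init__(self, title, children):
            self.title = title
            self.children = children

        def render(self):
            body = "\n".join(child.render() for child in self.children)
            return self.title + "\n" + body

    # Two "types" that previously needed their own classes are now just
    # different compositions of the same two node types.
    invoice = Section("Invoice INV-1", [Text("Total: 99.50")])
    report = Section("Q1 Report", [Section("Summary", [Text("On track.")])])
    print(invoice.render())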

Such refactoring can easily be worth the investment and may get your project up to speed in a fraction of the time.

Upon the discovery of additional structure, you then will need to determine what level of normalization makes sense for your store, update, retrieval, and disk footprint requirements.

Literature (all over the web) on normalization and denormalization will help you understand the trade-offs between space, speed of writing, and speed of reading [7][8][9]. If a large amount of data is stored each day, the ETL characteristics will also play into the design significantly.
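
A minimal sketch of that trade-off, again using SQLite through Python's sqlite3 module with hypothetical schema names: the normalized form stores each fact once but reads require a join, while the denormalized form reads in a single scan but must update every copy of a duplicated value.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Normalized: each fact stored once; reads need a join.
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total REAL
        );
        -- Denormalized: customer name duplicated per order; reads are a
        -- single scan, but updates must touch every copy.
        CREATE TABLE orders_denorm (
            id INTEGER PRIMARY KEY,
            customer_name TEXT,
            total REAL
        );
    """)

    # Normalized read: one join.
    conn.execute("""
        SELECT c.name, o.total
        FROM orders o JOIN customers c ON c.id = o.customer_id
    """)

    # Denormalized read: no join, at the cost of redundancy.
    conn.execute("SELECT customer_name, total FROM orders_denorm")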

The selection of vendor and product is probably the last thing you will do architecturally, before you start low-level design, implementation, and test framework construction. (That is another challenge with so many data types: how will you test 10,000+ classes adequately?)
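
One common answer to that testing question is to test the (much smaller) set of shared behaviors generically across an enumeration of types, rather than writing 10,000+ individual suites. A rough standard-library sketch, with a stand-in type registry because the real one is unknown here:

    import unittest

    def all_object_types():
        # Stand-in for the project's real registry of object types.
        return [dict, list, set]

    class SharedContractTest(unittest.TestCase):
        def test_every_type_satisfies_the_shared_contract(self):
            for obj_type in all_object_types():
                with self.subTest(obj_type=obj_type):
                    instance = obj_type()
                    # Example shared contract: empty construction works
                    # and an instance equals itself.
                    self.assertEqual(instance, instance)

    if __name__ == "__main__":
        unittest.main()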

Giving you narrower recommendations than this would be irresponsible without more characterization of the thousands of object types and why there are so many.


References

[1] https://www.tutorialspoint.com/design_pattern/design_pattern_quick_guide.htm

[2] https://sourcemaking.com/design-patterns-and-tips

[3] https://sourcemaking.com/design_patterns/strategy

[4] https://www.cs.cmu.edu/~dunja/LinkKDD2004/Jure-Leskovec-LinkKDD-2004.pdf

[5] https://archive.org/details/Learning_Structure_and_Schemas_from_Documents

[6] https://www.researchgate.net/publication/265487498_Machine_Learning_for_Document_Structure_Recognition

[7] http://databases.about.com/od/specificproducts/a/Should-I-Normalize-My-Database.htm

[8] http://www.ovaistariq.net/199/databases-normalization-or-denormalization-which-is-the-better-technique/#.WLOlG_ErLRY

[9] https://fenix.tecnico.ulisboa.pt/downloadFile/3779571831168/SchemaTuning.ppt

answered Oct 14 '22 by Douglas Daseeco