 

What is the best practice to store a huge number (10,000+) of DIFFERENT object types in a database?

When designing a new relational database, each object type is normally represented by a corresponding table. What is the best practice for designing a database that stores a huge number of DIFFERENT object types, so as to avoid creating and maintaining thousands of database tables? What better alternatives to a relational database exist for this case?

asked Feb 18 '17 by Anne Droid



1 Answer

The answer depends largely on the nature of the distinctions between the thousands of object types, and on the degree to which, and the ways in which, they can be classified and possibly generalized further. Discovery is the key to a maintainable design in scenarios such as this.

Here are some potential persistence options that may work for your set of object types. It will take some thought to consider the pros and cons of each.

  1. Discover a hidden structure or pattern in the object types, allowing them to be decomposed [1][2][3].
  2. Discover categories of object types to which (1) can then be applied.
  3. Map multiple objects to a single or smaller set of tables or document types (see the sketch after this list).
  4. Map the objects one to one and determine a meta scheme to maintain them affordably.
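
As a concrete illustration of option (3), here is a minimal sketch that collapses arbitrary object types into a single table with a type discriminator and a JSON payload. It assumes SQLite via Python's sqlite3 module purely for demonstration; the table and column names are hypothetical, not prescribed by the answer.

    import json
    import sqlite3

    # One table for all object types: a type tag for dispatch and a JSON
    # payload for the type-specific fields.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE objects (
            id INTEGER PRIMARY KEY,
            object_type TEXT NOT NULL,  -- discriminator instead of one table per type
            payload TEXT NOT NULL       -- type-specific fields as JSON
        )
    """)
    conn.execute("CREATE INDEX idx_objects_type ON objects(object_type)")

    def save(obj_type, fields):
        conn.execute(
            "INSERT INTO objects (object_type, payload) VALUES (?, ?)",
            (obj_type, json.dumps(fields)),
        )

    def load_all(obj_type):
        rows = conn.execute(
            "SELECT payload FROM objects WHERE object_type = ?", (obj_type,)
        )
        return [json.loads(row[0]) for row in rows]

    save("invoice", {"number": "INV-1", "total": 99.5})
    save("sensor_reading", {"unit": "C", "value": 21.3})
    print(load_all("invoice"))  # [{'number': 'INV-1', 'total': 99.5}]

The trade-off is typical of this approach: far fewer tables to maintain, at the cost of weaker schema enforcement and less efficient queries on fields buried inside the payload.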

Whether the database is relational or not, how it is structured, what type of search features are available, and how keys are implemented are decisions that should be made subsequent to the above discovery. That is the best practice.

Determining the structure of the data so that storage, maintenance, and retrieval have the desired characteristics is a topic that could not be covered adequately in a 500-page book, let alone in a short answer.

Learning the pros and cons of these potential choices would be a good start. You can web search these persistence philosophies by their names and the words "database" or "persistence" to see descriptions and vendor products that correspond.

  • Relational table
  • Relational object
  • Tabular non-relational
  • Mapping (key and value)
  • Mapping (key and fixed record payload)
  • Document (free text)
  • Hierarchical
  • Graph (network of edges connecting vertices)
  • Multidimensional (OLAP and others)
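
To make the differences concrete, here is a toy, plain-Python sketch of how the same record might be represented under three of these models; all names are hypothetical and no particular vendor product is implied.

    import json

    record = {"type": "invoice", "number": "INV-1", "total": 99.5}

    # Mapping (key and value): one opaque value per key.
    kv_store = {}
    kv_store["invoice:INV-1"] = json.dumps(record)

    # Document: the record itself is the unit of storage, queried by its
    # fields rather than through a fixed schema.
    doc_store = [record]
    invoices = [d for d in doc_store if d["type"] == "invoice"]

    # Graph: entities become vertices, relationships become edges.
    vertices = {"INV-1": record, "CUST-7": {"type": "customer", "name": "Acme"}}
    edges = [("CUST-7", "owes", "INV-1")]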

You may find that the reason you have thousands of data types is that they correspond to document types, and the only thing they have in common is the human language they are written in, or possibly not even that. Perhaps they are in arbitrary locales, in which case internationalized document storage systems are the options to examine first.

You may find that there is a set of semantic rules that 9,800 of your 10,000+ object types conform to, in which case characterizing and specifying those rules may lead to a more granular storage scheme [4][5][6]. Formalizing the semantic structure in conjunction with a structural design pattern (such as composite or decorator) may permit a gross reduction in the number of object types, as in the sketch below.
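
As a hedged sketch of that idea, the composite pattern lets a handful of composable node classes stand in for many distinct document-like types; the class names below are hypothetical.

    # Instead of one class (and one table) per document type, a few
    # composable node types express many structures.
    class Node:
        def render(self):
            raise NotImplementedError

    class Text(Node):
        def __init__(self, content):
            self.content = content

        def render(self):
            return self.content

    class Section(Node):
        def __init__(self, title, children):
            self.title = title
            self.children = children

        def render(self):
            body = "\n".join(child.render() for child in self.children)
            return self.title + "\n" + body

    # Two "types" that previously needed their own classes are now just
    # different compositions of the same two node types.
    invoice = Section("Invoice INV-1", [Text("Total: 99.50")])
    report = Section("Q1 Report", [Section("Summary", [Text("On track.")])])
    print(invoice.render())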

Such refactoring can easily be worth the investment and may get your project up to speed in a fraction of the time.

Upon the discovery of additional structure, you then will need to determine what level of normalization makes sense for your store, update, retrieval, and disk footprint requirements.

Literature (all over the web) on normalization and denormalization will help you understand the trade-offs between space, speed of writing, and speed of reading [7][8][9]. If a large amount of data is stored each day, the ETL characteristics will also play into the design significantly.
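
A minimal sketch of that trade-off, again using SQLite through Python's sqlite3 module with hypothetical schema names: the normalized form stores each fact once but reads require a join, while the denormalized form reads in a single scan but must update every copy of a duplicated value.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Normalized: each fact stored once; reads need a join.
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total REAL
        );
        -- Denormalized: customer name duplicated per order; reads are a
        -- single scan, but updates must touch every copy.
        CREATE TABLE orders_denorm (
            id INTEGER PRIMARY KEY,
            customer_name TEXT,
            total REAL
        );
    """)

    # Normalized read: one join.
    conn.execute("""
        SELECT c.name, o.total
        FROM orders o JOIN customers c ON c.id = o.customer_id
    """)

    # Denormalized read: no join, at the cost of redundancy.
    conn.execute("SELECT customer_name, total FROM orders_denorm")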

The selection of vendor and product is probably the last thing you will do architecturally, before you start low-level design, implementation, and test framework construction. (That is another challenge with so many data types: how will you test 10,000+ classes adequately?)
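
One common answer to that testing question is to test the (much smaller) set of shared behaviors generically across an enumeration of types, rather than writing 10,000+ individual suites. A rough standard-library sketch, with a stand-in type registry because the real one is unknown here:

    import unittest

    def all_object_types():
        # Stand-in for the project's real registry of object types.
        return [dict, list, set]

    class SharedContractTest(unittest.TestCase):
        def test_every_type_satisfies_the_shared_contract(self):
            for obj_type in all_object_types():
                with self.subTest(obj_type=obj_type):
                    instance = obj_type()
                    # Example shared contract: empty construction works
                    # and an instance equals itself.
                    self.assertEqual(instance, instance)

    if __name__ == "__main__":
        unittest.main()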

Giving you narrower recommendations than this would be irresponsible without more characterization of the thousands of object types and why there are so many.


References

[1] https://www.tutorialspoint.com/design_pattern/design_pattern_quick_guide.htm

[2] https://sourcemaking.com/design-patterns-and-tips

[3] https://sourcemaking.com/design_patterns/strategy

[4] https://www.cs.cmu.edu/~dunja/LinkKDD2004/Jure-Leskovec-LinkKDD-2004.pdf

[5] https://archive.org/details/Learning_Structure_and_Schemas_from_Documents

[6] https://www.researchgate.net/publication/265487498_Machine_Learning_for_Document_Structure_Recognition

[7] http://databases.about.com/od/specificproducts/a/Should-I-Normalize-My-Database.htm

[8] http://www.ovaistariq.net/199/databases-normalization-or-denormalization-which-is-the-better-technique/#.WLOlG_ErLRY

[9] https://fenix.tecnico.ulisboa.pt/downloadFile/3779571831168/SchemaTuning.ppt

answered Oct 14 '22 by Douglas Daseeco