Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it a good idea to create tables dynamically to store user-content?

I'm currently designing an application where users can create/join groups, and then post content within a group. I'm trying to figure out how best to store this content in a RDBMS.

Option 1: Create a single table for all user content. One of the columns in this table will be the groupID, designating which group the content was posted in. Create an index using the groupID, to enable fast searching of content within a specific group. All content reads/writes will hit this single table.

Option 2: Whenever a user creates a new group, we dynamically create a new table. Something like group_content_{groupName}. All content reads/writes will be routed to the group-specific dynamically created table.

Pros for Option 1:

  1. It's easier to search for content across multiple groups, using a single simple query, operating on a single table.
  2. It's easier to construct simple cross-table queries, since the content table is static and well-defined.
  3. It's easier to implement schema changes and changes to indexing/triggers etc, since there's only one table to maintain.

Pros for Option 2:

  1. All reads and writes will be distributed across numerous tables, thus avoiding any bottlenecks that can result from a lot of traffic hitting a single table (though admittedly, all these tables are still in a single DB)
  2. Each table will be much smaller in size, allowing for faster lookups, faster schema-changes, faster indexing, etc
  3. If we want to shard the DB in future, the transition would be easier if all the data is already "sharded" across different tables.

What are the general recommendations between the above 2 options, from performance/development/maintenance perspectives?

like image 529
RvPr Avatar asked Dec 23 '22 13:12

RvPr


2 Answers

One of the cardinal sins in computing is optimizing too early. It is the opinion of this DBA of 20+ years that you're overestimating the IO that's going to happen to these groups.. RDBMS's are very good at querying and writing this type of info within a standard set of tables. Worst case, you can partition them later. You'll have a lot more search capability and management ease with 1 set of tables instead of a set per user.

Imagine if the schema needs to change? do you really want to update hundreds or thousands of tables or write some long script to fix a mundane issue? Stick with a single set of tables and ignore sharding. Instead, think "maybe we'll partition the tables someday, if necessary"

like image 104
Joe Love Avatar answered Dec 26 '22 03:12

Joe Love


It is a no-brainer. (1) is the way to go.

You list these as optimizations for the second method. All of these are misconceptions. See comments below:

All reads and writes will be distributed across numerous tables, thus avoiding any bottlenecks that can result from a lot of traffic hitting a single table (though admittedly, all these tables are still in a single DB)

Reads and writes can just as easily be distributed within a table. The only issue would be write conflicts within a page. That is probably a pretty minor consideration, unless you are dealing with more than dozens of transactions per second.

Because of the next item (partially filled pages), you are actually much better off with a single table and pages that are mostly filled.

Each table will be much smaller in size, allowing for faster lookups, faster schema-changes, faster indexing, etc

Smaller tables can be a performance disaster. Tables are stored on data pages. Each table is then a partially filled page. What you end up with is:

  • A lot of wasted space on disk.
  • A lot of wasted space in your page cache -- space that could be used to store records.
  • A lot of wasted I/O reading in partially filled pages.

If we want to shard the DB in future, the transition would be easier if all the data is already "sharded" across different tables.

Postgres supports table partitioning, so you can store different parts of a table in different places. That should be sufficient for your purpose of spreading the I/O load.

like image 20
Gordon Linoff Avatar answered Dec 26 '22 01:12

Gordon Linoff