I'm designing the schema for a set of document editors (spreadsheet editor, text document editor, powerpoint editor, etc.). The editors will share a database, although they may use separate databases someday. Each editor shares a lot of common information for each doc, but then -- depending on the kind of document -- there is editor-specific information also.
My question comes from trying to design the parts of the schema that will be different for each editor. Assume that there will be a Docs table, that holds common information about the documents in general (say, ID). On top of this, I want to associate information specific to a particular editor that has a 1:1 relationship with the Doc record. My proposed schema is:
CREATE TABLE Docs (
DocId STRING(MAX) NOT NULL,
CreationTime TIMESTAMP NOT NULL,
....
) PRIMARY KEY (DocId);
CREATE TABLE SpreadsheetStuff (
DocId STRING(MAX) NOT NULL,
... spreadsheet-specific information here ...
) PRIMARY KEY (DocId),
INTERLEAVE IN PARENT Docs
ON DELETE CASCADE;
CREATE TABLE TextDocumentStuff (
DocId STRING(MAX) NOT NULL,
... text-document-specific information here ...
) PRIMARY KEY (DocId),
INTERLEAVE IN PARENT Docs
ON DELETE CASCADE;
My reasoning for having the separate table is to isolate the common parts from any editor-specific stuff.
I wonder if this this is unnecessary, as editors can alter the Docs table as necessary for their own needs, even though this structure works technically. In other words, I could just have a ton of extra columns in the Docs table with editor-specific information. One concern is that my proposed structure may have performance or other implications that aren't obvious.
Is this a reasonable structure for a 1:1 relationship? Is there definitive guidance one way or the other about best practices?
Cloud Spanner can handle either option efficiently, assuming you don't risk getting close to the column limit. If you plan to do a lot of SQL queries, they may be more complicated with the two-table approach, since you'll officially need to join them (although the joins should generally be efficient since the data is interleaved). Despite the extra SQL complication of the JOIN, this is probably the cleaner approach. YMMV.
Answering here for CockroachDB, which also supports interleaving tables.
The idea behind interleaving tables is simply so the data is laid out in such a way that data that is often read together is on the same server, which requires fewer trips. It’s a tool to enhance performance and not something that’s explicitly useful for data modeling.
To meaningfully interleave tables like you’re proposing in your example, you’d probably want to include a User table, and then interleave the Docs in there. This would increase the likelihood of all of a user’s docs being located on the same server, making it faster to return all of a user’s available docs as soon as they log in. That's going to be where you get the biggest benefit out of interleave tables (in terms of performance).
However, your question is really more about data modeling, which is something that you could potentially enforce with Foreign Key relationships. The nice thing about this is that it's also an explicit requirement for interleaved tables in CockroachDB to have these relationships explicitly stated (which it looks like isn't required in Cloud Spanner).
Using your example schema, here’s what this would look like in CockroachDB:
CREATE TABLE Users (
UserId INT PRIMARY KEY,
...,
);
CREATE TABLE Docs (
UserId INT,
DocId INT,
CreationTime TIMESTAMP NOT NULL,
...,
PRIMARY KEY (UserId, DocId),
CONSTRAINT fk_Users FOREIGN KEY (UserId) REFERENCES Users
) INTERLEAVE IN PARENT Users (UserId);
CREATE TABLE SpreadsheetStuff (
UserId INT,
DocId INT,
PRIMARY KEY (UserId, DocId),
... spreadsheet-specific information here ...
CONSTRAINT fk_Docs FOREIGN KEY (UserId, DocId) REFERENCES Docs
) INTERLEAVE IN PARENT Docs (UserId, DocId);
CREATE TABLE TextDocumentStuff (
UserId INT,
DocId INT,
PRIMARY KEY (UserId, DocId),
...text-document-specific information here ...
CONSTRAINT fk_Docs FOREIGN KEY (UserId, DocId) REFERENCES Docs
) INTERLEAVE IN PARENT Docs (UserId, DocId);
When a user logs in, the query you'd want to write then would probably be something like:
SELECT * FROM Docs WHERE UserId = [this User's ID];
That would give you everything you needed in one place, and once the user clicked on one of the documents, you could then query the specific table for the type of document they're editing, which is sensible because you're actually opening a different program depending on their action.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With