What is the best approach for updating an un-indexed regular column (not a primary key related) throughout the tables containing it as a duplicate ?
i.e the user posts something and that post is duplicated in many tables for fast retrieval. But when that post changes (with an edit) it needs to be updated throughout the database, in all tables that contain that post (in tables that have different and unknown primary keys).
Solutions I'm thinking of:
Any enlightenments would be appreciated.
EDIT (fictional schema).
What if the post changes? or even the user's display_name?
CREATE TABLE users (
id uuid,
display_name text,
PRIMARY KEY ((id))
);
CREATE TABLE posts (
id uuid,
post text,
poster_id uuid,
poster_display_name text
tags set<text>,
statistics map<int, bigint>,
PRIMARY KEY ((id))
);
CREATE TABLE posts_by_user (
user_id uuid,
created timeuuid,
post text,
post_id uuid,
tags set<text>,
statistics map<int, bigint>,
PRIMARY KEY ((id), created)
);
Denormalizing a database requires data has first been normalized. With denormalization, the database administrator selectively adds back specific instances of redundant data after the data structure has been normalized. A denormalized database should not be confused with a database that has never been normalized.
Denormalization is a strategy used on a previously-normalized database to increase performance. The idea behind it is to add redundant data where we think it will help us the most. We can use extra attributes in an existing table, add new tables, or even create instances of existing tables.
Advantages of Database denormalization: As there is no need to use joins between tables, it is possible to extract the necessary information from one table, which automatically increases the speed of query execution. Additionally, this solution saves memory. Writing queries is much easier.
Normalization Technique Normalization is a technique that helps in organizing the data in the database. The main purpose of normalizing the data is to remove redundancy from the relation.
It depends on the frequency of the updates. For instance, if users only update their names infrequently (a handful of time per user account), then it may be ok to use a secondary index. Just know that using a 2i is a scatter gather, so you'll see performance issues if it's a common operation. In those cases, you'll want to use a materialized view (either the ones in 3.0 or manage it yourself) to be able to get the list of all the posts for a given user, then update the user's display name.
I recommend doing this in a background job, and giving the user a message like "it may take [some unit of time] for the change in your name to be reflected everywhere".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With