Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to handle a change in denormalized data

What is the best approach for updating an un-indexed regular column (not a primary key related) throughout the tables containing it as a duplicate ?

i.e the user posts something and that post is duplicated in many tables for fast retrieval. But when that post changes (with an edit) it needs to be updated throughout the database, in all tables that contain that post (in tables that have different and unknown primary keys).

Solutions I'm thinking of:

  1. Have a mapper table to track down the primary keys in all those tables, but it seems to lead to tables explosion (post is not the only property that might change).
  2. Use Solr to do the mapping, but I fear I would be using it for the wrong purpose.

Any enlightenments would be appreciated.

EDIT (fictional schema).

What if the post changes? or even the user's display_name?

CREATE TABLE users (
    id uuid,

    display_name text,

    PRIMARY KEY ((id))
);

CREATE TABLE posts (
    id uuid,

    post text,
    poster_id uuid,
    poster_display_name text
    tags set<text>,
    statistics map<int, bigint>,

    PRIMARY KEY ((id))
);

CREATE TABLE posts_by_user (
    user_id uuid,
    created timeuuid,

    post text,
    post_id uuid,
    tags set<text>,
    statistics map<int, bigint>,

    PRIMARY KEY ((id), created)
);
like image 362
Dror Weiss Avatar asked Jan 01 '16 08:01

Dror Weiss


People also ask

What happens when you store the data in a denormalized way in Rdbms?

Denormalizing a database requires data has first been normalized. With denormalization, the database administrator selectively adds back specific instances of redundant data after the data structure has been normalized. A denormalized database should not be confused with a database that has never been normalized.

What is denormalization when we should do explain with suitable example?

Denormalization is a strategy used on a previously-normalized database to increase performance. The idea behind it is to add redundant data where we think it will help us the most. We can use extra attributes in an existing table, add new tables, or even create instances of existing tables.

What is an advantage of storing data in a denormalized table rather than in multiple normalized tables?

Advantages of Database denormalization: As there is no need to use joins between tables, it is possible to extract the necessary information from one table, which automatically increases the speed of query execution. Additionally, this solution saves memory. Writing queries is much easier.

What is the main purpose of data de normalization?

Normalization Technique Normalization is a technique that helps in organizing the data in the database. The main purpose of normalizing the data is to remove redundancy from the relation.


Video Answer


1 Answers

It depends on the frequency of the updates. For instance, if users only update their names infrequently (a handful of time per user account), then it may be ok to use a secondary index. Just know that using a 2i is a scatter gather, so you'll see performance issues if it's a common operation. In those cases, you'll want to use a materialized view (either the ones in 3.0 or manage it yourself) to be able to get the list of all the posts for a given user, then update the user's display name.

I recommend doing this in a background job, and giving the user a message like "it may take [some unit of time] for the change in your name to be reflected everywhere".

like image 106
Jon Haddad Avatar answered Oct 07 '22 05:10

Jon Haddad