 

Using JSON instead of normalized data, is this approach correct?

There are microblogging posts, and votes/emoticons associated with them, both stored in MySQL InnoDB tables. There is a requirement for two types of pages:

(A) A listing page containing many microblogs along with their vote counts/emoticon counts on a single page (say 25).

E.g.

THE GREAT FUNNY POST

Not so funny content in a meant to be funny post. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus euismod consequat pellentesque. .....READ MORE....

(3) likes, (5) bored, (7) smiled

...plus 24 more posts on the same page.

(B) A permalink page containing a single microblog with detailed votes plus vote counts/emoticons.

THE GREAT FUNNY POST

Not so funny content in a meant to be funny post. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus euismod consequat pellentesque. Quisque viverra adipiscing auctor. Mauris ut diam risus, in fermentum elit. Aliquam urna lectus, egestas sit amet cursus et, auctor ut elit. Nulla tempus suscipit nisi, nec condimentum dui fermentum non. In eget lacus mi, ut placerat nisi.

(You, Derp and 1 more like this), (5) bored, (7) smiled

1st approach:

Table#1:

post_id | post_content | post_title | creation_time 

Table#2 for storing votes, likes, emoticons:

action_id | post_id | action_type | action_creator | creation_time

To display a page of posts or a single post, the first table is queried to get the posts, and the second is queried to get all the actions related to those posts. Whenever a vote etc. is made, a row is inserted into the post_actions table.
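A minimal sketch of the read path for this approach. The table names (posts, post_actions), the MySQL Connector/Python driver, and the connection details are assumptions, not given in the question:

    # Read path for the 1st (normalized) approach -- a sketch.
    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret",
                                   database="blog")  # placeholder credentials

    def listing_page(limit=25, offset=0):
        """Fetch one page of posts, then aggregate their actions: two queries."""
        cur = conn.cursor(dictionary=True)
        cur.execute(
            "SELECT post_id, post_title, post_content, creation_time"
            " FROM posts ORDER BY creation_time DESC LIMIT %s OFFSET %s",
            (limit, offset))
        posts = cur.fetchall()
        if not posts:
            return posts
        ids = [p["post_id"] for p in posts]
        marks = ",".join(["%s"] * len(ids))
        # GROUP BY collapses the ~1M action rows down to one row per
        # (post, action_type); an index on (post_id, action_type) keeps it cheap.
        cur.execute(
            "SELECT post_id, action_type, COUNT(*) AS cnt FROM post_actions"
            " WHERE post_id IN (" + marks + ") GROUP BY post_id, action_type",
            ids)
        counts = {}
        for row in cur.fetchall():
            counts.setdefault(row["post_id"], {})[row["action_type"]] = row["cnt"]
        for p in posts:
            p["actions"] = counts.get(p["post_id"], {})  # e.g. {"likes": 3, ...}
        return posts

The write side is a single INSERT into post_actions; the per-page GROUP BY over a million-row table is exactly the read cost that approach #2 tries to avoid.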

2nd approach:

Table#1:

post_id | post_content | post_title | creation_time | action_data 

Where action_data can be something like {"likes": 3, "smiles": 4, ...}

Table#2:

action_id | post_id | action_type | action_creator | creation_time

To display a page of posts, only the first table is queried to get the posts and action data; to display an individual post with detailed actions, the second table is queried to get all the actions related to the post. Whenever a vote etc. is made, an insert is made into the post_actions table, and the action_data field of Table#1 is updated to store the new count.
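The write path for approach #2 has to keep the JSON counters in sync with the action rows. A sketch of one way to do that, using a SELECT ... FOR UPDATE read-modify-write inside one transaction; the helper name and driver (MySQL Connector/Python) are my own assumptions:

    import json

    def record_action(conn, post_id, action_type, user_id):
        """Approach #2 write path: insert the action row, then bump the JSON
        counter, all in one transaction so the counts cannot drift from the
        action rows on a failed write."""
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO post_actions (post_id, action_type, action_creator,"
            " creation_time) VALUES (%s, %s, %s, NOW())",
            (post_id, action_type, user_id))
        # Lock the post row so two concurrent votes don't overwrite each
        # other's read-modify-write of the JSON blob.
        cur.execute(
            "SELECT action_data FROM posts WHERE post_id = %s FOR UPDATE",
            (post_id,))
        (raw,) = cur.fetchone()
        data = json.loads(raw) if raw else {}
        data[action_type] = data.get(action_type, 0) + 1
        cur.execute(
            "UPDATE posts SET action_data = %s WHERE post_id = %s",
            (json.dumps(data), post_id))
        conn.commit()

At the stated 50 action writes per second, the FOR UPDATE lock serializes all votes on a single hot post; the counters also silently drift if anything ever writes post_actions outside this code path. Both are gotchas worth noting up front.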

Assume there are 100K posts and 10x as many actions, i.e. 1 million or more action rows. Does approach #2 provide a benefit? Are there downsides apart from having to read, modify and update the JSON information? Is there any way in which approach #2 can be followed and further improved?

Adding more information based on feedback:

  1. Python scripts will be reading and writing the data.
  2. The MySQL DB servers will be separate from the web servers.
  3. Writes due to post creation are low, i.e. 10,000 per day, but writes due to actions can be much higher; assume a maximum of 50 writes per second from actions like voting, liking, and emoticons.
  4. My concern is the read/write performance comparison of the two approaches, the gotchas of the second one, and where it may fall short in the future.
asked Oct 19 '12 by DhruvPathak


2 Answers

I would recommend either storing all likes/votes data (aggregated and atomic) inside Table#1 and discarding Table#2 completely, OR using 2 tables without aggregated data while relying on JOINs, clever queries and good indexes.

Why? Because otherwise you will be querying and writing to both tables every time a comment/vote/like is made. Assuming 10 actions per post that exist merely for displaying interaction, I'd really store it all in 1 table, maybe adding an extra column for each kind of action. You can use JSON or simply serialize() the arrays, which should be a bit faster.

Which solution you pick in the end will be highly dependent on how many actions you get and how you want to use them. Getting all actions for 1 post is easy and very fast with solution 1, but searching inside the serialized data would be a mess. On the other hand, solution 2 takes more space and needs careful query-writing and indexes.
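For the second variant, the "clever query" can be a single statement: page the posts in a derived table first, then LEFT JOIN and aggregate the actions. A sketch, assuming an index on post_actions (post_id, action_type):

    # One-round-trip listing for the pure two-table layout.  Each post comes
    # back as one row per action type it has (action_type is NULL when a
    # post has no actions yet).
    LISTING_SQL = """
    SELECT p.post_id, p.post_title, p.post_content,
           a.action_type, COUNT(a.action_id) AS cnt
    FROM (SELECT * FROM posts
          ORDER BY creation_time DESC
          LIMIT 25) AS p
    LEFT JOIN post_actions AS a ON a.post_id = p.post_id
    GROUP BY p.post_id, a.action_type
    """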

answered Sep 22 '22 by Martin Müller


Assuming there are many more reads from the system than writes, I can think of a few ways to do this. You can take advantage of the fact that social networking sites really don't need to have consistent data, only eventually consistent data, as long as every user sees his/her own actions consistently.

Option #1.

Add a column for each action type in Table#1 and increment it every time a new action happens. This way the main page listing is very fast.

Table#1

post_id | post_content | post_title | creation_time | action1_count | action2_count | action3_count | ...

What is cool about this approach is that when viewing a permalink you don't need to query all actions for the post from Table#2. Just query the last 5 actions of any type, plus all actions made by the viewer, as in the sketch below. Check for inspiration here: How to get the latest 2 items per category in one select (with mysql)
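A sketch of both sides of Option #1. The counter column names and helper names are illustrative, and the UNION trick implements the "last 5 actions plus the viewer's actions" read:

    def record_action(conn, post_id, action_type, user_id):
        """Option #1 write path: log the action, then bump its counter column.

        MySQL executes `SET c = c + 1` atomically, so unlike a JSON blob no
        read-modify-write (and no application-held row lock) is needed.
        """
        assert action_type in ("like", "bored", "smiled")  # whitelist: the
        col = action_type + "_count"                       # column name is built from it
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO post_actions (post_id, action_type, action_creator,"
            " creation_time) VALUES (%s, %s, %s, NOW())",
            (post_id, action_type, user_id))
        cur.execute(
            "UPDATE posts SET " + col + " = " + col + " + 1"
            " WHERE post_id = %s",
            (post_id,))
        conn.commit()

    def permalink_actions(conn, post_id, viewer_id):
        """Permalink read: the last 5 actions of any type, plus everything
        the viewing user did, instead of all 10+ actions per post."""
        cur = conn.cursor(dictionary=True)
        cur.execute(
            "(SELECT action_type, action_creator, creation_time"
            " FROM post_actions WHERE post_id = %s"
            " ORDER BY creation_time DESC LIMIT 5)"
            " UNION "
            "(SELECT action_type, action_creator, creation_time"
            " FROM post_actions WHERE post_id = %s AND action_creator = %s)",
            (post_id, post_id, viewer_id))
        return cur.fetchall()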

Option #2.

This is like your first approach, but write the action counts to a Redis hash, or simply as a JSON object in Memcached. It's lightning fast to query those on main-page load. The drawback is that if Redis is restarted (and always, when Memcached is restarted) you need to re-initialize the counts, or just do it lazily when somebody views the post via the permalink view.
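A sketch of the Redis variant with redis-py, including the lazy re-initialization mentioned above; the post:<id>:actions key scheme is my own choice:

    import redis

    r = redis.Redis(decode_responses=True)  # assumption: redis-py, local server

    def bump(post_id, action_type):
        """Count an action in a Redis hash; HINCRBY is atomic server-side."""
        r.hincrby("post:%d:actions" % post_id, action_type, 1)

    def action_counts(conn, post_id):
        """Read counts from Redis, rebuilding from MySQL after a restart."""
        key = "post:%d:actions" % post_id
        counts = r.hgetall(key)
        if not counts:
            # Cache is cold (e.g. Redis restarted): re-aggregate from MySQL
            # and warm the hash so the next read is cheap again.
            cur = conn.cursor()
            cur.execute(
                "SELECT action_type, COUNT(*) FROM post_actions"
                " WHERE post_id = %s GROUP BY action_type", (post_id,))
            counts = {t: n for t, n in cur.fetchall()}
            if counts:
                r.hset(key, mapping=counts)
        return {t: int(n) for t, n in counts.items()}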

answered Sep 19 '22 by Teemu Ikonen