Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Popular Today, This Week, This Month - Design Pattern

I have a system that displays entries ordered by one of three fields, the most popular Today, This Week and This Month. Each time an entry is viewed the score is incremented by 1 thus changing the order.

So if entry 1 is new and viewed 10 times today its scores will be:

Today: 10
Week: 10
Month: 10

The Current Solution

At the moment I simply have 3 fields associated with each entry, one for today another for this week and another for this month. Each time an entry is viewed all three scores are incremented by 1.

At the end of the day, the day score is reset to 0. At the end of the current week the week score is set to 0 and at the end of the current calender month, the month score is set to 0.

The Problem

Although this works and uses little space it is not ideal for two reasons:

1) At the end of the current period (day,week,month) that value is reset to 0 all at once meaning that at 00:00:00 every day the ranking is all reset and all daily scores are set to 0, the same is true for end of the week and end of the month. At 00:00:00 on the 1st of each month all scores are set to 0 loosing all existing ranking data.

2) Because the end of the month usually falls inside a week (Mon-Sun), the monthly scores are reset during the week leading to weekly scores being higher than monthly scores.

Possible Solution

I could use a rolling hourly counter for every hour of the month which is used to calculate the scores for the current day,week,month based on the current hour index.

Array size = 31 * 24 = 744 int16 values

So on the 1st at 4am a view would be placed in hours[4]

hours[4]++

The stats calculator would then then use today as the sum of the last 24 values and the This Week score would be the sum of the last (24*7) values. Finally the This Month would be the sum of the last (24*31) values.

Solution problems

The major issue with solution 1 is the disk/memory requirements. I've gone from using 3 32 bit values in my current solution to using 744 32 bit values. Even if I change them to in16 I'm still going to be using a lot more memory per entry

Memory per Entry = 3 * 4 bytes = 12 bytes (Existing)
Memory per Entry = 744 * 2 = 1,488 bytes (possible solution)

With this solution my memory usage per entry has jumped 12400%!!

Can anyone suggest another solution that would meet resolve the issues in my current solution but without using 1.5k per entry?

Many thanks!

like image 530
antfx Avatar asked Jul 04 '13 09:07

antfx


3 Answers

This is actually a common problem of how to group data both effectively and keep all the necessary information.

First of all: Did you try doing it your way? Did you really lack the storage? Your solution seems reasonable.

How I would do it

I assume that you are using a database for keeping the data.

I would create two separate tables, one for hourly and one for daily statistics. Each article would have exactly 24 rows in that database, one for each hour. That would be used for hourly stats. To update a specific row you would only have to know the hour(0-23) and the entry_id. UPDATE count=count+1 WHERE hour=11 AND entry_id = 18164;

entry_id foreign key | hour integer | count integer
---------------------+--------------+--------------
1                    | 0            | 123
1                    | 2            | 1712
...

Current daily stats would be either computed around midnight (or whenever the app does the least) or summed on demand. Either way, once per day, a sum will have to be made of all hourly data and the sum will have to be inserted into the daily stats table.

entry_id foreign key | day date   | count integer
---------------------+------------+--------------
1                    | 2013-07-03 | 54197
1                    | 2013-07-04 | 66123
...

Each entry older than 31 (30/29/28) days should get deleted. Or not, if you want total or yearly statistics

Advantages

  • you keep less data than with full hourly stats: 24+31
  • sums on hourly table should be fast, if indexed on entry_id and hour
  • less memory used than in your solution

Disadvantages

  • additional scripting/triggers/jobs required to daily update the statistics
  • more work required to implement it than in your solution
like image 124
Dariusz Avatar answered Nov 06 '22 02:11

Dariusz


One simple solution would be

Use an array of 31.
Today - the last value
This Week score would be the sum of the last 7 values.
This Month would be the sum of the last 31 values.

At the end of each day, shift the whole array values by 1 to accommodate new value.

With respect to your comment,

Use another array of size 24 to store hours visit count.
Today - Sum of all elements of Array2
This Week score would be the sum of the last 7 values of Array1.
This Month would be the Sum of all elements of Array1.

At the end of each day, shift the whole array values of Array1 by 1
to accommodate new value. Last day visit count = Sum of all elements of Array2
like image 31
Ramesh Durai Avatar answered Nov 06 '22 02:11

Ramesh Durai


Maybe some kind of attenuation might help. You'd need 6 variables for Today, Yesterday, ThisWeek, LastWeek, ThisMonth, LastMonth.

Then the final rating (for instance daily) may be caltulated as: Today + Yesterday * attenuation( current_time - start_of_the_day ).

Where attenuation is something like 1 / (1 + k * time), where k is adjustible depending on how fast you want your last days rating to deflate.

UPDATE: Consider new entry was viewed 123 times during a day. And lets measure time in seconds just to get to some numbers. At 23:59 etrys' rating would be 123 + 0 * 1 / (1 + k * 86340)^2 = 100.

At midnight Today counter becomes Yesterday:

0 + 123 * 1 / ( 1 + k * 0)^2 = 123

Suppose by midday an entry gains 89 more views.

89 + 123 * 1 / ( 1 + k * 43200 )^2 = ?

Well, it's a good time to choose the k. If we want old views to fade four times in 12 hours, then k would be 1/43200. If we want in to fade hundred times - 9/43200. In this case:

89 + 123 * 1 / ( 1 + 9 )^2 = 90.23

And then forth to 23:59. Let entry gain 60 more views

149 + 123 * 1 / ( 1 + (9/43200) * 86340 )^2 ~= 149.002

So yesterday views almost completely lost their influence on a rating in 24 hours. Of course you can play with k or attenuation formula in general to match your needs best. This is just an example.

like image 38
akalenuk Avatar answered Nov 06 '22 04:11

akalenuk