I have a system that displays entries ordered by one of three fields, the most popular Today, This Week and This Month. Each time an entry is viewed the score is incremented by 1 thus changing the order.
So if entry 1 is new and viewed 10 times today its scores will be:
Today: 10
Week: 10
Month: 10
The Current Solution
At the moment I simply have 3 fields associated with each entry, one for today another for this week and another for this month. Each time an entry is viewed all three scores are incremented by 1.
At the end of the day, the day score is reset to 0. At the end of the current week the week score is set to 0 and at the end of the current calender month, the month score is set to 0.
The Problem
Although this works and uses little space it is not ideal for two reasons:
1) At the end of the current period (day,week,month) that value is reset to 0 all at once meaning that at 00:00:00 every day the ranking is all reset and all daily scores are set to 0, the same is true for end of the week and end of the month. At 00:00:00 on the 1st of each month all scores are set to 0 loosing all existing ranking data.
2) Because the end of the month usually falls inside a week (Mon-Sun), the monthly scores are reset during the week leading to weekly scores being higher than monthly scores.
Possible Solution
I could use a rolling hourly counter for every hour of the month which is used to calculate the scores for the current day,week,month based on the current hour index.
Array size = 31 * 24 = 744 int16 values
So on the 1st at 4am a view would be placed in hours[4]
hours[4]++
The stats calculator would then then use today as the sum of the last 24 values and the This Week score would be the sum of the last (24*7) values. Finally the This Month would be the sum of the last (24*31) values.
Solution problems
The major issue with solution 1 is the disk/memory requirements. I've gone from using 3 32 bit values in my current solution to using 744 32 bit values. Even if I change them to in16 I'm still going to be using a lot more memory per entry
Memory per Entry = 3 * 4 bytes = 12 bytes (Existing)
Memory per Entry = 744 * 2 = 1,488 bytes (possible solution)
With this solution my memory usage per entry has jumped 12400%!!
Can anyone suggest another solution that would meet resolve the issues in my current solution but without using 1.5k per entry?
Many thanks!
This is actually a common problem of how to group data both effectively and keep all the necessary information.
First of all: Did you try doing it your way? Did you really lack the storage? Your solution seems reasonable.
I assume that you are using a database for keeping the data.
I would create two separate tables, one for hourly
and one for daily
statistics. Each article would have exactly 24 rows in that database, one for each hour. That would be used for hourly
stats. To update a specific row you would only have to know the hour(0-23) and the entry_id. UPDATE count=count+1 WHERE hour=11 AND entry_id = 18164;
entry_id foreign key | hour integer | count integer
---------------------+--------------+--------------
1 | 0 | 123
1 | 2 | 1712
...
Current daily stats would be either computed around midnight (or whenever the app does the least) or summed on demand. Either way, once per day, a sum will have to be made of all hourly data and the sum will have to be inserted into the daily
stats table.
entry_id foreign key | day date | count integer
---------------------+------------+--------------
1 | 2013-07-03 | 54197
1 | 2013-07-04 | 66123
...
Each entry older than 31 (30/29/28) days should get deleted. Or not, if you want total or yearly statistics
Advantages
Disadvantages
One simple solution would be
Use an array of 31.
Today - the last value
This Week score would be the sum of the last 7 values.
This Month would be the sum of the last 31 values.
At the end of each day, shift the whole array values by 1 to accommodate new value.
With respect to your comment,
Use another array of size 24 to store hours visit count.
Today - Sum of all elements of Array2
This Week score would be the sum of the last 7 values of Array1.
This Month would be the Sum of all elements of Array1.
At the end of each day, shift the whole array values of Array1 by 1
to accommodate new value. Last day visit count = Sum of all elements of Array2
Maybe some kind of attenuation might help. You'd need 6 variables for Today
, Yesterday
, ThisWeek
, LastWeek
, ThisMonth
, LastMonth
.
Then the final rating (for instance daily) may be caltulated as: Today + Yesterday * attenuation( current_time - start_of_the_day )
.
Where attenuation is something like 1 / (1 + k * time)
, where k
is adjustible depending on how fast you want your last days rating to deflate.
UPDATE: Consider new entry was viewed 123 times during a day. And lets measure time in seconds just to get to some numbers. At 23:59 etrys' rating would be 123 + 0 * 1 / (1 + k * 86340)^2 = 100
.
At midnight Today
counter becomes Yesterday
:
0 + 123 * 1 / ( 1 + k * 0)^2 = 123
Suppose by midday an entry gains 89 more views.
89 + 123 * 1 / ( 1 + k * 43200 )^2 = ?
Well, it's a good time to choose the k
. If we want old views to fade four times in 12 hours, then k
would be 1/43200
. If we want in to fade hundred times - 9/43200
. In this case:
89 + 123 * 1 / ( 1 + 9 )^2 = 90.23
And then forth to 23:59. Let entry gain 60 more views
149 + 123 * 1 / ( 1 + (9/43200) * 86340 )^2 ~= 149.002
So yesterday views almost completely lost their influence on a rating in 24 hours. Of course you can play with k
or attenuation formula in general to match your needs best. This is just an example.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With