redis as write-back view count cache for mysql

Tags:

I have a very high throughput site for which I'm trying to store "view counts" for each page in a mySQL database (for legacy reasons they must ultimately end up in mySQL).

The sheer number of views is making it impractical to do SQL "UPDATE ITEM SET VIEW_COUNT=VIEW_COUNT+1" type of statements. There are millions of items but most are only viewed a small number of times, others are viewed many times.

So I'm considering using Redis to gather the view counts, with a background thread that writes the counts to mySQL. What is the recommended method for doing this? There are some issues with the approach:

how often does the background thread run?
how does it determine what to write back to mySQL?
should I store a Redis KEY for every ITEM that gets hit?
what TTL should I use?
is there already some pre-built solution or powerpoint presentation that gets me halfway there, etc.

I have seen very similar questions on StackOverflow but none with a great answer...yet! Hoping there's more Redis knowledge out there at this point.

974

asked May 26 '13 17:05

OneSolitaryNoob

1 Answers

I think you need to step back and look at some of your questions from a different angle to get to your answers.

"how often does the background thread run?" To answer this you need to answer these questions: How much data can you lose? What is the reason for the data being in MySQL, and how often is that data accessed? For example, if the DB is only needed to be consulted once per day for a report, you might only need it to be updated once per day. On the other hand, what if the Redis instance dies? How many increments can you lose and still be "ok"? These will provide the answers to the question of how often to update your MySQL instance and aren't something we can answer for you.

I would use a very different strategy for storing this in redis. For the sake of the discussion let us assume you decide you need to "flush to db" every hour.

Store each hit in hashes with a key name structure along these lines:

interval_counter:DD:HH
interval_counter:total

Use the page id (such as MD5 sum of the URI, the URI itself, or whatever ID you currently use) as the hash key and do two increments on a page view; one for each hash. This provides you with a current total for each page and a subset of pages to be updated.

You would then have your cron job run a minute or so after the start of the hour to pull down all pages with updated view counts by grabbing the previous hour's hash. This provides you with a very fast means of getting the data to update the MySQL DB with while avoiding any need to do math or play tricks with timestamps etc.. By pulling data from a key which is no longer bing incremented you avoid race conditions due to clock skew.

You could set an expiration on the daily key, but I'd rather use the cron job to delete it when it has successfully updated the DB. This means your data is still there if the cron job fails or fails to be executed. It also provides the front-end with a full set of known hit counter data via keys that do not change. If you wanted, you could even keep the daily data around to be able to do window views of how popular a page is. For example if you kept the daily hash around for 7 days by setting an expire via the cron job instead of a delete, you could display how much traffic each page has had per day for the last week.

Executing two hincr operations can be done either solo or pipelined still performs quite well and is more efficient than doing calculations and munging data in code.

Now for the question of expiring the low traffic pages vs memory use. First, your data set doesn't sound like one which will require huge amounts of memory. Of course, much of that depends on how you identify each page. If you have a numerical ID the memory requirements will be rather small. If you still wind up with too much memory, you can tune it via the config, and if needs be could even use a 32 bit compile of redis for a significant memory use reduction. For example, the data I describe in this answer I used to manage for one of the ten busiest forums on the Internet and it consumed less than 3GB of data. I also stored the counters in far more "temporal window" keys than I am describing here.

That said, in this use case Redis is the cache. If you are still using too much memory after the above options you could set an expiration on keys and add an expire command to each ht. More specifically, if you follow the above pattern you will be doing the following per hit:

hincr -> total
hincr -> daily
expire -> total

This lets you keep anything that is actively used fresh by extending it's expiration every time it is accessed. Of course, to do this you'd need to wrap your display call to catch the null answer for hget on the totals hash and populate it from the MySQL DB, then increment. You could even do both as an increment. This would preserve the above structure and would likely be the same codebase needed to update the Redis server from the MySQL Db if you the Redis node needed repopulation. For that you'll need to consider and decide which data source will be considered authoritative.

You can tune the cron job's performance by modifying your interval in accordance with the parameters of data integrity you determine from the earlier questions. To get a faster running cron nob you decrease the window. With this method decreasing the window means you should have a smaller collection of pages to update. A big advantage here is you don't need to figure out what keys you need to update and then go fetch them. you can do an hgetall and iterate over the hash's keys to do updates. This also saves many round trips by retrieving all the data at once. In either case if you will likely want to consider a second Redis instance slaved to the first to do your reads from. You would still do deletes against the master but those operations are much quicker and less likely to introduce delays in your write-heavy instance.

If you need disk persistence of the Redis DB, then certainly put that on a slave instance. Otherwise if you do have a lot of data being changed often your RDB dumps will be constantly running.

I hope that helps. There are no "canned" answers because to use Redis properly you need to think first about how you will access the data, and that differs greatly from user to user and project to project. Here I based the route taken on this description: two consumers accessing the data, one to display only and the other to determine updating another datasource.

118

answered Sep 21 '22 20:09

The Real Bill

Related questions
                            
                                Zend Framework MySQL Update Column
                            
                                Ubuntu 11.04 - Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (13)
                            
                                How to insert multiple rows in a mysql database at once with prepared statements?
                            
                                JDBC - Statement, PreparedStatement, CallableStatement and caching
                            
                                MySQL join tables where table name is a field of another table
                            
                                Ignoring the year in SQL Query with date range
                            
                                Emoji on MySQL and PHP: why some symbol yes other not?
                            
                                String compare exact in query MySQL
                            
                                How can I make sure that PDO's lastInsertId() is not that of another simultaneous insert?
                            
                                How to use goto label in MySQL stored function
                            
                                How can I recreate a Django project's database?
                            
                                Aggregate data tables
                            
                                Private messaging system. Listing last message of each conversation
                            
                                Group_Concat in Concat not working with NULL values
                            
                                Average Age from DOB Field - MySQL / PHP
                            
                                How to convert data stored in XML files into a relational database (MySQL)?
                            
                                Why isn't this SQL code working?
                            
                                SQL getting SUM of row for each PlayerID
                            
                                OpenCart: How to accurately populate oc_category_path
                            
                                SQLAlchemy necessary table properties

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

redis as write-back view count cache for mysql

Tags:

mysql

caching

redis

OneSolitaryNoob

People also ask

1 Answers

The Real Bill

Recent Activity

Donate For Us