 

How can I improve the performance of my social networking site using memcached?

Tags: php, memcached

I would like to implement memcached on my social network site. Since it is a social network, most of the data changes very frequently.

For example, if I were to store a user's 10,000 friends in the cache, the cache would need to be updated any time that user adds a friend. Easy enough, but it would also need to be updated any time someone else adds them as a friend. That's a lot of updating just for the friend list alone.

There are also user blogs and bulletins, which are posted non-stop, and you can only see the ones created by users on your friend list, so I think these would be very hard to cache.

I could see caching some profile info that only changes when a user updates their profile, but this would create a cache record for every user; with 100,000+ users, that's a lot of caching. Is this a good idea?

Asked Nov 05 '22 by JasonDavis

1 Answer

I would say that it is a good idea to cache where possible. Most of the time you will be able to pull items from memcached faster than from a traditional RDBMS (especially if the data requires complex joins and such). I currently employ such a strategy with great success, and here is what I have learned from the experience:

  1. If possible, cache indefinitely and write a new value whenever a change is made. Try not to do an explicit delete, as you could cause a race condition when multiple concurrent accesses to the data try to update the cache. Also implement locking when an item does not exist in the cache, to prevent the above issue (using memcached "add" + a short sleep in a loop); see the sketch below.
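
     A minimal sketch of that "add"-based lock, assuming the PHP Memcached extension; load_friends_from_db() is a hypothetical placeholder for the real query:

    function get_with_lock($memcached, $key, $ttl = 0){
        $value = $memcached->get($key);
        if($memcached->getResultCode() === Memcached::RES_SUCCESS){
            return $value;                            // cache hit
        }
        // Only one client wins the add(); everyone else waits and retries the get.
        if($memcached->add($key . '_lock', 1, 10)){
            $value = load_friends_from_db();          // hypothetical DB loader
            $memcached->set($key, $value, $ttl);      // $ttl = 0 means "cache indefinitely"
            $memcached->delete($key . '_lock');
            return $value;
        }
        for($i = 0; $i < 10; $i++){
            usleep(100000);                           // sleep 100 ms, then retry
            $value = $memcached->get($key);
            if($memcached->getResultCode() === Memcached::RES_SUCCESS){
                return $value;
            }
        }
        return load_friends_from_db();                // give up waiting, hit the DB directly
    }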

  2. Refresh the cache in the background if possible, using a queue. My implementation currently uses multi-threaded Perl processes running in the background plus beanstalkd, which prevents lag time on the frontend; most of the time a change only incurs a short lag. In PHP, the same pattern might look like the sketch below.
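
     (Here $queue stands in for whichever beanstalkd client you use, e.g. Pheanstalk, and rebuild_friend_list() is a hypothetical helper.)

    // Frontend: record that a cache entry is stale instead of rebuilding it inline.
    function queue_cache_refresh($queue, $user_id){
        $queue->put(json_encode(array('type' => 'friend_list', 'user_id' => $user_id)));
    }

    // Background worker: loop forever, rebuilding cache entries as jobs arrive.
    function run_refresh_worker($queue, $memcached){
        while(true){
            $job   = $queue->reserve();                   // blocks until a job is available
            $data  = json_decode($job->getData(), true);
            $fresh = rebuild_friend_list($data['user_id']);
            $memcached->set('friendlist_' . $data['user_id'], $fresh, 0);
            $queue->delete($job);                         // job done, remove it
        }
    }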

  3. Use memcached's getMulti if possible; many separate memcached calls really add up. For example:
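
     (Assuming the PHP Memcached extension; $friend_ids and the 'userinfo_' prefix match the key scheme in #6 below.)

    // One network round trip for the whole friend list instead of one get() per friend.
    $keys = array();
    foreach($friend_ids as $id){
        $keys[] = 'userinfo_' . $id;
    }
    $profiles = $memcached->getMulti($keys) ?: array();    // key => value for the hits only
    $missing  = array_diff($keys, array_keys($profiles));  // fetch these from the DB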

  4. Tier your cache: when checking for an item, check a local array first, then memcached, then the DB, and cache the result in the local array after the first access so the same item is not fetched from memcached more than once during a script execution. EDIT: to clarify, if you are using a scripted language such as PHP, the local array lives only as long as the current script execution :) an example:

    class Itemcache {
        private $cached_items = array();
        private $memcachedobj;

        public function __construct(Memcached $memcachedobj){
            $this->memcachedobj = $memcachedobj;
        }

        public function getitem($memcache_key){
            // 1. Per-request local array.
            if(isset($this->cached_items[$memcache_key])){
                return $this->cached_items[$memcache_key];
            }
            // 2. Memcached.
            $result = $this->memcachedobj->get($memcache_key);
            if($result !== false){
                $this->cached_items[$memcache_key] = $result;
                return $result;
            }
            // 3. Database, then populate both tiers.
            // db query here as $dbresult
            $this->memcachedobj->set($memcache_key, $dbresult, 0);
            $this->cached_items[$memcache_key] = $dbresult;
            return $dbresult;
        }
    }
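
     A brief usage sketch (the server address and key are assumptions):

    $memcached = new Memcached();
    $memcached->addServer('127.0.0.1', 11211);
    $cache   = new Itemcache($memcached);
    $profile = $cache->getitem('userinfo_42');  // local array -> memcached -> DB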
    
  5. Write a wrapper function that implements caching strategy #4 above; one possible shape is sketched below.
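
     (The callable $loader is an assumption about how the DB query gets passed in.)

    function cache_get($memcached, $key, $loader, $ttl = 0){
        static $local = array();                  // per-request tier, as in #4
        if(array_key_exists($key, $local)){
            return $local[$key];
        }
        $value = $memcached->get($key);
        if($memcached->getResultCode() !== Memcached::RES_SUCCESS){
            $value = call_user_func($loader);     // hit the DB only on a true miss
            $memcached->set($key, $value, $ttl);
        }
        return $local[$key] = $value;
    }

    // e.g. $row = cache_get($memcached, 'userinfo_42', function(){ /* db query */ });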

  6. Use a consistent key structure in memcached, e.g. 'userinfo_{user.pk}', where user.pk is the primary key of the user in the RDBMS; a couple of tiny helpers (below) keep that format in one place.
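
     (The friendlist_ prefix is just an illustration.)

    // Every caller builds identical keys if construction lives in one place.
    function user_key($user_pk)       { return 'userinfo_' . $user_pk; }
    function friendlist_key($user_pk) { return 'friendlist_' . $user_pk; }

    $memcached->set(user_key(42), $profile_row, 0);
    $memcached->set(friendlist_key(42), $friend_ids, 0);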

  7. If your data requires post-processing, do that processing where possible BEFORE placing it in the cache; it will save a few cycles on every hit of that data. See the sketch below.
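
     (For instance, if the friend list is always displayed sorted by name, sort it once at write time instead of on every read; the 'name' field and friendlist_key() from #6 are assumptions about the data.)

    // Post-process once when writing, not on every read.
    usort($friends, function($a, $b){ return strcmp($a['name'], $b['name']); });
    $memcached->set(friendlist_key($user_id), $friends, 0);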

Answered Nov 12 '22 by Jason