
Better way to get lists of items: cached serialized data vs database queries or other?

Tags:

php

mysql

caching

I have done a lot of searching about caching data in files (serialize/unserialize vs json_encode/decode, var_export, igbinary) and MySQL queries (optimizations, stored procedures, query cache), but at this point I wonder what the best way is to optimize a concrete case like the following.

Sorry in advance: this is a long question for what I guess will be a small answer, but it is necessary to understand the project. And excuse my poor English, which is not my first language.

Let's imagine that we have these database relationships.

Description of the database (estimated number of records in parentheses):

  • MODULE (10): the type of ITEM; could be article, forum topic, ad, news...
  • ITEM (millions): any type, with a title and some text
  • CATEGORY (50): item categories (animals, politics, cars, computers...)
  • TAG (hundreds of thousands): a category's tags (e.g. for politics: International, France, Barack Obama...)
  • ITEM_TAG (ouch!): associations between items and tags

So we have several relationships, and each is recorded at ITEM creation/update.

I have already cached ITEM data in folders and files, as in the following example:

public function cacheItem()
{
    // cast the id to int to keep the query safe from injection
    $req=mysql_query("SELECT id, title, content, id_mod, id_cat
            FROM ITEM
            WHERE ITEM.id='".(int)$this->id."'")or die(mysql_error());
    if(mysql_num_rows($req)==1)
    {
        $this->itemData=mysql_fetch_assoc($req);//assoc only: smaller serialized file
        $this->folder=floor($this->id/1000);//1000 items max per folder
        $this->itemUrl=$this->folder."/".$this->id.".txt";
        // make sure the folder exists before writing into it
        if(!is_dir($this->folder))
        {
            mkdir($this->folder,0777,true);
        }
        // file_put_contents() creates the file itself, so no touch() needed
        file_put_contents($this->itemUrl,serialize($this->itemData),LOCK_EX);
    }
}

And I read them back with unserialize(file_get_contents($url)); this part works like a charm!
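
For example, a read helper is just the inverse of cacheItem() (a minimal sketch; the getItem() name and the cache-miss fallback are illustrative, not my actual code):

    public function getItem()
    {
        $this->folder=floor($this->id/1000);//same folder scheme as cacheItem()
        $this->itemUrl=$this->folder."/".$this->id.".txt";
        if(file_exists($this->itemUrl))
        {
            //cache hit: rebuild the row from the serialized file
            return unserialize(file_get_contents($this->itemUrl));
        }
        //cache miss: query MySQL and (re)write the file
        $this->cacheItem();
        return $this->itemData;
    }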

Now I wish to optimize the lists of ITEMs so they can be displayed with several filter options (for example), each list paginated at 100 items per page:

  • ALL ITEMs
  • ITEMs of a MODULE
  • ITEMs of a CATEGORY
  • ITEMs of a CATEGORY and a MODULE
  • ITEMs of a TAG
  • ITEMs of a TAG and a CATEGORY
  • ITEMs of a TAG and a CATEGORY and a MODULE

I already know how to do this in SQL and how to put the results in a cache tree.
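
For example, the "CATEGORY and MODULE" list would be a query like this (a sketch using PDO; $pdo, $page, $idCat and $idMod are placeholder variables, not from my code):

    $perPage=100;
    $offset=((int)$page-1)*$perPage;
    //ITEMs of a CATEGORY and a MODULE, newest first, 100 per page
    //$offset and $perPage are already integers, so safe to inline
    $stmt=$pdo->prepare("SELECT id, title, content, id_mod, id_cat
            FROM ITEM
            WHERE id_cat=? AND id_mod=?
            ORDER BY id DESC
            LIMIT ".$offset.",".$perPage);
    $stmt->execute(array($idCat,$idMod));
    $items=$stmt->fetchAll(PDO::FETCH_ASSOC);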

My problem with those cache files is that when an ITEM is created or updated, the affected lists have to be refreshed immediately and consistently.
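
Concretely, every ITEM save would have to do something like this (a sketch, assuming one cache file per filter combination; the path scheme and the idMod/idCat/tagIds property names are hypothetical):

    //drop every cached list this ITEM can appear in,
    //so the next read regenerates it from MySQL
    $lists=array(
        "lists/all.txt",
        "lists/mod_".$this->idMod.".txt",
        "lists/cat_".$this->idCat.".txt",
        "lists/cat_".$this->idCat."_mod_".$this->idMod.".txt"
    );
    foreach($this->tagIds as $idTag)
    {
        $lists[]="lists/tag_".$idTag.".txt";
        $lists[]="lists/tag_".$idTag."_cat_".$this->idCat.".txt";
        $lists[]="lists/tag_".$idTag."_cat_".$this->idCat."_mod_".$this->idMod.".txt";
    }
    foreach($lists as $file)
    {
        if(file_exists($file)) unlink($file);
    }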

First question:

So what will happen if ITEMs (and therefore those lists too) are created/updated at the same time?

Will the LOCK_EX of file_put_contents() do its job while files are being read with file_get_contents()?

Second question:

I understand that the more work PHP does, the less MySQL does (and vice versa), but what is the better (faster to display) way to build those paginated lists, which will be read every second or more often and modified only when an ITEM is added or updated?

  • My cache system (I don't think so...)
  • Stored procedures in MySQL
  • Several database servers and/or several files servers
  • Other

Any ideas, examples, links greatly appreciated.

P.S.: just for fun, I may ask "how does Facebook do it?" and "how does Stack Overflow do it?"

asked Feb 14 '13 by Valky


1 Answer

First question:

Your operations should be fine with LOCK_EX. The files may get locked if they are accessed simultaneously, which will definitely slow things down, but all operations should complete correctly. However, this is a good example of why you should not implement your own cache system.
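
One caveat: the locks behind LOCK_EX are advisory flock() locks, and file_get_contents() on its own does not take one. If you want readers to wait for an in-progress write, the read side needs a shared lock too; a minimal sketch:

    function readCacheFile($url)
    {
        $fp=fopen($url,"rb");
        if($fp===false) return false;
        //shared lock: waits while a writer holds LOCK_EX,
        //but allows other readers at the same time
        flock($fp,LOCK_SH);
        $data=stream_get_contents($fp);
        flock($fp,LOCK_UN);
        fclose($fp);
        return unserialize($data);
    }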

Second question:

MySQL will definitely be faster than your cache system (unless you do some seriously wicked coding, and not in PHP). Databases like MySQL have put a lot of work into optimizing their performance.

I don't believe that stored procedures in MySQL will offer any real benefit over plain old SELECT queries in the examples above.

A NoSQL approach like MongoDB could help if you use sharding on a server cluster, but it is more difficult to write, and more servers cost more money. It is also not clear from your question whether moving to a different database system is an option.

If you stick with MySQL, it is probably easier to load-balance application servers than to run a database server cluster. With that in mind, more work done in PHP is preferable to more work in MySQL. I would not follow this approach though, because you would be giving up much for only a small benefit.

In short, I recommend that you stick to plain SELECT queries to get what you need. Run your application and database on separate servers, and use the more powerful server as your DB server.
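
If you go the plain-SELECT route, the main thing is to index for the filter combinations in your list (a sketch; the index names and column order are assumptions about your schema, not from the question):

    //run once, e.g. in a deploy script
    $pdo->exec("CREATE INDEX idx_item_cat_mod ON ITEM (id_cat, id_mod, id)");
    $pdo->exec("CREATE INDEX idx_item_mod ON ITEM (id_mod, id)");
    $pdo->exec("CREATE INDEX idx_itemtag_tag ON ITEM_TAG (id_tag, id_item)");

With indexes like these, each paginated list becomes an indexed range scan plus a LIMIT, which is exactly what MySQL is good at.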

PS. Facebook wrote a pre-compiler (HipHop) for PHP to make their code run faster. In my opinion, PHP is not a very fast language, and you could get better results from Python or Node.js.

Stack Overflow uses ASP.NET MVC with MS SQL Server. They have a single big, powerful server for the database and apparently prefer DB queries where they can. They also use load-balanced application servers that are separate from their DB server.

answered Nov 15 '22 by neelsg