 

What are some of the best tools/strategies to cache medium/large data sets in PHP?

I have your average PHP app (running on a Windows server) with forms and data grids/lists, and some of them require running pretty complex queries that I've optimized to the max; I doubt there's a ton that can be done to make them run much faster. I also don't have the option of changing the database structure, given other processes that depend on it. So since caching hasn't really been used much in the app, that seems to be the next logical step.

I recently read up on generational caching and came up with a decent mechanism to automate caching of queries in my apps. My issue now is that I'm running into size limitations with both of the options that seemed like logical choices. WinCache limits you to a total of 85MB, which isn't going to cut it, and memcached limits an item to 1MB, which doesn't seem like much if a query returns a fairly large number of records with a lot of fields. To be exact, it seems memcached now allows you to configure a larger item size, but the mere fact that it's 1MB by default (and used to be a hard limit) makes me question what I'm trying to do.
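
To make the question concrete, here's a minimal sketch of the generational-caching pattern I'm describing, using the Memcached extension (the key scheme and the PDO calls are illustrative, not my actual code):

```php
<?php
// Minimal sketch of generational caching with the Memcached extension.
// The key scheme and PDO calls are illustrative placeholders.

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// Each table (or namespace) gets a generation counter. Bumping it on a
// write implicitly invalidates every query cached under the old generation.
function generation(Memcached $mc, string $table): int
{
    $gen = $mc->get("gen:$table");
    if ($gen === false) {
        $gen = 1;
        $mc->set("gen:$table", $gen);
    }
    return (int) $gen;
}

function cachedQuery(Memcached $mc, PDO $pdo, string $table, string $sql, array $params = []): array
{
    // The generation is part of the key, so bumping it simply orphans the
    // old entries and memcached's LRU eviction reclaims them over time.
    $key = sprintf('q:%d:%s', generation($mc, $table), md5($sql . serialize($params)));

    $rows = $mc->get($key);
    if ($rows === false) {
        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $mc->set($key, $rows, 300); // 5-minute TTL as a safety net
    }
    return $rows;
}

// After any INSERT/UPDATE/DELETE against $table:
// $mc->increment("gen:$table");
```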

The maximum number of records my grid allows at once is 1,000, so that's the most that could get stored in a single cache entry (fields per record vary, of course). I know a huge number of users would probably mean the cache fills up very quickly, but the number of concurrent users is usually not huge, and from what I've read, if memcached runs out of memory it just evicts the oldest cached items. So I don't see a big downside to storing larger data sets, unless the store/retrieve operation itself is very expensive, which from what I've read doesn't seem to be the case.

So in summary, what I'm wondering is whether it's a bad idea to store larger data sets in memcached (and granted, I know I don't want to store a query with a million records in there). And if it is a bad idea, what would be a good alternative for caching/improving performance when retrieving those data sets?

asked Feb 18 '14 by Rocket04




2 Answers

There are several options you can go with:

1) If you are dealing with a lot of queries, then a master/slave DB architecture will help a lot. SELECT queries can be performed on the slave DB, which takes an immense load off the master DB (see the read/write-splitting sketch after this list).

2) Using Sphinx will definitely help boost your speed of data retrieval. You can read about it in the Wikipedia article: WIKI-Sphinx.

3) You can also use a Redis server, which also supports master/slave replication and works well as a cache (see the cache-aside sketch after this list): REDIS ARTICLE
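
For point 1, a minimal sketch of read/write splitting with PDO (the hostnames and credentials are placeholders):

```php
<?php
// Minimal sketch of read/write splitting; hostnames and credentials
// are placeholders.

$master  = new PDO('mysql:host=db-master;dbname=app',  'user', 'pass');
$replica = new PDO('mysql:host=db-replica;dbname=app', 'user', 'pass');

// Route SELECTs to the slave, everything else to the master. Note that
// replication lag means a row written a moment ago may not yet be
// visible on the slave.
function connectionFor(string $sql, PDO $master, PDO $replica): PDO
{
    return stripos(ltrim($sql), 'SELECT') === 0 ? $replica : $master;
}
```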
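
And for point 3, a minimal cache-aside sketch using the phpredis extension (the key name, TTL, and runExpensiveQuery() are placeholders):

```php
<?php
// Minimal cache-aside sketch with the phpredis extension. The key,
// TTL, and runExpensiveQuery() are placeholders.

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$key  = 'grid:orders:page1';
$rows = $redis->get($key);           // returns false on a miss
if ($rows === false) {
    $rows = runExpensiveQuery();     // your expensive SELECT goes here
    // Redis values can be up to 512 MB, so no 1 MB per-item ceiling.
    $redis->setex($key, 300, serialize($rows));
} else {
    $rows = unserialize($rows);
}
```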

It also depends on other factors: how you normalize the table structures, indexing, and how you perform joins.

NOTE: Unnecessary JOINs should generally be avoided. You can read about it here: IBM - Avoiding unnecessary outer joins.

Hope this helps

answered Oct 05 '22 by Abhinav


Unless you have a good reason to send that data over the wire to a cache, don't.

If at all possible, use a local, in-process caching solution, such as APC(u) or YAC (YAC is extremely clever software, but may not be stable).

When APC(u) (or WinCache, for that matter) copies arrays and scalars in and out of shared memory, it does so bitwise, byte by byte; it does not serialize or otherwise change the format of the data. This, coupled with the fact that there is zero network overhead, makes local caching solutions like APC(u) much faster than anything like memcached or Redis.

APC(u) does not have any limitations on the size of segments or entries; you should be able to configure it to consume all the memory your operating system will allow you to map.
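
As a rough illustration, the cache-aside pattern with APCu looks like this (the key and runExpensiveQuery() are placeholders):

```php
<?php
// Cache-aside with APCu: the data stays in shared memory on the web
// server, so there is no network hop and no serialization on the hot
// path. Key and runExpensiveQuery() are placeholders.

$key  = 'grid:orders:page1';
$rows = apcu_fetch($key, $hit);
if (!$hit) {
    $rows = runExpensiveQuery();    // your expensive SELECT goes here
    apcu_store($key, $rows, 300);   // 5-minute TTL
}

// Size the cache in php.ini, e.g.:
//   apc.shm_size=512M
```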

Of course, if you have a good reason to want to send over the wire, then this is pretty useless information ;)

answered Oct 05 '22 by Joe Watkins