
PHP - Reading Directory vs Fetching from Database

My main reason for asking is to get opinions on the different options. I have thumbnail files in a directory that are associated with a video, and when I need them I use the glob() function, `glob(DIRECTORY . '/file_name*.jpg')`, which returns an array of all the JPG files for that video.

The glob() function itself is very fast, but I am still concerned about its cost: each page can show 20 to 50 videos, which means 20 to 50 glob() calls. Should I keep using it, or start storing the file lists in a database and fetch them from there instead of calling glob()?
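For reference, the per-page pattern described above might look roughly like the PHP sketch below. The directory layout, the `<videoId>*.jpg` naming, and the helper names `thumbsForVideo()` and `thumbsForPage()` are assumptions for illustration, not the asker's actual code.

```php
<?php
// Sketch of the per-video glob() lookup described in the question.
// Assumptions: thumbnails live directly in $dir and are named "<videoId>...jpg".

function thumbsForVideo(string $dir, string $videoId): array
{
    $files = glob($dir . '/' . $videoId . '*.jpg');
    return $files === false ? [] : $files; // glob() can return false on error
}

function thumbsForPage(string $dir, array $videoIds): array
{
    $result = [];
    foreach ($videoIds as $id) {
        $result[$id] = thumbsForVideo($dir, $id); // one glob() call per video
    }
    return $result;
}

// Demo with a throwaway directory and hypothetical file names.
$dir = sys_get_temp_dir() . '/thumbs_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
foreach (['vid1_01.jpg', 'vid1_02.jpg', 'vid2_01.jpg'] as $name) {
    touch("$dir/$name");
}
$page = thumbsForPage($dir, ['vid1', 'vid2', 'vid3']);
```

Note that a prefix pattern such as `vid1*.jpg` also matches `vid10_01.jpg`, so a separator in the file names helps when one ID can be a prefix of another.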

If there is a better alternative, please let me know.

Thanks.

ArslanCb asked Aug 31 '12 13:08


1 Answer

As usual with performance questions, results can vary quite a bit, so the answer is: whichever approach measures faster on your own setup is the faster one.

The place to start is to measure how much time it takes to do things as you are doing them now. Once you have done this, ask yourself: is this fast enough? It may be that, although it might not be the fastest way to do things, it's still so fast that speed is not a concern.
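A minimal way to take that measurement in PHP is to time a page's worth of glob() calls with `hrtime()` (PHP 7.3+). The helper name `timeGlobsForPage()`, the 50 hypothetical video IDs, and pointing it at the temp directory are assumptions; use your real thumbnail directory and IDs to get meaningful numbers.

```php
<?php
// Minimal timing sketch: how long does a page's worth of glob() calls take?
// Assumption: thumbnails are named "<videoId>*.jpg"; IDs here are placeholders.

function timeGlobsForPage(string $dir, array $videoIds, int $runs = 100): float
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    for ($r = 0; $r < $runs; $r++) {
        foreach ($videoIds as $id) {
            glob($dir . '/' . $id . '*.jpg');
        }
    }
    $elapsedNs = hrtime(true) - $start;
    return $elapsedNs / $runs / 1e6; // average milliseconds per page
}

$dir = sys_get_temp_dir();                                      // replace with your thumbnail directory
$ids = array_map(fn ($i) => "video$i", range(1, 50));           // worst case from the question
$msPerPage = timeGlobsForPage($dir, $ids, 20);
printf("%.3f ms per page for %d glob() calls\n", $msPerPage, count($ids));
```

Repeating the loop mostly measures warm-cache behaviour, since the operating system caches directory entries after the first pass; a cold first request may be slower.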

How much of the time processing a page is spent getting the file globs? 1%? 10%? 50%? The higher this percentage is, the more worthwhile it becomes to consider changing how you do things.
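To get that percentage directly, one option is to wrap the glob() calls in a timer and compare the total with the time elapsed since the request started (`$_SERVER['REQUEST_TIME_FLOAT']`, which is also set under the CLI). `timedGlob()` is a hypothetical helper, and the video IDs are placeholders.

```php
<?php
// Sketch: accumulate time spent inside glob() during one request and report
// it as a share of the request so far. timedGlob() is a hypothetical helper.

$globSeconds = 0.0;

function timedGlob(string $pattern, float &$total): array
{
    $t0 = microtime(true);
    $files = glob($pattern);
    $total += microtime(true) - $t0;
    return $files === false ? [] : $files;
}

// ... page rendering: call timedGlob() wherever glob() was used ...
foreach (['video1', 'video2', 'video3'] as $id) {
    timedGlob(sys_get_temp_dir() . '/' . $id . '*.jpg', $globSeconds);
}

$requestSeconds = microtime(true) - $_SERVER['REQUEST_TIME_FLOAT'];
$globShare = $requestSeconds > 0 ? 100 * $globSeconds / $requestSeconds : 0.0;
printf("glob(): %.2f%% of %.1f ms\n", $globShare, $requestSeconds * 1000);
```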

Also, how is site performance as a whole? If you doubled the speed of each page load, would people notice? If not, it may not be worth doing performance tuning yet, even if you see an obvious place to do so.

If you think you could do better, implement the functionality using your database and measure whether that is faster. Again, the results could be highly variable. For example, if your database is under heavy load, getting the results from it might be much slower; if you have a massively powerful database that's barely used, it might be very fast. Only testing can tell you the truth.
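As a sketch of the database side of such a comparison, assuming a hypothetical `thumbnails(video_id, path)` table, a whole page can be served by a single query with an `IN (...)` list instead of one glob() per video. An in-memory SQLite database (via the pdo_sqlite extension) stands in for your real database here, so timings from it say nothing about a loaded production server.

```php
<?php
// Database alternative: one query fetches thumbnails for every video on the
// page. Table/column names are hypothetical; SQLite stands in for the real DB.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE thumbnails (video_id TEXT NOT NULL, path TEXT NOT NULL)');
$pdo->exec('CREATE INDEX idx_thumbnails_video ON thumbnails (video_id)');

$insert = $pdo->prepare('INSERT INTO thumbnails (video_id, path) VALUES (?, ?)');
foreach ([['vid1', 'vid1_01.jpg'], ['vid1', 'vid1_02.jpg'], ['vid2', 'vid2_01.jpg']] as $row) {
    $insert->execute($row);
}

function thumbsForPageFromDb(PDO $pdo, array $videoIds): array
{
    if ($videoIds === []) {
        return [];
    }
    // One "?" placeholder per ID keeps the query parameterised.
    $placeholders = implode(', ', array_fill(0, count($videoIds), '?'));
    $stmt = $pdo->prepare(
        "SELECT video_id, path FROM thumbnails WHERE video_id IN ($placeholders) ORDER BY video_id, path"
    );
    $stmt->execute(array_values($videoIds));

    // Every requested ID starts with an empty list; rows are grouped by video.
    $result = array_fill_keys($videoIds, []);
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $result[$row['video_id']][] = $row['path'];
    }
    return $result;
}

$page = thumbsForPageFromDb($pdo, ['vid1', 'vid2', 'vid3']);
```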

I will add that the way you are doing things now seems simpler and more maintainable, because it finds filenames based on the actual files on your disk. If you try to use a database, you will have to worry about synchronizing the list of filenames in the database with the list of files in the filesystem.
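If you do go the database route, a periodic consistency check is one way to catch that synchronization problem. This sketch only compares two lists of file names; `findDrift()` is a hypothetical helper and the sample names are made up (in practice the disk list could come from `array_map('basename', glob(...))`).

```php
<?php
// Sketch: detect when the database's list of thumbnails has drifted from
// what is actually on disk. findDrift() is a hypothetical helper.

function findDrift(array $namesInDb, array $namesOnDisk): array
{
    return [
        // Rows pointing at files that no longer exist.
        'missing_on_disk' => array_values(array_diff($namesInDb, $namesOnDisk)),
        // Files that were never recorded in the database.
        'missing_in_db'   => array_values(array_diff($namesOnDisk, $namesInDb)),
    ];
}

$drift = findDrift(
    ['vid1_01.jpg', 'vid1_02.jpg', 'vid1_03.jpg'], // what the database says
    ['vid1_01.jpg', 'vid1_02.jpg', 'vid1_04.jpg']  // what is on disk
);
print_r($drift);
```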

One thing to be aware of, though, is that many filesystems perform worse when you have a single directory with a very large number of files in it. If you have this situation, consider splitting the files up into multiple subdirectories. A popular approach is to make directories with names a-z and then put all files beginning with "a" in the "a" directory, all files beginning with "b" in the "b" directory, etc. However, this will probably only be important once you have tens of thousands of files, and even then it depends on the particular filesystem and the hardware it runs on.
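The a-z scheme could be sketched like this. `shardedPath()` and `shardedThumbs()` are hypothetical helpers, and sending names that do not start with a letter to a `_` directory is an assumption, since the scheme above leaves that case open.

```php
<?php
// Sketch of the a-z subdirectory scheme: a file's first letter picks its
// subdirectory. Names not starting with a letter go to "_" (an assumption).

function shardedPath(string $baseDir, string $fileName): string
{
    $first = strtolower(substr($fileName, 0, 1));
    $shard = ctype_alpha($first) ? $first : '_';
    return $baseDir . '/' . $shard . '/' . $fileName;
}

// A lookup only has to glob the one subdirectory that can hold the video.
function shardedThumbs(string $baseDir, string $videoId): array
{
    $dir = dirname(shardedPath($baseDir, $videoId));
    return glob($dir . '/' . $videoId . '*.jpg') ?: [];
}

echo shardedPath('/var/thumbs', 'Sunset_01.jpg'), "\n"; // /var/thumbs/s/Sunset_01.jpg
```

Whatever code writes the thumbnails has to use the same path function, otherwise lookups will miss files.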

(Edit based on comments:)

Since you are talking about precomputing the results and storing them in the database, I suggest that a better approach than a database is a caching server such as memcached (http://memcached.org/). You can think of this as a hybrid approach: you still do things the way you do now, but each time you want a result, you first check the cache; if it contains the result, use the cached result, otherwise compute a new glob and store it in the cache. This avoids the problem of keeping the database and filesystem in sync, because old cache entries expire and are replaced by new, correct ones.
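A minimal sketch of that check-the-cache-first (cache-aside) flow follows. So that it runs without a memcached server, `ArrayTtlCache` is a hypothetical in-process stand-in with a get/set-with-expiry interface; because it only lives for one PHP request, it is for demonstration only. With the PHP Memcached extension you would call `Memcached::get()` and `Memcached::set($key, $value, $expiration)` instead, treating a `Memcached::RES_NOTFOUND` result code as a miss. The 300-second TTL is an arbitrary assumption.

```php
<?php
// Cache-aside sketch. ArrayTtlCache is a hypothetical stand-in for memcached:
// it lives only for the current PHP request, so it is for demonstration only.

final class ArrayTtlCache
{
    /** @var array<string, array{value: array, expires: int}> */
    private array $items = [];

    public function get(string $key): ?array
    {
        $item = $this->items[$key] ?? null;
        if ($item === null || $item['expires'] <= time()) {
            return null; // miss, or the entry has expired
        }
        return $item['value'];
    }

    public function set(string $key, array $value, int $ttlSeconds): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttlSeconds];
    }
}

function thumbsCached(ArrayTtlCache $cache, string $dir, string $videoId, int $ttl = 300): array
{
    $key = 'thumbs:' . $videoId;
    $files = $cache->get($key);
    if ($files === null) {
        // Miss: compute the glob as before and cache it with an expiry, so a
        // stale listing ages out instead of needing manual synchronization.
        $files = glob($dir . '/' . $videoId . '*.jpg') ?: [];
        $cache->set($key, $files, $ttl);
    }
    return $files;
}

// Demo with a throwaway directory and hypothetical file names.
$dir = sys_get_temp_dir() . '/thumbs_cache_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
touch("$dir/vid1_01.jpg");

$cache = new ArrayTtlCache();
$first = thumbsCached($cache, $dir, 'vid1');  // miss: runs glob()
touch("$dir/vid1_02.jpg");                     // a new thumbnail appears
$second = thumbsCached($cache, $dir, 'vid1'); // hit: old list until the TTL expires
```

The demo also shows the trade-off: a thumbnail added after the entry was cached stays invisible until the entry expires, so the TTL bounds how stale a listing can be.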

Nate C-K answered Sep 22 '22 19:09