
Accessing Big Arrays in PHP

I've been doing some profiling on different methods of accessing large(ish) arrays of data in PHP. The use case is pretty simple: some of our tools output data into PHP files as associative arrays and these files are considered static data by the application. We make games, so some examples of data files would include items in a catalog, tasks that a user must complete, or definitions for maps:

<?php
$some_data = array(
    ...lots and lots of stuff in here...
);
?>

Since these arrays are large-ish (400K), and much of our code is interested in this data, it becomes necessary to access it as efficiently as possible. I settled on timing three different patterns for doing this; the methods are presented first, with my results below.

What I'm looking for is some experience-based validation of these methods and their timings, as well as any other methods worth trying.

Method #1: getter function

In this method, the exporter actually creates a file that looks like:

<?php
function getSomeData()
{
    $some_data = array(
        ...lots and lots of stuff here...
    );
    return $some_data;
}
?>

Client code can then get the data simply by calling getSomeData() whenever it needs it.
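For example, assuming the exporter wrote the function into a file called 'some_data.php' (the file name here is just for illustration):

include_once 'some_data.php';
$some_data = getSomeData();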

Method #2: global + include

In this method the data file looks identical to the original code block above; however, the client code must jump through a few hoops to get the data into a local scope. This assumes the array is in a file called 'some_data.php':

global $some_data; //must be the same name as the variable in the data file...
include 'some_data.php';

This will bring the $some_data array into scope, though it is a bit cumbersome for client code (my opinion).

Method #3: getter by reference

This method is nearly identical to Method #1; however, the getter function does not return the array by value, but instead populates a variable passed in by reference.

<?php
function getSomeDataByRef(&$some_data)
{
    $some_data = array(
        ...lots and lots of stuff here...
    );
}
?>

Client code then retrieves the data by declaring a local variable (named anything) and passing it to the getter, which fills it in through the reference parameter:

$some_data_anyname = array();
getSomeDataByRef($some_data_anyname);

Results

So I ran a little script that runs each of these methods of retrieving data 1000 times and averages the run time (computed with microtime(true) at the beginning and end of each call).
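Roughly, the timing harness looks like the following (a simplified sketch, not the exact script; the getSomeData() call is swapped out for each method):

$times = array();
for ($i = 0; $i < 1000; $i++) {
    $start = microtime(true);
    $data = getSomeData(); // replaced with the include or by-ref call for Methods #2 and #3
    $times[] = microtime(true) - $start;
}
printf("AVG: %s MAX: %s MIN: %s\n",
    array_sum($times) / count($times), max($times), min($times));

The following are my results (in ms, running on a MacBook Pro, 2GHz, 8GB RAM, PHP version 5.3.4):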

METHOD #1:

AVG: 0.0031637034416199 MAX: 0.0043289661407471 MIN: 0.0025908946990967

METHOD #2:

AVG: 0.01434082698822 MAX: 0.018275022506714 MIN: 0.012722969055176

METHOD #3:

AVG: 0.00335768699646 MAX: 0.0043489933013916 MIN: 0.0029017925262451

It seems pretty clear, from this data anyway, that the global+include method is inferior to the other two, whose difference is negligible.

Thoughts? Am I completely missing anything? (probably...)

Thanks in advance!


2 Answers

Not sure if this is exactly what you're looking for, but it should help with speed and memory issues. You can use SPL's fixed-size array, SplFixedArray:

$startMemory = memory_get_usage();
$array = new SplFixedArray(100000); // fixed length, integer keys only; more compact than a regular array
for ($i = 0; $i < 100000; ++$i) {
    $array[$i] = $i;
}
echo memory_get_usage() - $startMemory, ' bytes';

Read more on big PHP arrays here: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html

Also, have you thought about storing the data in memory or a cache? For example, you could use SQLite with the in-memory engine on the first execution and then access the data from there:

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// .. Use PDO as normal
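For example, on first use you could copy the static array into an in-memory table and query it afterwards. A minimal sketch, assuming a hypothetical catalog of id => name pairs (note that a :memory: database only lives as long as the PDO connection):

$pdo->exec('CREATE TABLE catalog (id INTEGER PRIMARY KEY, name TEXT)');
$stmt = $pdo->prepare('INSERT INTO catalog (id, name) VALUES (?, ?)');
foreach ($some_data as $id => $name) {
    $stmt->execute(array($id, $name)); // bulk-load the static data once
}
// Subsequent lookups query memory-resident SQLite instead of a huge PHP array
$name = $pdo->query('SELECT name FROM catalog WHERE id = 42')->fetchColumn();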

For one of my projects, in which a database was not an option, I faced the same problem of loading big PHP files (by big I mean a series of 3 MB files) containing arrays into memory, and I was looking for ways to maximize performance. I found a very easy one: caching these files on disk as JSON at first use. It divided load time by 3 and cut peak memory consumption by about 30%. Loading a local JSON file with json_decode() is much, much faster than including a big PHP file containing an array. It also has the advantage of being a format that most languages can manipulate directly. Hope that helps.
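A minimal sketch of that caching pattern (the function and file names here are illustrative, not from the actual project):

function loadSomeData($phpFile, $jsonCache)
{
    // Rebuild the cache if it is missing or older than the source file
    if (!file_exists($jsonCache) || filemtime($jsonCache) < filemtime($phpFile)) {
        include $phpFile; // defines $some_data in this scope
        file_put_contents($jsonCache, json_encode($some_data));
        return $some_data;
    }
    // Fast path: decode the cached JSON straight into an associative array
    return json_decode(file_get_contents($jsonCache), true);
}

$some_data = loadSomeData('some_data.php', 'some_data.json');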
