I've always heard and searched for new php 'good writing practice', for example: It's better (for performance) to check if array key exists than search in array, but also it seems better for memory too:
Assuming we have:
$array = array ( 'one' => 1, 'two' => 2, 'three' => 3, 'four' => 4, );
this allocates 1040 bytes of memory,
and
$array = array ( 1 => 'one', 2 => 'two', 3 => 'three', 4 => 'four', );
requires 1136 bytes
I understand that the key
and value
surely will have different storing mechanism, but please can you actually point me to the principle how does it work?
Example 2 (for @teuneboon):
$array = array ( 'one' => '1', 'two' => '2', 'three' => '3', 'four' => '4', );
1168 bytes
$array = array ( '1' => 'one', '2' => 'two', '3' => 'three', '4' => 'four', );
1136 bytes
consuming same memory:
4 => 'four',
'4' => 'four',
PHP memory management functions are invoked by the MySQL Native Driver through a lightweight wrapper. Among others, the wrapper makes debugging easier. The various MySQL Server and the various client APIs differentiate between buffered and unbuffered result sets.
The default value is 128MB . Often, this is raised depending on the amount of memory needed for the web application. Be aware that the server has a physical memory limit. You should optimize your code if the memory_limit parameter is already set too high.
Memory Limit is Not The Same as RAM While RAM is the total available memory, the memory limit is per PHP process. That means that your site can consume e.g 10 GB of RAM, with a memory limit of 256 MB.
unset() does just what its name says - unset a variable. It does not force immediate memory freeing. PHP's garbage collector will do it when it see fits - by intention as soon, as those CPU cycles aren't needed anyway, or as late as before the script would run out of memory, whatever occurs first.
Note, answer below is applicable for PHP prior to version 7 as in PHP 7 major changes were introduced which also involve values structures.
Your question is not actually about "how memory works in PHP" (here, I assume, you meant "memory allocation"), but about "how arrays work in PHP" - and these two questions are different. To summarize what's written below:
ulong
(unsigned long) keys hash-map, so real difference will be in values, where string-keys option has integer (fixed-length) values, while integer-keys option has strings (chars-dependent length) values. But that may not always will be true due to possible collisions.'4'
, will be treated as integer keys and translated into integer hash result as it was integer key. Thus, '4'=>'foo'
and 4 => 'foo'
are same things.Also, important note: the graphics here are copyright of PHP internals book
PHP arrays and C arrays
You should realize one very important thing: PHP is written on C, where such things as "associative array" simply does not exist. So, in C "array" is exactly what "array" is - i.e. it's just a consecutive area in memory which can be accessed by a consecutive offset. Your "keys" may be only numeric, integer and only consecutive, starting from zero. You can't have, for instance, 3
,-6
,'foo'
as your "keys" there.
So to implement arrays, which are in PHP, there's hash-map option, it uses hash-function to hash your keys and transform them to integers, which can be used for C-arrays. That function, however, will never be able to create a bijection between string keys and their integer hashed results. And it's easy to understand why: because cardinality of strings set is much, much larger that cardinality of integer set. Let's illustrate with example: we'll recount all strings, up to length 10, which have only alphanumeric symbols (so, 0-9
, a-z
and A-Z
, total 62): it's 6210 total strings possible. It's around 8.39E+17. Compare it with around 4E+9 which we have for unsigned integer (long integer, 32-bits) type and you'll get the idea - there will be collisions.
PHP hash-map keys & collisions
Now, to resolve collisions, PHP will just place items, which have same hash-function result, into one linked list. So, hash-map would not be just "list of hashed elements", but instead it will store pointers to lists of elements (each element in certain list will have same hash-function key). And this is where you have point to how it will affect memory allocation: if your array has string keys, which did not result in collisions, then no additional pointers inside those list would be needed, so memory amount will be reduced (actually, it's a very small overhead, but, since we're talking about precise memory allocation, this should be taken to account). And, same way, if your string keys will result into many collisions, then more additional pointers would be created, so total memory amount will be a bit more.
To illustrate those relations within those lists, here's a graphic:
Above there is how PHP will resolve collisions after applying hash-function. So one of your question parts lies here, pointers inside collision-resolution lists. Also, elements of linked lists are usually called buckets and the array, which contains pointers to heads of those lists is internally called arBuckets
. Due to structure optimization (so, to make such things as element deletion, faster), real list element has two pointers, previous element and next element - but that's only will make difference in memory amount for non-collision/collision arrays little wider, but won't change concept itself.
One more list: order
To fully support arrays as they are in PHP, it's also needed to maintain order, so that is achieved with another internal list. Each element of arrays is a member of that list too. It won't make difference in terms of memory allocation, since in both options this list should be maintained, but for full picture, I'm mentioning this list. Here's the graphic:
In addition to pListLast
and pListNext
, pointers to order-list head and tail are stored. Again, it's not directly related to your question, but further I'll dump internal bucket structure, where these pointers are present.
Array element from inside
Now we're ready to look into: what is array element, so, bucket:
typedef struct bucket { ulong h; uint nKeyLength; void *pData; void *pDataPtr; struct bucket *pListNext; struct bucket *pListLast; struct bucket *pNext; struct bucket *pLast; char *arKey; } Bucket;
Here we are:
h
is an integer (ulong) value of key, it's a result of hash-function. For integer keys it is just same as key itself (hash-function returns itself)pNext
/ pLast
are pointers inside collision-resolution linked listpListNext
/pListLast
are pointers inside order-resolution linked listpData
is a pointer to the stored value. Actually, value isn't same as inserted at array creation, it's copy, but, to avoid unnecessary overhead, PHP uses pDataPtr
(so pData = &pDataPtr
)From this viewpoint, you may get next thing to where difference is: since string key will be hashed (thus, h
is always ulong
and, therefore, same size), it will be a matter of what is stored in values. So for your string-keys array there will be integer values, while for integer-keys array there will be string values, and that makes difference. However - no, it isn't a magic: you can't "save memory" with storing string keys such way all the times, because if your keys would be large and there will be many of them, it will cause collisions overhead (well, with very high probability, but, of course, not guaranteed). It will "work" only for arbitrary short strings, which won't cause many collisions.
Hash-table itself
It's already been spoken about elements (buckets) and their structure, but there's also hash-table itself, which is, in fact, array data-structure. So, it's called _hashtable
:
typedef struct _hashtable { uint nTableSize; uint nTableMask; uint nNumOfElements; ulong nNextFreeElement; Bucket *pInternalPointer; /* Used for element traversal */ Bucket *pListHead; Bucket *pListTail; Bucket **arBuckets; dtor_func_t pDestructor; zend_bool persistent; unsigned char nApplyCount; zend_bool bApplyProtection; #if ZEND_DEBUG int inconsistent; #endif } HashTable;
I won't describe all the fields, since I've already provided much info, which is only related to the question, but I'll describe this structure briefly:
arBuckets
is what was described above, the buckets storage,pListHead
/pListTail
are pointers to order-resolution listnTableSize
determines size of hash-table. And this is directly related to memory allocation: nTableSize
is always power of 2. Thus, it's no matter if you'll have 13 or 14 elements in array: actual size will be 16. Take that to account when you want to estimate array size.It's really difficult to predict, will one array be larger than another in your case. Yes, there are guidelines which are following from internal structure, but if string keys are comparable by their length to integer values (like 'four'
, 'one'
in your sample) - real difference will be in such things as - how many collisions occurred, how many bytes were allocated to save the value.
But choosing proper structure should be matter of sense, not memory. If your intention is to build the corresponding indexed data, then choice always be obvious. Post above is only about one goal: to show how arrays actually work in PHP and where you can find the difference in memory allocation in your sample.
You may also check article about arrays & hash-tables in PHP: it's Hash-tables in PHP by PHP internals book: I've used some graphics from there. Also, to realize, how values are allocated in PHP, check zval Structure article, it may help you to understand, what will be differences between strings & integers allocation for values of your arrays. I didn't include explanations from it here, since much more important point for me - is to show array data structure and what may be difference in context of string keys/integer keys for your question.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With