Small remark
Reading about max_input_vars variable made me to read a lot about PHP's internals for handling arrays. This is not really a question, but rather answering my own question "why do we really need this max_input_var". It is not localized, and actually related to a lot of other programming languages and not only php.
A problem:
Compare these two small php scripts:
$data = array();
for ($key = 0; $key <= 1073709056; $key += 32767){
$data[$key] = 0;
}
Can check it here. Everything normal, nothing unexpected. Execution time is close to 0.
And this mostly identical (difference is in 1)
$data = array();
for ($key = 0; $key <= 1073709056; $key += 32768){
$data[$key] = 0;
}
Check it here. Nothing is normal, everything is unexpected. You exceeded execution time. So it is at least 3000 times slower!
The question is why does it happen?
I posted it here together with an answer because this vastly improved my knowledge about php internals and I learned new things about security.
The problem is not in the loop, the problem is with how PHP and many other languages (Java, Python, ASP.Net) are storing key/value pairs in hash data structures. PHP uses hash table to store arrays (which makes them theoretically very fast for storing and retrieving data from that array O(1)
). The problem arise when more than one values get mapped to the same key thus creating hash collisions. Inserting element into such a key becomes more expensive O(n)
and therefore inserting n keys jumps from O(n) to O(n^2).
And this is exactly what goes on here. When the number changes from 32767
to 32768
it changes keys from no collisions to everything collides to the same key.
This is the case, because of the way it php arrays are implemented in C. Array is of the size of the power of 2. (Array of 9
and 15
elements will be allocated with array of size 16
). Also if the array key is an integer, the hash will be an integer with a mask on top of it. The mask is size of the array - 1
in binary. This means that if someone will try to insert the following keys in the associative array 0, 32, 64, 128, 256, ...
and so on, they all will be mapped to the same key and thus the hash will have a linked list. The above example creates exactly this.
This requires a lot of CPU to process and therefore you see huge time increase. What this means is that devs should be really careful when they accept some data from outside that they will be parsed into array (people can craft the data easily and DOS the server). This data can be $_GET
, $_POST
requests (this is why you can limit the number with max_input_vars
), XML
, JSON
.
Here are the resources I used to learn about these things:
I don't know anything about php specifically but 32767 would be the max value of a 2 byte number. Increasing it to 32768 would make the use of a 3 byte number (which is never used so it'll be 4 bytes) necessary which would in turn make everything slower.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With