Hypothetically, if I have multiple memcached servers like this:
// PHP
$MEMCACHE_SERVERS = array(
    "10.1.1.1", // web1
    "10.1.1.2", // web2
    "10.1.1.3", // web3
);
$memcache = new Memcache();
foreach ($MEMCACHE_SERVERS as $server) {
    $memcache->addServer($server);
}
And then I set data like this:
$huge_data_for_front_page = 'some data blah blah blah';
$memcache->set("huge_data_for_front_page", $huge_data_for_front_page);
And then I retrieve data like this:
$huge_data_for_front_page = $memcache->get("huge_data_for_front_page");
When I retrieve this data from the memcached servers, how does the PHP memcache client know which server to query for it? Or is the client going to query all memcached servers?
Well, you could write books about that, but the basic principle is that there are a few different approaches.
The most common and sensible approach for caching is sharding, which means the data is stored on only one server, and some method is used to determine which server that is. So it can be fetched from that very server, and only one server is involved.
This obviously works well in key/value environments like memcached.
A common practice is to take a cryptographic hash of the key, calculate that hash modulo the number of servers, and the result is the server on which you store and fetch the data.
This procedure produces more or less equal balancing.
I don't know exactly how memcached does it internally, but it's certainly some sort of hash.
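As an illustration, here is a minimal PHP sketch of the hash-mod-N selection described above (the memcache client's actual distribution logic may differ; the choice of `crc32()` as the hash is just an assumption for the example):

```php
<?php
// Pick the server for a key by hashing the key and taking the
// hash modulo the number of servers.
function pick_server($key, $servers) {
    $hash = abs(crc32($key)); // cheap integer hash of the key
    return $servers[$hash % count($servers)];
}

$MEMCACHE_SERVERS = array("10.1.1.1", "10.1.1.2", "10.1.1.3");

// set() and get() compute the same index for the same key,
// so only this one server is ever contacted for it.
$server = pick_server("huge_data_for_front_page", $MEMCACHE_SERVERS);
```

Because the mapping is deterministic, the client never has to ask all servers; it recomputes the index on every `get()` and talks to exactly one of them.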
But beware that this technique is not highly available: if one server fails, its entries are gone. So you can obviously only use this for caching purposes.
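A hypothetical sketch of why entries are effectively lost: with naive hash-mod-N, a failed server changes the modulus, so most keys suddenly map to a different server and their cached values become unreachable (the server lists and key names below are made up for the example):

```php
<?php
// Same mod-N selection as before.
function pick_server($key, $servers) {
    return $servers[abs(crc32($key)) % count($servers)];
}

$before = array("10.1.1.1", "10.1.1.2", "10.1.1.3"); // all three up
$after  = array("10.1.1.1", "10.1.1.2");             // web3 has failed

$moved = 0;
for ($i = 0; $i < 100; $i++) {
    $key = "cache_key_$i";
    if (pick_server($key, $before) !== pick_server($key, $after)) {
        $moved++; // this key's cached value now lives on the "wrong" server
    }
}
// roughly two thirds of the keys land on a different server
```

That remapping is why such failures are tolerable only when the data can be recomputed, i.e. for caching.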
Other techniques involve replication, for example where high availability is necessary for resources that take long to compute and are automatically warmed up in the background.
The most common form in caching environments is master-master replication with latest-timestamp conflict resolution. This basically means every server pulls from every other server the data that is not yet present locally (this is done using replication logs and byte offsets). If there is a conflict, the latest version is used (the slight clock offset between servers is ignored).
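As a minimal sketch of latest-timestamp conflict resolution (this is illustrative only; memcached itself does not replicate, and the entry format here is invented for the example):

```php
<?php
// Each replicated entry carries the timestamp of its last write.
// On conflict, the version with the later timestamp wins.
function resolve_conflict($local, $remote) {
    return ($remote['ts'] > $local['ts']) ? $remote : $local;
}

$local  = array('value' => 'old front page', 'ts' => 1700000000);
$remote = array('value' => 'new front page', 'ts' => 1700000060);

$winner = resolve_conflict($local, $remote);
// $winner holds the newer version, since $remote['ts'] is larger
```

Note the trade-off mentioned above: if two servers' clocks disagree by more than the write interval, a "latest" write can lose, which is acceptable for caches but not for data of record.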
But in other environments where, for example, only very little is written but a lot is read, there is often a cascade in which only one or a few master servers are involved and the rest is pure read replication.
But these setups are very rare, because sharding as described above gives the best performance, and in caching environments data loss is mostly tolerable. That's why it's also the default for memcached.