My question is about memcached. Facebook uses memcached as a cache for its structured data to reduce latency for its users, and has optimized memcached's performance with UDP on Linux: http://www.facebook.com/note.php?note_id=39391378919
Interestingly, they still use TCP for set operations but use UDP for get operations.
Why would they do that? Why not use UDP for set operations as well? UDP scales better than TCP because the operating system has to maintain less per-connection state.
Thanks,
This sentence pretty much uncovers the problem and the solution:
Although we improved the memory efficiency with TCP, we moved to UDP for get operations to reduce network traffic and implement application-level flow control for multi-gets (gets of hundreds of keys in parallel).
TCP also provides flow control, but for memcached multi-gets it is pretty serial: you open a connection (or take one from a pool), send the list of keys, wait, and then receive one result containing all the values. Instead, they implemented application-level flow control themselves on top of connectionless, parallel UDP gets. Here are the benefits of UDP I see for FB-sized software:
On the other hand, I think they do writes over TCP for consistency. With TCP, a memcached write behaves like a small transaction: the request is sent, and the response acknowledges the cache update. Reimplementing that over UDP wouldn't provide much benefit, I suppose.
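To make the "request, then acknowledgement" point concrete, here is a minimal sketch of a set over TCP using memcached's text protocol. The function names (`build_set`, `is_stored`, `set_over_tcp`) are made up for illustration, and this is not how Facebook's client is written; it only shows that the writer learns whether the update landed by reading the `STORED` reply on the same connection.

```python
import socket

def build_set(key, value, flags=0, exptime=0):
    """Encode a text-protocol set command for one key (key and value are bytes)."""
    header = b"set %s %d %d %d\r\n" % (key, flags, exptime, len(value))
    return header + value + b"\r\n"

def is_stored(reply):
    """True only if the server acknowledged the write."""
    return reply == b"STORED\r\n"

def set_over_tcp(host, port, key, value, timeout=1.0):
    """Send one set and wait for the server's reply line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_set(key, value))
        reply = b""
        while not reply.endswith(b"\r\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break  # connection closed before a full reply arrived
            reply += chunk
        return is_stored(reply)
```

Against a local memcached you would call something like `set_over_tcp("127.0.0.1", 11211, b"user:42", b"cached-row")`. TCP takes care of retransmitting the request and the reply; over UDP the client would have to handle a lost datagram in either direction itself.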
Each UDP datagram contains a simple frame header, followed by data in the
same format as the TCP protocol described above. In the current
implementation, requests must be contained in a single UDP datagram, but
responses may span several datagrams. (The only common requests that would
span multiple datagrams are huge multi-key "get" requests and "set"
requests, both of which are more suitable to TCP transport for reliability
reasons anyway.)
https://github.com/memcached/memcached/blob/master/doc/protocol.txt
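For illustration, here is a rough sketch of what that framing looks like from the client side. The same protocol.txt describes the frame header as 8 bytes: request ID, sequence number, total number of datagrams, and a reserved field, each a 16-bit integer in network byte order. The helper names below are my own, and the reassembly logic is just one plausible way to handle a response that spans several datagrams.

```python
import struct

# Frame header per doc/protocol.txt: four big-endian 16-bit fields.
HEADER = struct.Struct("!HHHH")

def build_get_datagram(request_id, keys):
    """Return one UDP datagram carrying a text-protocol get for the given keys."""
    payload = b"get " + b" ".join(keys) + b"\r\n"
    # A request must fit in one datagram: sequence 0, total 1, reserved 0.
    return HEADER.pack(request_id, 0, 1, 0) + payload

def parse_datagram(datagram):
    """Split a datagram into (request_id, sequence, total, payload)."""
    request_id, seq, total, _reserved = HEADER.unpack_from(datagram)
    return request_id, seq, total, datagram[HEADER.size:]

def reassemble(datagrams, request_id):
    """Join a multi-datagram response; return None if any part is missing."""
    parts = {}
    total = None
    for datagram in datagrams:
        rid, seq, tot, payload = parse_datagram(datagram)
        if rid != request_id:
            continue  # stale or unrelated reply
        parts[seq] = payload
        total = tot
    if total is None or any(i not in parts for i in range(total)):
        return None  # the caller decides whether to retry or treat it as a miss
    return b"".join(parts[i] for i in range(total))
```

For a get, a `None` result can simply be treated as a cache miss and the value fetched from the backing store. A lost set has no such cheap fallback, which fits the reliability remark in the quoted paragraph.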