 

Why does Facebook use TCP for SET and UDP for GET in memcached?

My question is about memcached. Facebook uses memcached as a cache for its structured data to reduce latency for its users. They have optimized the performance of memcached over UDP on Linux: http://www.facebook.com/note.php?note_id=39391378919

Interestingly, they still use TCP for set operations and use UDP only for get operations.

Why would they do that? Why not use UDP for set operations as well? UDP scales better than TCP because the operating system has to maintain less per-connection state.

Thanks,

0xhacker asked Jul 10 '12 03:07



2 Answers

This sentence from the Facebook note pretty much explains both the problem and the solution:

Although we improved the memory efficiency with TCP, we moved to UDP for get operations to reduce network traffic and implement application-level flow control for multi-gets (gets of hundreds of keys in parallel).

TCP also provides flow control, and for memcached multi-gets it is fairly serial: you open a connection (or take one from a pool), send the list of keys, wait, and then receive one result containing all the values. Instead, they implemented application-level flow control themselves on top of connectionless, parallel UDP gets. Here are the benefits of UDP I see for software at Facebook's scale:

  • no need to open connections, pool them, or pay for extra round-trips, sessions, handshakes, keep-alives and so on;
  • multiple distributed memcached servers and indexes can be queried in parallel, which fits the spirit of micro-services and "micro-caches" as services;
  • a UDP packet can be multicast to provide high availability through redundancy, load balancing, dynamic routing or even sharding - the first response wins!
  • a time-out and retry policy for each individual get can be implemented at the application level;
  • application logic can run as soon as any part of the data is available, with no need to wait for the complete multi-get result.
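To make the last two points concrete, here is a minimal sketch of what application-level control over parallel gets could look like. It is not Facebook's implementation (the note does not describe one in detail). `fetch` is a hypothetical coroutine that sends one get and waits for its reply, `on_value` is a hypothetical callback, and the fixed `max_in_flight` cap merely stands in for real flow control.

```python
import asyncio

async def get_with_retry(fetch, key, timeout=0.05, retries=2):
    """Run one get with its own time-out; retry on silence, then report a miss."""
    for _ in range(retries + 1):
        try:
            return key, await asyncio.wait_for(fetch(key), timeout)
        except asyncio.TimeoutError:
            continue  # UDP does not guarantee delivery, so treat silence as loss
    return key, None  # give up: the caller sees a cache miss

async def multi_get(fetch, keys, on_value, max_in_flight=100, **retry_options):
    """Issue gets in parallel and hand each value to on_value as it arrives."""
    window = asyncio.Semaphore(max_in_flight)  # assumed stand-in for flow control

    async def limited(key):
        async with window:
            return await get_with_retry(fetch, key, **retry_options)

    results = {}
    for finished in asyncio.as_completed([limited(k) for k in keys]):
        key, value = await finished
        results[key] = value
        if value is not None:
            on_value(key, value)  # act on partial data without waiting for the rest
    return results
```

A slow or lost key only delays itself here: the other keys are delivered to `on_value` as soon as they come back, and the lost one ends up as `None` after its retries run out.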

On the other hand, I think they do writes over TCP for consistency. With TCP, a memcached set behaves like a small transaction: the request is sent and the response acknowledges the cache update. Reimplementing that over UDP wouldn't provide much benefit, I suppose.
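As a rough illustration of that request/acknowledgement pattern, here is a sketch of a set over TCP using the memcached text protocol, where the server answers `STORED` on success. The helper names `build_set` and `tcp_set` are made up for this example, and error handling is left out.

```python
import socket

def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a text-protocol set command: header line, data block, CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"

def tcp_set(host: str, port: int, key: str, value: bytes, timeout: float = 1.0) -> bool:
    """Send one set over TCP and report whether the server acknowledged it."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_set(key, value))
        reply = conn.makefile("rb").readline()
    return reply == b"STORED\r\n"
```

The caller knows the write either reached the cache or did not, which is the property that would have to be rebuilt by hand on top of UDP.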

gertas answered Nov 07 '22 21:11



Each UDP datagram contains a simple frame header, followed by data in the same format as the TCP protocol described above. In the current implementation, requests must be contained in a single UDP datagram, but responses may span several datagrams. (The only common requests that would span multiple datagrams are huge multi-key get requests and set requests, both of which are more suitable to TCP transport for reliability reasons anyway.)

https://github.com/memcached/memcached/blob/master/doc/protocol.txt
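For reference, protocol.txt defines the frame header as 8 bytes: four 16-bit integers in network byte order holding the request ID, the sequence number, the total number of datagrams in the message, and a reserved field that must be 0. A small sketch of framing a get request and reassembling a multi-datagram response follows; the function names are my own, not part of any memcached client.

```python
import struct
from typing import List, Optional

# request ID, sequence number, total datagrams, reserved (all 16-bit, big-endian)
HEADER = struct.Struct("!HHHH")

def build_get_request(request_id: int, key: str) -> bytes:
    """Frame a get request that fits in a single datagram."""
    return HEADER.pack(request_id, 0, 1, 0) + f"get {key}\r\n".encode("ascii")

def reassemble(datagrams: List[bytes], request_id: int) -> Optional[bytes]:
    """Join the response datagrams for request_id, or return None if any are missing."""
    parts = {}
    total = None
    for datagram in datagrams:
        rid, seq, count, _reserved = HEADER.unpack_from(datagram)
        if rid != request_id:
            continue  # belongs to a different outstanding request
        total = count
        parts[seq] = datagram[HEADER.size:]
    if total is None or set(parts) != set(range(total)):
        return None  # incomplete: UDP may drop or reorder datagrams
    return b"".join(parts[i] for i in range(total))
```

Because nothing retransmits a missing datagram, an incomplete response simply has to be treated as a failed get, which is tolerable for a cache read but not for a write.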

Chen Fei answered Nov 07 '22 19:11
