Unix sockets slower than TCP when connecting to Redis

I'm developing a high-performance web server that should handle ~2k simultaneous connections and 40k QPS, with a response time under 7 ms.

What it does is query a Redis server (running on the same host) and return the response to the client. During testing, I observed that the implementation using TCP stream sockets performs far better than the one connecting over Unix sockets: with ~1500 connections, TCP stays at about 8 ms while Unix sockets climb to ~50 ms.

The server is written in C and built on a fixed pool of POSIX threads; I use a blocking connection to Redis. My OS is CentOS 6, and the tests were performed using JMeter, wrk and ab. For the connection to Redis I use the hiredis library, which provides both ways of connecting.
As far as I know, a Unix socket should be at least as fast as TCP over loopback.
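
For reference, a stripped-down sketch of the two connection paths I'm comparing looks like this (hiredis calls; the host, port and socket path below are just example values, not my real configuration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <hiredis/hiredis.h>

    /* Stripped-down sketch of the two hiredis connection paths being compared.
     * The host, port and socket path are example values. */
    static redisContext *connect_redis(int use_unix_socket)
    {
        redisContext *c = use_unix_socket
            ? redisConnectUnix("/tmp/redis.sock")   /* Unix domain socket */
            : redisConnect("127.0.0.1", 6379);      /* TCP over loopback  */

        if (c == NULL || c->err) {
            fprintf(stderr, "connect failed: %s\n", c ? c->errstr : "out of memory");
            exit(1);
        }
        return c;
    }

    int main(void)
    {
        redisContext *c = connect_redis(1);         /* 1 = Unix socket, 0 = TCP */
        redisReply *reply = redisCommand(c, "GET %s", "some_key");
        if (reply != NULL) {
            printf("reply type: %d\n", reply->type);
            freeReplyObject(reply);
        }
        redisFree(c);
        return 0;
    }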

Does anybody have an idea what could cause such behaviour?

asked Nov 14 '14 by realmaniek



2 Answers

Unix Domain Sockets are generally faster than TCP sockets over the loopback interface. On average, Unix Domain Sockets have around 2 microseconds of latency, whereas TCP loopback sockets have around 6 microseconds.

If I run redis-benchmark with the defaults (no pipelining) I see 160k requests per second, basically because the single-threaded redis-server is limited by the TCP socket: 160k requests per second at an average response time of 6 microseconds.

Redis achieves 320k SET/GET requests per second when using Unix Domain Sockets.

But there is a limit, which in fact we, at Torusware, have reached with our product Speedus, a high-performance TCP socket implementation with an average latency of 200 nanoseconds (ping us at [email protected] to request the Extreme Performance version). With almost zero socket latency we see redis-benchmark achieving around 500k requests per second, so we can say that redis-server itself adds around 2 microseconds of latency per request on average.
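
If you want to sanity-check these latency figures on your own box, a rough per-request measurement from C with hiredis could look like the sketch below (the host, port and iteration count are placeholders, not your setup):

    #include <stdio.h>
    #include <time.h>
    #include <hiredis/hiredis.h>

    /* Rough average-latency measurement over one blocking connection.
     * Host, port and iteration count are placeholder values. */
    int main(void)
    {
        redisContext *c;
        redisReply *r;
        struct timespec t0, t1;
        int i, iters = 100000;
        double elapsed_ns;

        c = redisConnect("127.0.0.1", 6379);   /* or redisConnectUnix("/tmp/redis.sock") */
        if (c == NULL || c->err)
            return 1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++) {
            r = redisCommand(c, "PING");       /* one synchronous round trip */
            freeReplyObject(r);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        elapsed_ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("average round trip: %.2f us\n", elapsed_ns / iters / 1000.0);

        redisFree(c);
        return 0;
    }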

If you want to respond as fast as possible and your load is below the peak redis-server performance, then avoiding pipelining is probably the best option. However, if you need to handle higher throughput, you can pipeline requests. Each response takes a bit longer, but you will be able to process more requests on the same hardware.

Thus, in the previous scenario, with a pipeline of 32 requests (buffering 32 requests before sending them through the socket in one go) you could process up to 1 million requests per second over the loopback interface. This is the scenario where the UDS benefit is not that high, especially because handling the pipelining is itself the performance bottleneck. In fact, 1M requests with a pipeline of 32 is only around 31k "actual" socket round trips per second, and we have already seen that redis-server can handle 160k requests per second.
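
For reference, client-side pipelining with hiredis can be sketched roughly as below (the batch size of 32 mirrors the example above; the key names are made up):

    #include <stdio.h>
    #include <hiredis/hiredis.h>

    #define PIPELINE 32   /* batch size used in the figures above */

    /* Queue PIPELINE commands in hiredis' output buffer, then read all the
     * replies; the buffer is flushed when the first reply is requested.
     * The key names are made up for illustration. */
    static void pipelined_gets(redisContext *c)
    {
        redisReply *reply;
        int i;

        for (i = 0; i < PIPELINE; i++)
            redisAppendCommand(c, "GET key:%d", i);   /* buffered, not yet sent */

        for (i = 0; i < PIPELINE; i++) {
            if (redisGetReply(c, (void **)&reply) != REDIS_OK)
                break;                                /* connection error */
            freeReplyObject(reply);
        }
    }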

With that pipeline of 32, Unix Domain Sockets handle around 1.1M and 1.7M SET/GET requests per second, respectively, while the TCP loopback handles around 1M and 1.5M.

With pipelining, the bottleneck moves from the transport protocol to the pipeline handling.

This is in line with the information on the redis-benchmark page.

However, pipelining dramatically increases response times. With no pipelining, 100% of operations generally complete in less than 1 millisecond. With a pipeline of 32 requests, the maximum response time is 4 milliseconds on a high-performance server, and tens of milliseconds if redis-server runs on a different machine or in a virtual machine.

So you have to trade off response time against maximum throughput.

answered by Guillermo Lopez


Although this is an old question, I would like to make an addition. The other answers talk about 500k or even 1.7M responses per second. That may be achievable with Redis, but the question was:

Client --#Network#--> Webserver --#Something#--> Redis

The webserver functions as a sort of HTTP proxy in front of Redis, I assume.

This means that your number of requests is also limited by how many responses the webserver itself can push onto the network. There is a limitation that is often forgotten: if you have a 100 Mbit connection, you have 100,000,000 bits per second at your disposal, but they travel in Ethernet frames of at most 1518 bytes, each with its own headers, preamble and the mandatory inter-frame gap. A small response of, say, 175 bytes of payload still costs roughly 250 bytes on the wire once TCP/IP and Ethernet overhead are added, so a 100 Mbit link tops out at on the order of 50,000 such single-frame responses per second. That assumes every response fits in the data part of one frame and none have to be resent because of CRC errors or lost packets.

Also, if persistent connections are not used, you need a TCP handshake for every request, which adds three packets per connection (two received, one sent by the server). In that unoptimised situation you are left with only a fraction, roughly a quarter, of that packet budget for actual responses (or ten times more on a 'perfect' gigabit connection), and that still requires each response to fit in a single frame.
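
As a back-of-the-envelope check, the packet-rate arithmetic above can be written out as follows (the 175-byte payload and the per-frame overhead figures are assumptions for illustration):

    #include <stdio.h>

    /* Back-of-the-envelope packets-per-second estimate for a 100 Mbit link.
     * Assumed per-frame overhead: Ethernet header + FCS (18 B), preamble (8 B),
     * inter-frame gap (12 B), IP header (20 B) and TCP header (20 B).
     * The payload size is an example, not a measurement. */
    int main(void)
    {
        const double link_bps = 100e6;                 /* 100 Mbit/s            */
        const int payload = 175;                       /* example response size */
        const int overhead = 18 + 8 + 12 + 20 + 20;    /* bytes per frame       */
        double frames_per_sec = link_bps / ((payload + overhead) * 8.0);

        printf("~%.0f single-frame responses per second\n", frames_per_sec);
        return 0;
    }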

So:

  • Persistent (keep-alive) connections only cost a bit of memory, so enable them. In the best case this can quadruple your performance.
  • Reduce your response size with gzip/deflate if needed, so responses fit in as few packets as possible (every packet lost is a possible response lost).
  • Reduce your response size by stripping unneeded 'garbage' like debug data or long XML tags.
  • HTTPS connections add a huge overhead by comparison.
  • Add network cards and trunk them.
  • If responses are always smaller than 175 bytes, use a dedicated network card for this service and reduce the network frame size to increase the number of packets sent per second.
  • Don't let the server do other things (like serving normal web pages).
  • ...
answered by phulstaert