I am developing a web service on Ubuntu 14.04 in C++ using cpp-netlib in asynchronous mode. The service needs to respond to data sent in the message body of an HTTP POST. I am seeing poor performance when receiving input larger than 1K, and I want to fix that.
If the data is relatively small (less than 1K), the server receives it almost instantly. If the data is more than 1K, there is a delay of roughly one second between the first invocation of the asynchronous read callback and the arrival of the first chunk of actual data at that callback. After the initial delay, subsequent chunks arrive almost instantly.
How can I eliminate this delay? Is it an artifact of using curl to post test data? How can I easily test the performance of curl for posting data?
You can find minimal source code demonstrating the problem on GitHub. Here's the command I use to post data to the server:
rcook$ curl -d @AsyncDaemon.h http://localhost:8787/foo
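(As an aside on the third question above: curl can time its own request phases via the --write-out option, so no external tooling is needed. For example, adapting the command above with an illustrative format string:)

rcook$ curl -d @AsyncDaemon.h -s -o /dev/null -w 'pretransfer %{time_pretransfer}s, starttransfer %{time_starttransfer}s, total %{time_total}s\n' http://localhost:8787/foo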
Here's an example of the server's output (annotated):
rcook$ ./async_daemon
1431387368.321863: AsyncDaemon constructor
1431387368.322446: receive thread beginning
*** It's waiting for a connection here.
1431387371.536191: begin transaction 0 on thread 24050
1431387371.536237: transaction 0 constructor
1431387371.536273: received 1206 byte request for /foo from 127.0.0.1:49402
1431387371.536312: invoked asynchronous read
1431387371.536321: end transaction handler
1431387371.536335: begin asynchronous read callback on thread 24050
1431387371.536348: read 0 bytes
1431387371.536386: invoked asynchronous read
1431387371.536394: end asynchronous read callback
*** The asynchronous read callback is invoked quickly, but gets no data.
*** There is then a pause of just over one second before the asynchronous
*** read callback is invoked again.
1431387372.537203: begin asynchronous read callback on thread 24050
1431387372.537253: read 1024 bytes
1431387372.537307: invoked asynchronous read
1431387372.537317: end asynchronous read callback
*** There is no significant delay when reading the next chunk.
1431387372.537429: begin asynchronous read callback on thread 24050
1431387372.537469: read 182 bytes
1431387372.537478: finished reading the body
1431387372.537746: wrote response
1431387372.537763: transaction 0 destructor
1431387372.537772: end asynchronous read callback
*** The server is then killed with a keyboard interrupt.
^C1431387375.382186: terminating with signal 2
1431387375.382231: initiating shutdown
1431387375.382241: stopping server
1431387375.382363: server run finished
1431387375.382423: receive thread ending
1431387375.382522: AsyncDaemon destructor
As you can see, after the asynchronous read callback is invoked for the first time (and receives zero bytes of data, BTW), it requests another chunk of input. At this point there is a pause of just over one second before that input arrives, from 1431387371.536394 to 1431387372.537203 in this example. What's going on during that time? How can I eliminate that delay?
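For anyone unfamiliar with the pattern: cpp-netlib's asynchronous server sits on top of Boost.Asio, and reading a request body amounts to the usual chunked async_read_some loop. The following is only a rough illustration of that shape, written directly against Boost.Asio rather than the cpp-netlib API (the class and the names are mine, not taken from the repository):

#include <boost/array.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <iostream>
#include <string>

// Reads an HTTP message body from an already-connected socket, one chunk at a
// time, until 'expected' bytes have arrived.
class chunked_body_reader {
public:
    chunked_body_reader(boost::asio::ip::tcp::socket &socket, std::size_t expected)
        : socket_(socket), expected_(expected) {}

    void start() { read_chunk(); }

private:
    void read_chunk() {
        // async_read_some completes as soon as *some* data is available, so if
        // the client delays sending the body, the handler is delayed with it.
        socket_.async_read_some(
            boost::asio::buffer(buffer_),
            boost::bind(&chunked_body_reader::handle_read, this,
                        boost::asio::placeholders::error,
                        boost::asio::placeholders::bytes_transferred));
    }

    void handle_read(const boost::system::error_code &error, std::size_t size) {
        if (error) {
            std::cerr << "read error: " << error.message() << std::endl;
            return;
        }
        body_.append(buffer_.data(), size);
        if (body_.size() < expected_)
            read_chunk();                       // ask for the next chunk
        else
            std::cout << "finished reading the body" << std::endl;
    }

    boost::asio::ip::tcp::socket &socket_;
    std::size_t expected_;
    boost::array<char, 1024> buffer_;
    std::string body_;
};

The relevant property is that the read handler fires only when the peer actually delivers bytes, so a pause in the handler means the data has not yet arrived on the socket.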
I have done some research online and run several experiments: synchronous vs. asynchronous mode in cpp-netlib (no effect), and curl vs. libcurl as the client (no effect). Neither turned up an answer.
At jxh's suggestion, I ran a tcpdump capture during a sample transaction:
00:28:01.304446 IP6 localhost.52265 > localhost.8787: Flags [S], seq 3956487146, win 43690, options [mss 65476,sackOK,TS val 395479802 ecr 0,nop,wscale 7], length 0
00:28:01.304461 IP6 localhost.8787 > localhost.52265: Flags [R.], seq 0, ack 3956487147, win 0, length 0
00:28:01.305014 IP localhost.49421 > localhost.8787: Flags [S], seq 1668603425, win 43690, options [mss 65495,sackOK,TS val 395479803 ecr 0,nop,wscale 7], length 0
00:28:01.305039 IP localhost.8787 > localhost.49421: Flags [S.], seq 4010788604, ack 1668603426, win 43690, options [mss 65495,sackOK,TS val 395479803 ecr 395479803,nop,wscale 7], length 0
00:28:01.305079 IP localhost.49421 > localhost.8787: Flags [.], ack 1, win 342, options [nop,nop,TS val 395479803 ecr 395479803], length 0
00:28:01.305185 IP localhost.49421 > localhost.8787: Flags [P.], seq 1:176, ack 1, win 342, options [nop,nop,TS val 395479803 ecr 395479803], length 175
00:28:01.305210 IP localhost.8787 > localhost.49421: Flags [.], ack 176, win 350, options [nop,nop,TS val 395479803 ecr 395479803], length 0
00:28:02.306555 IP localhost.49421 > localhost.8787: Flags [P.], seq 176:1382, ack 1, win 342, options [nop,nop,TS val 395480053 ecr 395479803], length 1206
00:28:02.306620 IP localhost.8787 > localhost.49421: Flags [.], ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307223 IP localhost.8787 > localhost.49421: Flags [P.], seq 1:52, ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 51
00:28:02.307270 IP localhost.49421 > localhost.8787: Flags [.], ack 52, win 342, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307494 IP localhost.8787 > localhost.49421: Flags [P.], seq 52:66, ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 14
00:28:02.307522 IP localhost.49421 > localhost.8787: Flags [.], ack 66, win 342, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307765 IP localhost.8787 > localhost.49421: Flags [F.], seq 66, ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307867 IP localhost.49421 > localhost.8787: Flags [F.], seq 1382, ack 67, win 342, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307917 IP localhost.8787 > localhost.49421: Flags [.], ack 1383, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 0
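(A capture like the one above can be produced with something along these lines, assuming the loopback interface is named lo:)

rcook$ sudo tcpdump -i lo port 8787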
I'm not very experienced with tcpdump, but it looks like the first two lines are an IPv6 connection attempt that is refused before curl falls back to IPv4. After that, 175 bytes flow from the client to the server (the HTTP headers?), then after a delay of a little over one second, 1206 bytes flow to the server (the request body), followed with minimal delay by the server's response back to the client (the 51- and 14-byte segments) and the connection teardown.
This tells me that the delay is being introduced on the client side, likely in curl. Does anyone know why?
Problem solved, thanks to debugging and diagnostic techniques suggested by @jxh.
Adding --trace - --trace-time to the curl command revealed that curl was spending that mysterious second waiting for the server to return a 100 Continue response before it sent the rest of the request:
01:31:44.043611 == Info: Connected to localhost (127.0.0.1) port 8787 (#0)
01:31:44.043726 => Send header, 175 bytes (0xaf)
0000: 50 4f 53 54 20 2f 66 6f 6f 20 48 54 54 50 2f 31 POST /foo HTTP/1
0010: 2e 31 0d 0a 55 73 65 72 2d 41 67 65 6e 74 3a 20 .1..User-Agent:
0020: 63 75 72 6c 2f 37 2e 33 35 2e 30 0d 0a 48 6f 73 curl/7.35.0..Hos
0030: 74 3a 20 6c 6f 63 61 6c 68 6f 73 74 3a 38 37 38 t: localhost:878
0040: 37 0d 0a 41 63 63 65 70 74 3a 20 2a 2f 2a 0d 0a 7..Accept: */*..
0050: 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 Content-Length:
0060: 31 32 30 36 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 1206..Content-Ty
0070: 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f pe: application/
0080: 78 2d 77 77 77 2d 66 6f 72 6d 2d 75 72 6c 65 6e x-www-form-urlen
0090: 63 6f 64 65 64 0d 0a 45 78 70 65 63 74 3a 20 31 coded..Expect: 1
00a0: 30 30 2d 63 6f 6e 74 69 6e 75 65 0d 0a 0d 0a 00-continue....
01:31:45.045626 == Info: Done waiting for 100-continue
01:31:45.045831 => Send data, 1206 bytes (0x4b6)
It is a known deficiency in cpp-netlib (at least as of version 0.11.0) that it does not support sending the 100 Continue response that curl is expecting.
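For reference, the interim reply that a cooperating server would send before the client transmits the body is nothing more than a status line followed by a blank line on the wire:

HTTP/1.1 100 Continue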
The solution then became convincing curl not to wait for a 100 Continue response. As I discovered here, adding -H 'Expect:' to the curl command line does the trick. With that, the entire transaction completes in about a millisecond.
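For completeness, since I had also experimented with libcurl: the equivalent fix there is to override the header with an empty value, which is the documented way to keep libcurl from adding Expect: 100-continue. A minimal sketch, with error handling omitted and the URL and payload as placeholders:

#include <curl/curl.h>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();

    // Supplying the header with an empty value suppresses Expect: 100-continue.
    struct curl_slist *headers = NULL;
    headers = curl_slist_append(headers, "Expect:");

    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8787/foo");
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "placeholder=payload");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}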
Since I answered my own question, I won't accept my answer for a few weeks to give others a chance to contribute something better.