Where is a delay in an HTTP POST coming from?

Tags:

I am developing a web service on Ubuntu 14.04 in C++ using cpp-netlib in asynchronous mode. The service needs to respond to data sent over HTTP POST in the message body. I am observing poor performance when receiving input larger than 1K, and I want to fix that.

If the data is relatively small, less than 1K, the server receives the data almost instantly. If the data is more than 1K, there is a roughly one second delay after the asynchronous read callback function is first invoked before the first chunk of actual data is presented to the callback. After the initial delay, the subsequent chunks arrive almost instantly.

How can I eliminate this delay? Is it an artifact of using curl to post test data? How can I easily test the performance of curl for posting data?

You can find minimal source code demonstrating the problem on github. Here's the command I use to post data to the server:

rcook$ curl -d @AsyncDaemon.h http://localhost:8787/foo

Here's an example of its output (annotated):

rcook$ ./async_daemon 
1431387368.321863: AsyncDaemon constructor
1431387368.322446: receive thread beginning

*** It's waiting for a connection here.

1431387371.536191: begin transaction 0 on thread 24050
1431387371.536237: transaction 0 constructor
1431387371.536273: received 1206 byte request for /foo from 127.0.0.1:49402
1431387371.536312: invoked asynchronous read
1431387371.536321: end transaction handler
1431387371.536335: begin asynchronous read callback on thread 24050
1431387371.536348: read 0 bytes
1431387371.536386: invoked asynchronous read
1431387371.536394: end asynchronous read callback

*** The asynchronous read callback is invoked quickly, but gets no data.
*** There is then a pause of just over one second before the asynchronous
*** read callback is invoked again.

1431387372.537203: begin asynchronous read callback on thread 24050
1431387372.537253: read 1024 bytes
1431387372.537307: invoked asynchronous read
1431387372.537317: end asynchronous read callback

*** There is no significant delay when reading the next chunk.

1431387372.537429: begin asynchronous read callback on thread 24050
1431387372.537469: read 182 bytes
1431387372.537478: finished reading the body
1431387372.537746: wrote response
1431387372.537763: transaction 0 destructor
1431387372.537772: end asynchronous read callback

*** The server is then killed with a keyboard interrupt.

^C1431387375.382186: terminating with signal 2
1431387375.382231: initiating shutdown
1431387375.382241: stopping server
1431387375.382363: server run finished
1431387375.382423: receive thread ending
1431387375.382522: AsyncDaemon destructor

As you can see, after the asynchronous read callback is invoked for the first time (and receives zero bytes of data, BTW), it requests another chunk of input. At this point there is a pause of just over one second before that input arrives, from 1431387371.536394 to 1431387372.537203 in this example. What's going on during that time? How can I eliminate that delay?

I have done some research online, and run several experiments (synchronous vs. asynchronous mode with cpp-netlib (no effect), curl vs. libcurl (no effect)), but have not been able to find an answer.

Update: TCP Dump

Upon jxh's suggestion, I ran a tcp dump during a sample transaction:

00:28:01.304446 IP6 localhost.52265 > localhost.8787: Flags [S], seq 3956487146, win 43690, options [mss 65476,sackOK,TS val 395479802 ecr 0,nop,wscale 7], length 0
00:28:01.304461 IP6 localhost.8787 > localhost.52265: Flags [R.], seq 0, ack 3956487147, win 0, length 0
00:28:01.305014 IP localhost.49421 > localhost.8787: Flags [S], seq 1668603425, win 43690, options [mss 65495,sackOK,TS val 395479803 ecr 0,nop,wscale 7], length 0
00:28:01.305039 IP localhost.8787 > localhost.49421: Flags [S.], seq 4010788604, ack 1668603426, win 43690, options [mss 65495,sackOK,TS val 395479803 ecr 395479803,nop,wscale 7], length 0
00:28:01.305079 IP localhost.49421 > localhost.8787: Flags [.], ack 1, win 342, options [nop,nop,TS val 395479803 ecr 395479803], length 0
00:28:01.305185 IP localhost.49421 > localhost.8787: Flags [P.], seq 1:176, ack 1, win 342, options [nop,nop,TS val 395479803 ecr 395479803], length 175
00:28:01.305210 IP localhost.8787 > localhost.49421: Flags [.], ack 176, win 350, options [nop,nop,TS val 395479803 ecr 395479803], length 0
00:28:02.306555 IP localhost.49421 > localhost.8787: Flags [P.], seq 176:1382, ack 1, win 342, options [nop,nop,TS val 395480053 ecr 395479803], length 1206
00:28:02.306620 IP localhost.8787 > localhost.49421: Flags [.], ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307223 IP localhost.8787 > localhost.49421: Flags [P.], seq 1:52, ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 51
00:28:02.307270 IP localhost.49421 > localhost.8787: Flags [.], ack 52, win 342, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307494 IP localhost.8787 > localhost.49421: Flags [P.], seq 52:66, ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 14
00:28:02.307522 IP localhost.49421 > localhost.8787: Flags [.], ack 66, win 342, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307765 IP localhost.8787 > localhost.49421: Flags [F.], seq 66, ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307867 IP localhost.49421 > localhost.8787: Flags [F.], seq 1382, ack 67, win 342, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307917 IP localhost.8787 > localhost.49421: Flags [.], ack 1383, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 0

I'm not very experienced with tcpdump, but it looks like 175 bytes flow to the server (HTTP headers?), then after a delay of a little over one second, 1206 bytes flow to the server, followed by a 51 byte chunk with minimal delay, followed by the server response.

This tells me that the delay is being introduced on the client side, likely in curl. Does anyone know why?

994

asked May 11 '15 23:05

Randall Cook

1 Answers

Problem solved, thanks to debugging and diagnostic techniques suggested by @jxh.

Adding --trace - --trace-time to the curl command revealed that curl was spending that mysterious second waiting for the server to return a 100 Continue response before it sent the rest of the request:

01:31:44.043611 == Info: Connected to localhost (127.0.0.1) port 8787 (#0)
01:31:44.043726 => Send header, 175 bytes (0xaf)
0000: 50 4f 53 54 20 2f 66 6f 6f 20 48 54 54 50 2f 31 POST /foo HTTP/1
0010: 2e 31 0d 0a 55 73 65 72 2d 41 67 65 6e 74 3a 20 .1..User-Agent: 
0020: 63 75 72 6c 2f 37 2e 33 35 2e 30 0d 0a 48 6f 73 curl/7.35.0..Hos
0030: 74 3a 20 6c 6f 63 61 6c 68 6f 73 74 3a 38 37 38 t: localhost:878
0040: 37 0d 0a 41 63 63 65 70 74 3a 20 2a 2f 2a 0d 0a 7..Accept: */*..
0050: 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 Content-Length: 
0060: 31 32 30 36 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 1206..Content-Ty
0070: 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f pe: application/
0080: 78 2d 77 77 77 2d 66 6f 72 6d 2d 75 72 6c 65 6e x-www-form-urlen
0090: 63 6f 64 65 64 0d 0a 45 78 70 65 63 74 3a 20 31 coded..Expect: 1
00a0: 30 30 2d 63 6f 6e 74 69 6e 75 65 0d 0a 0d 0a    00-continue....
01:31:45.045626 == Info: Done waiting for 100-continue
01:31:45.045831 => Send data, 1206 bytes (0x4b6)

It is a known deficiency in cpp-netlib (at least as of version 0.11.0) that it does not support sending the 100 Continue response that curl is expecting.

The solution then became convincing curl to not wait for a 100 Continue response. As I discovered here, adding -H 'Expect:' to the curl command line does the trick. With that, the entire transaction takes about a millisecond.

Since I answered my own question, I won't accept my answer for a few weeks to give others a chance to contribute something better.

166

answered Sep 16 '22 12:09

Randall Cook

Related questions
                            
                                How to self-document a callback function that is called by template library class?
                            
                                Is it allowed to cast away const on a const-defined object as long as it is not actually modified?
                            
                                Creating a new vector using a transform
                            
                                Can a reference type be used as the key type in an STL map
                            
                                How do you extend python with C++?
                            
                                Is reinterpret_cast mostly useless?
                            
                                Tracking C++ lib public API changes
                            
                                Ambiguous injected class name is not an error
                            
                                Setting width in C++ output stream
                            
                                "using namespace" statement inside an anonymous namespace
                            
                                What's the recommended Eclipse CDT configuration for big C++ project (indexer takes forever)
                            
                                C++11 atomics and intrusive shared pointer reference count
                            
                                Is it possible to develop a loadable kernel module (LKM) on Linux with C++?
                            
                                Compiling of SQlite3 in C++
                            
                                Pattern Matching - Find reference object in second image [OpenCV?]
                            
                                How to guarantee order of argument evaluation when calling a function object?
                            
                                how to specify vc11 lambda calling convention
                            
                                Event notification without mutex
                            
                                Why unique-ptr doesn't check base class to virtual destructible?
                            
                                Time complexity of removing items in vectors and deque

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Where is a delay in an HTTP POST coming from?

Tags:

c++

performance

curl

network-programming

cpp-netlib

Update: TCP Dump

Randall Cook

People also ask

1 Answers

Randall Cook

Recent Activity

Donate For Us