This program prints 65k bytes per line.
I measure the throughput with ./a.out | pv >/dev/null
and get around 3 GB/s.
As soon as I change the line length to 70k, the throughput drops to ~1 GB/s.
Which bottleneck (CPU cache, libc idiosyncrasy, etc.) am I hitting here?
#include <stdio.h>
#include <string.h>

#define LEN 65000    // high throughput
// #define LEN 70000 // low throughput

int main(void)
{
    char s[LEN];
    memset(s, 'a', LEN - 1);
    s[LEN - 1] = '\0';
    while (1)
        printf("%s\n", s);
}
Update: I'm running this on Ubuntu 12.04 64-bit, which has EGLIBC 2.15, on a Core i5-2520M.
Update: puts(s) has the same problem.
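For completeness, here is a stdio-free sketch (same LEN switch as above) that pushes the payload with plain write(2); if the 65k/70k gap shows up here as well, libc buffering can be ruled out as the culprit.
#include <string.h>
#include <unistd.h>

#define LEN 65000 // or 70000

int main(void)
{
    static char s[LEN];
    memset(s, 'a', LEN - 1);
    s[LEN - 1] = '\n';                            // newline in place of printf's "\n"
    for (;;)
        if (write(STDOUT_FILENO, s, LEN) != LEN)  // one write(2) per line, no stdio buffer
            return 1;
}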
You are suffering from under-utilization of the kernel I/O (pipe) buffer in your data transfer. If we assume the kernel pipe buffer is 64 KiB (65536 bytes), then a 70000-byte write blocks after the first 65536 bytes are written. Once the buffer is drained, the remaining 4464 bytes go into it. pv
therefore ends up doing two reads for every 70000
bytes transferred, resulting in about half your normal throughput due to poor buffer utilization. The stall in the writer while it waits for the drain probably accounts for the rest.
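If you want to verify that 64 KiB assumption instead of taking it on faith, Linux (kernel 2.6.35 and later) exposes the pipe capacity through fcntl(F_GETPIPE_SZ); a quick sketch that reports the capacity of the pipe attached to stdout:
#define _GNU_SOURCE               // needed for F_GETPIPE_SZ
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    int cap = fcntl(1, F_GETPIPE_SZ);      // fd 1 = stdout
    if (cap == -1) {
        perror("F_GETPIPE_SZ");            // stdout is probably not a pipe
        return 1;
    }
    fprintf(stderr, "pipe capacity: %d bytes\n", cap);  // stderr, so the pipe stays clean
}
Run it as ./pipecap | cat; on a stock kernel it will typically report 65536.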
You can give pv
a smaller buffer size with -B, which increases your throughput by raising the average number of bytes transferred per time slice. Writes become more efficient on average, and the read buffer stays full.
$ ./a.out | pv -B 70000 > /dev/null
9.25GB 0:00:09 [1.01GB/s] [ <=> ]
$ ./a.out | pv -B 30k > /dev/null
9.01GB 0:00:05 [1.83GB/s] [ <=> ]
Edit: Three more runs (2.7 GHz Core i7)
$ ./a.out | pv -B 16k > /dev/null
15GB 0:00:08 [1.95GB/s] [ <=> ]
$ ./a.out | pv -B 16k > /dev/null
9.3GB 0:00:05 [1.85GB/s] [ <=> ]
$ ./a.out | pv -B 16k > /dev/null
19.2GB 0:00:11 [1.82GB/s] [ <=> ]
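Another option, if you would rather keep the 70000-byte lines, is to grow the pipe itself so a whole line fits in one write. A sketch of the writer enlarging its own stdout pipe with F_SETPIPE_SZ (Linux 2.6.35+, capped by /proc/sys/fs/pipe-max-size); whether it closes the gap on your machine is something you would have to measure:
#define _GNU_SOURCE                  // needed for F_SETPIPE_SZ
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define LEN 70000

int main(void)
{
    // Ask for a 128 KiB pipe on stdout so a whole line fits in one write;
    // fails harmlessly if stdout is not a pipe or the request exceeds the cap.
    if (fcntl(1, F_SETPIPE_SZ, 1 << 17) == -1)
        perror("F_SETPIPE_SZ");

    static char s[LEN];
    memset(s, 'a', LEN - 1);
    s[LEN - 1] = '\0';
    while (1)
        printf("%s\n", s);
}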