Why is buffering in C++ important?

Tags:

c++

buffer

I tried to print Hello World 200,000 times and it took forever, so I had to stop it. But right after I added a char array to act as a buffer, it took less than 10 seconds. Why?

Before adding a buffer:

#include <iostream>
using namespace std;

int main() {
    int count = 0;
    std::ios_base::sync_with_stdio(false);
    for (int i = 1; i < 200000; i++)
    {
        cout << "Hello world!\n";
        count++;
    }
    cout << "Count:%d\n" << count;
    return 0;
}

And this is after adding a buffer:

#include <iostream>
using namespace std;

int main() {
    int count = 0;
    std::ios_base::sync_with_stdio(false);
    char buffer[1024];
    cout.rdbuf()->pubsetbuf(buffer, 1024);
    for (int i = 1; i < 200000; i++)
    {
        cout << "Hello world!\n";
        count++;
    }
    cout << "Count:%d\n" << count;
    return 0;
}

This makes me think about Java. What are the advantages of using a BufferedReader to read a file?

asked Feb 25 '11 02:02

Amumu

4 Answers

From the standpoint of file operations, writing to memory (RAM) is far faster than writing directly to a file on disk.

For illustration, let's define:

each write IO operation to a file on the disk costs 1 ms
each write IO operation to a file on the disk over a network costs 5 ms
each write IO operation to the memory costs 0.5 ms

Let's say we have to write some data to a file 100 times.

Case 1: Directly Writing to File On Disk

100 times x 1 ms = 100 ms

Case 2: Directly Writing to File On Disk Over Network

100 times x 5 ms = 500 ms

Case 3: Buffering in Memory before Writing to File on Disk

(100 times x 0.5 ms) + 1 ms = 51 ms

Case 4: Buffering in Memory before Writing to File on Disk Over Network

(100 times x 0.5 ms) + 5 ms = 55 ms
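The four cases above can be reproduced with a small sketch. The per-operation costs are the illustrative figures from this answer, not measurements, and the function and constant names are made up for the example.

```cpp
#include <cassert>

// Illustrative costs assumed in this answer, in milliseconds (not measured values).
constexpr double kDiskWriteMs = 1.0;
constexpr double kNetworkWriteMs = 5.0;
constexpr double kMemoryWriteMs = 0.5;

// Hypothetical helper: every write goes straight to the device.
double direct_cost_ms(int writes, double device_write_ms) {
    assert(writes >= 0);
    return writes * device_write_ms;
}

// Hypothetical helper: every write goes to memory first,
// then the device is written once at the end.
double buffered_cost_ms(int writes, double device_write_ms) {
    assert(writes >= 0);
    return writes * kMemoryWriteMs + device_write_ms;
}
```

Calling direct_cost_ms(100, kDiskWriteMs) and buffered_cost_ms(100, kDiskWriteMs) gives the Case 1 and Case 3 totals. The gap grows as the device gets slower, because the buffered version pays the device cost only once.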

Conclusion

Buffering in memory is faster than writing directly, because the slow device is touched far less often. However, if your system is low on memory and has to swap to the page file, it will be slow again. Thus you have to balance your I/O operations between memory and disk/network.

answered Sep 18 '22 21:09

mauris

The main issue with writing to the disk is that the time taken to write is not a linear function of the number of bytes, but an affine one with a huge constant.

In computing terms, this means that I/O has good throughput (lower than memory, but still quite good) but poor latency (normally a bit better than the network).
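As a rough sketch of that affine model, assume each write call costs a fixed latency plus a term proportional to the bytes it moves. The function names and any numbers passed in are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical affine cost of one write call:
// a fixed latency plus bytes divided by throughput.
double write_time_ms(std::size_t bytes, double latency_ms, double bytes_per_ms) {
    assert(bytes_per_ms > 0.0);
    return latency_ms + static_cast<double>(bytes) / bytes_per_ms;
}

// Total time to move total_bytes using calls of at most chunk_bytes each.
double chunked_time_ms(std::size_t total_bytes, std::size_t chunk_bytes,
                       double latency_ms, double bytes_per_ms) {
    assert(chunk_bytes > 0);
    double total = 0.0;
    for (std::size_t done = 0; done < total_bytes; done += chunk_bytes) {
        const std::size_t left = total_bytes - done;
        const std::size_t n = left < chunk_bytes ? left : chunk_bytes;
        total += write_time_ms(n, latency_ms, bytes_per_ms);
    }
    return total;
}
```

With a latency of 1 ms and a throughput of 1024 bytes per ms, moving 1024 bytes in one call costs about 2 ms under this model, while moving them one byte at a time costs about 1025 ms. The constant term dominates as soon as the chunks get small.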

If you look at reviews of HDDs or SSDs, you'll notice that the read/write tests are separated into two categories:

throughput in random reads
throughput in contiguous reads

The latter is normally significantly greater than the former.

Normally, the OS and the I/O library should abstract this for you, but as you noticed, if your routine is I/O intensive, you might gain by increasing the buffer size. This is normal: the library is generally tailored for all kinds of uses and thus offers a good middle ground for average applications. If your application is not "average", then it might not perform as fast as it could.
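A minimal sketch of giving a file stream a larger buffer follows. The helper name, the path and the buffer size are assumptions. Apart from the unbuffered (0, 0) case, the effect of pubsetbuf on a file buffer is implementation-defined, and some implementations only honor it when it is called before the file is opened, so treat this as something to measure rather than a guaranteed speedup.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes `lines` copies of a short line to `path`
// through a caller-sized buffer and returns how many lines were written.
std::size_t write_lines(const std::string& path, std::size_t lines,
                        std::size_t buffer_size) {
    assert(buffer_size > 0);
    // Declared before the stream so the buffer outlives it.
    std::vector<char> buffer(buffer_size);
    std::ofstream out;
    // Request the custom buffer before opening the file.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));
    out.open(path, std::ios::out | std::ios::trunc);
    if (!out) return 0;

    std::size_t written = 0;
    for (std::size_t i = 0; i < lines; ++i) {
        out << "Hello world!\n";
        if (!out) break;
        ++written;
    }
    out.close();  // flush while the buffer is still alive
    return written;
}
```

Timing write_lines with a few different buffer_size values on the target machine is the simplest way to see whether a larger buffer helps for a given workload.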

answered Sep 18 '22 21:09

Matthieu M.

What compiler/platform are you using? I see no significant difference here (RedHat, gcc 4.1.2); both programs take 5-6 seconds to finish (but "user" time is about 150 ms). If I redirect output to a file (through the shell), total time is about 300 ms (so most of the 6 seconds is spent waiting for my console to catch up to the program).

In other words, output should be buffered by default, so I'm curious why you're seeing such a huge speedup.

3 tangentially-related notes:

Your program has an off-by-one error in that you only print 199999 times instead of the stated 200000 (either start with i = 0 or end with i <= 200000)
You're mixing printf syntax with cout syntax when outputting count...the fix for that is obvious enough.
Disabling sync_with_stdio produces a small speedup (about 5%) for me when outputting to console, but the impact is negligible when redirecting to file. This is a micro-optimization which you probably wouldn't need in most cases (IMHO).

answered Sep 20 '22 21:09

Sumudu Fernando

Writing through cout involves a lot of hidden, complex logic going all the way down to the kernel so that your text can reach the screen. When you use a buffer in that way, you essentially make one batched request instead of repeating the complex I/O calls.
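The batching idea can be sketched without any real I/O. CountingSink below is a made-up stand-in for the expensive output path that only counts how often it is called, and BatchingWriter is an equally hypothetical buffer in front of it. Neither is how cout is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an expensive output path (for example a system call).
// It only records how often it is invoked and how many bytes it receives.
struct CountingSink {
    std::size_t calls = 0;
    std::size_t bytes = 0;
    void write(const char* data, std::size_t size) {
        (void)data;  // the sketch discards the data itself
        ++calls;
        bytes += size;
    }
};

// Collects small writes in memory and forwards them to the sink in batches.
// The caller must call flush() at the end; there is no automatic flush.
class BatchingWriter {
public:
    BatchingWriter(CountingSink& sink, std::size_t capacity)
        : sink_(sink), capacity_(capacity) {
        assert(capacity > 0);
        buffer_.reserve(capacity);
    }

    void write(const std::string& text) {
        // Flush first if the new text would not fit in the remaining space.
        if (buffer_.size() + text.size() > capacity_) flush();
        if (text.size() > capacity_) {
            // Text larger than the whole buffer goes straight to the sink.
            sink_.write(text.data(), text.size());
            return;
        }
        buffer_ += text;
    }

    void flush() {
        if (buffer_.empty()) return;
        sink_.write(buffer_.data(), buffer_.size());
        buffer_.clear();
    }

private:
    CountingSink& sink_;
    std::size_t capacity_;
    std::string buffer_;
};
```

Writing a 13-byte line 200 times through a 1024-byte BatchingWriter and then flushing reaches the sink 3 times, where writing each line directly would reach it 200 times.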

answered Sep 18 '22 21:09

Istinra
