I've been working on a fairly large C++ project for a few weeks now. My original goal was to use this project to learn about C++11 and use only pure C++ code and avoid manual allocation and C constructs. However, I think this problem is going to force me to use C for a small function and I'd like to know why.
Basically I have a save function that will copy a somewhat large binary file to a separate location before I make changes to the data in it. The files themselves are CD images with a max size of around 700MB. Here is the original C++ code that I used:
std::ios::sync_with_stdio(false);
std::ifstream in(infile, std::ios::binary);
std::ofstream out(outfile, std::ios::binary);
std::copy(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>(), std::ostreambuf_iterator<char>(out));
out.close();
in.close();
With a 690MB file, this code takes just under 4 minutes to complete. I have run it with multiple files and the result is always the same: nothing under 3 minutes. However, I also found the following way, which ran quite a bit faster but still nowhere near as fast as C:
std::ios::sync_with_stdio(false);
std::ifstream in(infile, std::ios::binary);
std::ofstream out(outfile, std::ios::binary);
out << in.rdbuf();
out.close();
in.close();
This one took 24 seconds, but that is still roughly 16 times slower than C (see the benchmark below).
After looking around I found someone needing to write an 80GB file and seeing that he could write at full speed using C. I decided to give it a try with this code:
FILE *in = fopen(infile, "rb");
FILE *out = fopen(outfile, "wb");
char buf[1024];
size_t read = 0; // fread returns size_t, not int
// Read data in 1 KB chunks and write each chunk to the output file.
// fread returns the number of bytes actually read, so the final
// short chunk is written by the same loop.
while ((read = fread(buf, 1, sizeof buf, in)) > 0)
{
    fwrite(buf, 1, read, out);
}
fclose(out);
fclose(in);
The results were pretty shocking. Here is one of the benchmarks I have after running it multiple times on many different files:
File Size: 565,371,408 bytes
C : 1.539s | 350.345 MB/s
C++: 24.754s | 21.7815 MB/s - out << in.rdbuf()
C++: 220.555s | 2.44465 MB/s - std::copy()
What is the cause of this vast difference? I know C++ won't match the performance of plain C, but 348MB/s difference is massive. Is there something I'm missing?
Edit:
I am compiling this using Visual Studio 2013 on a Windows 8.1 64-bit OS.
Edit 2:
After reading John Zwinck's answer I decided to just go the platform specific route. Since I still wanted to make my project cross-platform I threw together a quick example. I am really not sure if these work on the other systems besides Windows, but I can test Linux at a later date. I cannot test OSX, but I think copyfile looks like a simple function so I assume it's correct.
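Keep in mind you need to do the same #if logic for including the platform-specific headers (<windows.h> on Windows, <copyfile.h> on OS X, and <fcntl.h>, <sys/stat.h>, <sys/sendfile.h> and <unistd.h> on Linux).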
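void copy(const std::string& infile, const std::string& outfile)
{
#if defined(_WIN32)
    // Windows (_WIN32 is also defined for 64-bit targets).
    // FALSE means an existing destination file is overwritten.
    CopyFileA(infile.c_str(), outfile.c_str(), FALSE);
#elif defined(__APPLE__)
    // OS X
    copyfile(infile.c_str(), outfile.c_str(), NULL, COPYFILE_DATA);
#elif defined(__linux__)
    // Linux
    struct stat stat_buf;
    off_t offset = 0;
    int in_fd = open(infile.c_str(), O_RDONLY);
    fstat(in_fd, &stat_buf);
    int out_fd = open(outfile.c_str(), O_WRONLY | O_CREAT | O_TRUNC, stat_buf.st_mode);
    // sendfile may copy fewer bytes than requested, so loop until
    // everything is copied or it reports an error or end of file.
    while (offset < stat_buf.st_size &&
           sendfile(out_fd, in_fd, &offset, stat_buf.st_size - offset) > 0)
    {
    }
    close(out_fd);
    close(in_fd);
#endif
}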
First, you should also benchmark against copying the same file using the CLI on the same machine.
Second, if you want maximum performance you need to use a platform-specific API. On Windows that is probably CopyFile/CopyFileEx, on Mac OS it's copyfile, and on Linux it's sendfile. Some of those (definitely sendfile) offer performance which cannot be achieved using the basic portable stuff in C or C++. Some of them (CopyFileEx and copyfile) offer extra features such as copying filesystem attributes and optional progress callbacks.
You can see some benchmarks showing how much faster sendfile can be here: Copy a file in a sane, safe and efficient way
Finally, it is sad but true that C++ iostreams are not as fast as C file I/O on many platforms. If you care a lot about performance, you may be better off using C functions. I've encountered this when doing programming contests where runtime speed matters: using scanf and printf instead of cin and cout makes a big difference on many systems.