Here's a bit of code that, after doing some measuring, turns out to be a considerable bottleneck:
//-----------------------------------------------------------------------------
// Construct dictionary hash set from dictionary file
//-----------------------------------------------------------------------------
void constructDictionary(unordered_set<string> &dict)
{
    ifstream wordListFile;
    wordListFile.open("dictionary.txt");

    std::string word;
    while( wordListFile >> word )
    {
        if( !word.empty() )
        {
            dict.insert(word);
        }
    }
    wordListFile.close();
}
I'm reading in ~200,000 words and this takes about 240 ms on my machine. Is the use of ifstream here efficient? Can I do better? I'm reading about mmap() implementations but I'm not understanding them 100%. The input file is simply text strings with *nix line terminations.
EDIT: A follow-up question for the alternatives being suggested: would any alternative (other than increasing the stream buffer sizes) imply that I write a parser that examines each character for newlines? I kind of like the simple syntax of streams, but I can rewrite something more nitty-gritty if I have to for speed. Reading the entire file into memory is a viable option; it's only about 2 MB.
EDIT #2: I've found that the slowdown for me was due to the set insert, but for those who are still interested in speeding up line-by-line file I/O, please read the answers here AND check out Matthieu M.'s continuation on the topic.
Quick profiling on my system (linux-2.6.37, gcc-4.5.2, compiled with -O3) shows that I/O is not the bottleneck. Whether using fscanf into a char array followed by dict.insert() or operator>> as in your exact code, it takes about the same time (155-160 ms to read a 240k word file).
Replacing gcc's std::unordered_set with std::vector<std::string> in your code drops the execution time to 45 ms (fscanf) - 55 ms (operator>>) for me. Try to profile I/O and set insertion separately.
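To illustrate that split, here is a rough sketch (assuming a C++11 compiler and the same "dictionary.txt" as in the question) that times the read and the set insertion separately:

#include <chrono>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

int main()
{
    using Clock = std::chrono::steady_clock;

    // Phase 1: pull all words into a vector (times the I/O and parsing).
    std::vector<std::string> words;
    Clock::time_point t0 = Clock::now();
    {
        std::ifstream in("dictionary.txt");
        std::string word;
        while (in >> word) { words.push_back(word); }
    }
    Clock::time_point t1 = Clock::now();

    // Phase 2: insert the words into the set (times hashing/allocation).
    std::unordered_set<std::string> dict;
    for (std::size_t i = 0; i < words.size(); ++i) { dict.insert(words[i]); }
    Clock::time_point t2 = Clock::now();

    std::cout << "read:   "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << " ms\ninsert: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
              << " ms\n";
    return 0;
}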
Normally, you could get better performance by increasing the buffer size. Right after constructing the ifstream, you can set its internal buffer using:
char LocalBuffer[4096]; // buffer
std::ifstream wordListFile("dictionary.txt");
wordListFile.rdbuf()->pubsetbuf(LocalBuffer, 4096);
Note: rdbuf's result is guaranteed not to be null if the construction of the ifstream succeeded.
Depending on the memory available, you are strongly encouraged to grow the buffer if possible in order to limit interaction with the HDD and the number of system calls.
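For example (a sketch only; the 1 MB size is an arbitrary choice, and some implementations only honour pubsetbuf() if it is called before the file is opened, hence open() comes afterwards here):

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<char> buffer(1 << 20);   // 1 MB read buffer

    std::ifstream wordListFile;
    // Install the buffer before open() so it is honoured portably.
    wordListFile.rdbuf()->pubsetbuf(&buffer[0],
                                    static_cast<std::streamsize>(buffer.size()));
    wordListFile.open("dictionary.txt");

    std::string word;
    std::size_t count = 0;
    while (wordListFile >> word) { ++count; }
    return 0;
}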
I've performed some simple measurements using a little benchmark of my own; you can find the code below (and I am interested in critiques):
gcc 3.4.2 on SLES 10 (sp 3)
C : 9.52725e+06
C++: 1.11238e+07
difference: 1.59655e+06
Which gives a slowdown of a whopping 17%. This takes into account locale handling. So, we can argue that streams are slow... but please, don't throw in your random piece of code and complain that it's slow; optimization is hard work.
The corresponding code is below, where benchmark is a little utility of my own which measures the time of a repeated execution (here launched for 50 iterations) using gettimeofday.
#include <fstream>
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cstdio>
#include <cstdlib>   // atoi
#include <string>

#include "benchmark.h"

// C-style reader: fscanf into a fixed char buffer.
struct CRead
{
    CRead(char const* filename): _filename(filename) {}

    void operator()()
    {
        FILE* file = fopen(_filename, "r");
        int count = 0;
        // %1023s keeps fscanf from overrunning the 1024-byte buffer.
        while ( fscanf(file, "%1023s", _buffer) == 1 ) { ++count; }
        fclose(file);
    }

    char const* _filename;
    char _buffer[1024];
};

// C++-style reader: operator>> into a std::string, with an enlarged stream buffer.
struct CppRead
{
    CppRead(char const* filename): _filename(filename), _buffer() {}

    enum { BufferSize = 16184 };

    void operator()()
    {
        std::ifstream file(_filename);
        // Implementation-defined whether this takes effect on an already-open file.
        file.rdbuf()->pubsetbuf(_buffer, BufferSize);
        int count = 0;
        std::string s;
        while ( file >> s ) { ++count; }
    }

    char const* _filename;
    char _buffer[BufferSize];
};

int main(int argc, char* argv[])
{
    size_t iterations = 1;
    if (argc > 1) { iterations = atoi(argv[1]); }

    char const* filename = "largefile.txt";

    CRead cread(filename);
    CppRead cppread(filename);

    double ctime = benchmark(cread, iterations);
    double cpptime = benchmark(cppread, iterations);

    std::cout << "C : " << ctime << "\n"
                 "C++: " << cpptime << "\n";

    return 0;
}
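The benchmark.h header is the author's own utility and is not shown; a rough sketch of what it might look like, assuming it runs the callable the requested number of times and returns the total elapsed wall-clock time in microseconds via gettimeofday (as mentioned above):

// benchmark.h -- illustrative sketch only
#ifndef BENCHMARK_H
#define BENCHMARK_H

#include <cstddef>
#include <sys/time.h>

// Run `op` `iterations` times and return the elapsed wall-clock time in microseconds.
template <typename Callable>
double benchmark(Callable& op, std::size_t iterations)
{
    timeval start, stop;
    gettimeofday(&start, 0);
    for (std::size_t i = 0; i < iterations; ++i) { op(); }
    gettimeofday(&stop, 0);
    return (stop.tv_sec - start.tv_sec) * 1e6
         + (stop.tv_usec - start.tv_usec);
}

#endif // BENCHMARK_H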
Reading the whole file in one go into memory and then operating on it would probably be faster, as it avoids repeatedly going back to the disk to read another chunk.
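A minimal sketch of that approach, assuming the ~2 MB "dictionary.txt" from the question: slurp the file into a single string, then parse the words from memory without touching the disk again.

#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_set>

void constructDictionaryFromMemory(std::unordered_set<std::string>& dict)
{
    // Read the entire file into one string in a single pass.
    std::ifstream file("dictionary.txt", std::ios::binary);
    std::string contents((std::istreambuf_iterator<char>(file)),
                         std::istreambuf_iterator<char>());

    // Parse words from the in-memory copy.
    std::istringstream in(contents);
    std::string word;
    while (in >> word)
    {
        dict.insert(word);
    }
}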
Is 0.25 s actually a problem? If you're not intending to load much larger files, is there any need to make it faster if it makes the code less readable?
The C++ and C libraries read data off the disk equally fast and are already buffered to compensate for the disk I/O lag; you are not going to make it faster by adding more buffering. The biggest difference is that C++ streams do a load of manipulations based on the locale: character conversions, punctuation, etc. As a result the C libraries will be faster.
For some reason the linked question was deleted, so I am moving the relevant information here. The linked question was about hidden features in C++. Though not technically part of the STL, the streams library is part of the standard C++ libraries.
For streams:
Locales: very few people actually bother to learn how to correctly set and/or manipulate the locale of a stream.
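For instance, a small illustrative example: imbuing a stream with the user's native locale changes how numbers are formatted, with no manual work.

#include <iostream>
#include <locale>

int main()
{
    // With the user's locale imbued, numeric output picks up the locale's
    // grouping rules, e.g. "1,000,000" in an en_US environment.
    std::cout.imbue(std::locale(""));
    std::cout << 1000000 << "\n";
    return 0;
}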
The second coolest thing is the iterator templates, most specifically (for me) the stream iterators, which basically turn streams into very basic containers that can then be used in conjunction with the standard algorithms.
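For example (an illustrative sketch of the idea), std::istream_iterator lets std::copy consume a stream as if it were a container:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    // Treat std::cin as a "container" of whitespace-separated strings and
    // copy it straight into a vector with a standard algorithm.
    std::vector<std::string> words;
    std::copy(std::istream_iterator<std::string>(std::cin),
              std::istream_iterator<std::string>(),
              std::back_inserter(words));

    // The same idea in the other direction: write the vector to std::cout,
    // one word per line, via an ostream_iterator.
    std::copy(words.begin(), words.end(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}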
My system (3.2.0-52-generic, g++-4.7 (Ubuntu/Linaro 4.7.3-2ubuntu1~12.04) 4.7.3, compiled with -O2 if not specified, CPU: i3-2125)
In my test cases I used a 295,068-word dictionary (so, 100k more words than in yours): http://dl.dropboxusercontent.com/u/4076606/words.txt
From a time complexity point of view:
Practical tips:
Notice: I didn't flush my OS cache or HDD cache. The latter I can't control, but the former can be controlled with:
sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
Also, I didn't omit measurements that included a lot of context switches and so on, so there is room for better measurements.
14-16 ms to read from file & insert data into a 2D char array (read & insert) n times
65-75 ms to search all the words with binary search (search n times)
Total = 79-91 ms
61-78 ms to read from file & insert data into an unordered_set of char arrays (read & insert) n times
7-9 ms to search by hash n times
Total = 68-87 ms
If you search more times than you insert, choose a hash table (unordered_set); otherwise, binary search (with a simple array).
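An illustrative sketch of the two lookup strategies being compared (a sorted std::vector queried with std::binary_search versus a hash lookup in an unordered_set); the function names are just placeholders:

#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// O(log n) per lookup; the vector must be sorted once up front.
bool inSortedVector(const std::vector<std::string>& sorted, const std::string& w)
{
    return std::binary_search(sorted.begin(), sorted.end(), w);
}

// Amortised O(1) per lookup via hashing.
bool inHashSet(const std::unordered_set<std::string>& dict, const std::string& w)
{
    return dict.find(w) != dict.end();
}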
Compiled with -O2: 157-182 ms
Compiled with -O0 (the default when no -O flag is given): 223-248 ms
So, compiler options also matter; in this case it means a 66 ms speed boost. You didn't specify any of them, so my best guess is that you didn't use one, which is what I assume in trying to answer your main question.
Compiled with -O2: 142-170 ms, a ~12-15 ms speed boost compared with your original code.
Compiled with -O0 (the default when no -O flag is given): 213-235 ms, a ~10-13 ms speed boost compared with your original code.
Compiled with -O2: 99-121-[137] ms, a ~33-43-[49] ms speed boost compared with your original code.
Implement your own hash function for your specific data input. Use a char array instead of an STL string. After you have done that, and only then, write code with direct OS I/O. As you noticed (and as my measurements also show), the data structure is the bottleneck. If the medium is very slow but the CPU is very fast, compress the file and decompress it in your program.
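For the custom-hash part of that tip, a sketch of plugging a user-supplied hash into unordered_set (FNV-1a is used here purely as an example; whether it beats the default hash depends entirely on your data, so measure):

#include <cstddef>
#include <string>
#include <unordered_set>

// FNV-1a, shown only as an example of a drop-in custom hash functor.
struct Fnv1aHash
{
    std::size_t operator()(const std::string& s) const
    {
        unsigned long long h = 0xcbf29ce484222325ULL;   // FNV offset basis
        for (std::string::size_type i = 0; i < s.size(); ++i)
        {
            h ^= static_cast<unsigned char>(s[i]);
            h *= 0x100000001b3ULL;                      // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Usage: std::unordered_set<std::string, Fnv1aHash> dict;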
My code is not perfect, but it is still better than anything that can be seen above: http://pastebin.com/gQdkmqt8 (the hash function is from the web and could also be done better).
Could you provide more details about which system (a single one or a range) you plan to optimize for?
Info on time complexities: these should be links, but I don't have enough reputation, as I'm a beginner on Stack Overflow.
Is my answer still relevant to anything? Please add a comment or vote, as there is no PM feature as far as I can see.
A proper implementation of the IO library would cache the data for you, avoiding excessive disk accesses and system calls. I recommend that you use a system-call level tool (e.g. strace if you're under Linux) to check what actually happens with your IO.
Obviously, dict.insert(xxx) could also be a nuisance if it doesn't allow O(1) insertion.
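Given the asker's EDIT #2 (the set insert turned out to be the slowdown), one cheap thing to check is whether rehashing is part of that cost; a sketch, where 200000 is just the rough word count mentioned in the question:

#include <string>
#include <unordered_set>

int main()
{
    std::unordered_set<std::string> dict;
    // Pre-size the hash table before the insert loop to avoid repeated rehashes.
    dict.reserve(200000);
    // ... read words and dict.insert(word) as in the original code ...
    return 0;
}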