I've profiled a computationally-heavy C++ program on Linux using cachegrind. Surprisingly, it turns out the bottleneck of my program is not in any sorting or computational method ... it's in reading the input.
Here is a screenshot of cachegrind, in case I'm mis-interpreting the profiler results (see scanf()
):
I hope I'm right in saying that scanf()
is taking 80.92% of my running time.
I read input using cin >> int_variable_here
, like so:
std::ios_base::sync_with_stdio (false); // Supposedly makes I/O faster
cin >> NumberOfCities;
cin >> NumberOfOldRoads;
Roads = new Road[NumberOfOldRoads];
for (int i = 0; i < NumberOfOldRoads; i++)
{
int cityA, cityB, length;
cin >> cityA;
//scanf("%d", &cityA); // scanf() and cin are both too slow
cin >> cityB;
//scanf("%d", &cityB);
cin >> length;
//scanf("%d", &length);
Roads[i] = Road(cityA, cityB, length);
}
If you don't spot any issues with this input reading code, could you please recommend a faster way to read input? I'm thinking of trying getline()
(working on it while I wait for responses). My guess is getline() may run faster because it has to do less conversion and it parses the stream a less total number of times (just my guess, though I'd have to parse the strings as integers eventually too).
What I mean by "too slow" is, this is part of a larger homework assignment that gets timed out after a certain period of time (I believe it is 90 seconds). I'm pretty confident the bottleneck is here because I purposely commented out a major portion of the rest of my program and it still timed out. I don't know what test cases the instructor runs through my program, but it must be a huge input file. So, what can I use to read input fastest?
The input format is strict: 3 integers separated by one space for each line, for many lines:
Sample Input:
7 8 3
7 9 2
8 9 1
0 1 28
0 5 10
1 2 16
I need to make a Road
out of the integers in each line.
Also please not that input is redirected to my program to the standard input (myprogram < whatever_test_case.txt
). I'm not reading a specific file. I just read from cin
.
Using Slava's method:
Input reading seems to be taking less time, but its still timing out (may not be due to input reading anymore). Slava's method is implemented in the Road() ctor
(2 down from main
). So now it takes 22% of the time as opposed to 80%. I'm thinking of optimizing SortRoadsComparator()
as it's called 26,000,000 times.
Comparator Code:
// The complexity is sort of required for the whole min() max(), based off assignment instructions
bool SortRoadsComparator(const Road& a, const Road& b)
{
if (a.Length > b.Length)
return false;
else if (b.Length > a.Length)
return true;
else
{
// Non-determinism case
return ( (min(a.CityA, a.CityB) < min(b.CityA, b.CityB)) ||
(
(min(a.CityA, a.CityB) == min(b.CityA, b.CityB)) && max(a.CityA, a.CityB) < max(b.CityA, b.CityB)
)
);
}
}
Using enhzflep's method
I'm going to consider this problem solved because the bottleneck is no longer in reading input. Slava's method was the fastest for me.
Streams pretty well know to be very slow. It is not a big surprise though - they need to handle localizations, conditions etc. One possible solution would be to read file line by line by std::getline( std:::cin, str ) and convert string to numbers by something like this:
std::vector<int> getNumbers( const std::string &str )
{
std::vector<int> res;
int value = 0;
bool gotValue = false;
for( int i = 0; i < str.length(); ++i ) {
if( str[i] == ' ' ) {
if( gotValue ) res.push_back( value );
value = 0;
gotValue = false;
continue;
}
value = value * 10 + str[i] - '0';
gotValue = true;
}
if( gotValue ) res.push_back( value );
return res;
}
I did not test this code, wrote it to show the idea. I assume you do not expect to get anything in input but spaces and numbers, so it does not validate the input.
To optimize sorting first of all you should check if you really need to sort whole sequence. For comparator I would write methods getMin() getMax() and store that values in object (not to calculate them all the time):
bool SortRoadsComparator(const Road& a, const Road& b)
{
if( a.Length != b.Length ) return a.Length < b.length;
if( a.getMin() != b.getMin() ) return a.getMin() < b.getMin();
return a.getMax() < b.getMax();
}
if I understood how you current comparator works correctly.
As Slava says, streams (i.e cin) are absolute pigs in terms of performance (and executable file size)
Consider the following two approaches:
start = clock();
std::ios_base::sync_with_stdio (false); // Supposedly makes I/O faster
cin >> NumberOfCities >> NumberOfOldRoads;
Roads = new Road[NumberOfOldRoads];
for (int i = 0; i < NumberOfOldRoads; i++)
{
int cityA, cityB, length;
cin >> cityA >> cityB >> length;
Roads[i] = Road(cityA, cityB, length);
}
stop = clock();
printf ("time: %d\n", stop-start);
and
start = clock();
fp = stdin;
fscanf(fp, "%d\n%d\n", &NumberOfCities, &NumberOfOldRoads);
Roads = new Road[NumberOfOldRoads];
for (int i = 0; i < NumberOfOldRoads; i++)
{
int cityA, cityB, length;
fscanf(fp, "%d %d %d\n", &cityA, &cityB, &length);
Roads[i] = Road(cityA, cityB, length);
}
stop = clock();
printf ("time: %d\n", stop-start);
Running each way 5 times (with an input file of 1,000,000 entries + the first 2 'control' lines) gives us these results:
Using cin without the direction to not sync with stdio 8291, 8501, 8720, 8918, 7164 (avg 8318.3)
Using cin with the direction to not sync with stdio 4875, 4674, 4921, 4782, 5171 (avg 4884.6)
Using fscanf 1681, 1676, 1536, 1644, 1675 (avg 1642.4)
So, clearly, one can see that the sync_with_stdio(false) direction does help. One can also see that fscanf beats the pants off each approach with cin. In fact, the fscanf approach is nearly 3 times faster than the better of the cin approaches and a whopping 5 times faster than cin when not told to avoid syncing with stdio.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With