running speed of permutation function using different methods results in unexpected results

Question

I have implemented a isPermutation function which given two string will return true if the two are permutation of each other, otherwise it will return false.

One uses c++ sort algorithm twice, while the other uses an array of ints to keep track of string count.

I ran the code several times and every time the sorting method is faster. Is my array implementation wrong?

Here is the output:

1
0
1
Time: 0.088 ms
1
0
1
Time: 0.014 ms

And the code:

#include <iostream> // cout
#include <string>   // string
#include <cstring> // memset
#include <algorithm> // sort
#include <ctime> // clock_t

using namespace std;

#define MAX_CHAR 255


void PrintTimeDiff(clock_t start, clock_t end) {
    std::cout << "Time: " << (end - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
}


// using array to keep a count of used chars
bool isPermutation(string inputa, string inputb) {
    int allChars[MAX_CHAR];
    memset(allChars, 0, sizeof(int) * MAX_CHAR);

    for(int i=0; i < inputa.size(); i++) {
        allChars[(int)inputa[i]]++;
    }

    for (int i=0; i < inputb.size(); i++) {
        allChars[(int)inputb[i]]--;
        if(allChars[(int)inputb[i]] < 0) {
            return false;
        }   
    }

    return true;
}


// using sorting anc comparing
bool isPermutation_sort(string inputa, string inputb) {

    std::sort(inputa.begin(), inputa.end());
    std::sort(inputb.begin(), inputb.end());

    if(inputa == inputb) return true;
    return false;
}



int main(int argc, char* argv[]) {

    clock_t  start = clock();
    cout << isPermutation("god", "dog") << endl;
    cout << isPermutation("thisisaratherlongerinput","thisisarathershorterinput") << endl;
    cout << isPermutation("armen", "ramen") << endl;
    PrintTimeDiff(start, clock());


    start = clock();
    cout << isPermutation_sort("god", "dog") << endl;
    cout << isPermutation_sort("thisisaratherlongerinput","thisisarathershorterinput") << endl;
    cout << isPermutation_sort("armen", "ramen") << endl;
    PrintTimeDiff(start, clock());

    return 0;
}

Hal · Accepted Answer

To benchmark this you have to eliminate all the noise you can. The easiest way to do this is to wrap it in a loop that repeats the call to each 1000 times or so, then only spit out the value every 10 iterations. This way they each have a similar caching profile. Throw away values that are bogus due (eg blowouts due to context switches by the OS).

I got your method marginally faster by doing this. An excerpt.

method 1 array Time: 0.768 us
method 2 sort Time: 0.840333 us

method 1 array Time: 0.621333 us
method 2 sort Time: 0.774 us

method 1 array Time: 0.769 us
method 2 sort Time: 0.856333 us

method 1 array Time: 0.766 us
method 2 sort Time: 0.850333 us

method 1 array Time: 0.802667 us
method 2 sort Time: 0.89 us

method 1 array Time: 0.778 us
method 2 sort Time: 0.841333 us

I used rdtsc which works better for me on this system. 3000 cycles per microsecond is close enough for this, but please do make it more accurate if you care about precision of the readings.

#if defined(__x86_64__)
static uint64_t rdtsc()
{
    uint64_t    hi, lo;

    __asm__ __volatile__ (
                            "xor %%eax, %%eax
"
                            "cpuid
"
                            "rdtsc
"
                            : "=a"(lo), "=d"(hi)
                            :: "ebx", "ecx");

    return (hi << 32)|lo;
}
#else
#error wrong architecture - implement me
#endif

void PrintTimeDiff(uint64_t start, uint64_t end) {
    std::cout << "Time: " << (end - start)/double(3000)  << " us" << std::endl;
}

manlio · Answer

you cannot check performance differences between implementations putting in the mix calls to std::cout. isPermutation and isPermutation_sort are some order of magnitude faster than a call to std::cout (and, anyway, prefer over std::endl).
for testing you have to activate compiler optimizations. Doing so the compiler will apply the loop-invariant code motion optimization and you'll probably get the same results for both implementations.

A more effective way of testing is:

int main()
{
  const std::vector<std::string> bag
  {
    "god", "dog", "thisisaratherlongerinput", "thisisarathershorterinput",
    "armen", "ramen"
  };

  static std::mt19937 engine;
  std::uniform_int_distribution<std::size_t> rand(0, bag.size() - 1);

  const unsigned stop = 1000000;

  unsigned counter = 0;
  std::clock_t start = std::clock();
  for (unsigned i(0); i < stop; ++i)
    counter += isPermutation(bag[rand(engine)], bag[rand(engine)]);

  std::cout << counter << '
';
  PrintTimeDiff(start, clock());

  counter = 0;
  start = std::clock();
  for (unsigned i(0); i < stop; ++i)
    counter += isPermutation_sort(bag[rand(engine)], bag[rand(engine)]);

  std::cout << counter << '
';
  PrintTimeDiff(start, clock());

  return 0;
}

I have 2.4s for isPermutations_sort vs 2s for isPermutation (somewhat similar to Hal's results). Same with g++ and clang++.

Printing the value of counter has the double benefit of:

triggering the as-if rule (the compiler cannot remove the for loops);
allowing a first check of your implementations (the two values cannot be too distant).

There're some things you have to change in your implementation of isPermutation:

pass arguments as const references
```
bool isPermutation(const std::string &inputa, const std::string &inputb)
```
just this change brings the time down to 0.8s (of course you cannot do the same with isPermutation_sort).
you can use std::array and std::fill instead of memset (this is C++ :-)
avoid premature pessimization and prefer preincrement. Only use postincrement if you're going to use the original value
do not mix signed and unsigned value in the for loops (inputa.size() and i). i should be declared as std::size_t
even better, use the range based for loop.

So something like:

bool isPermutation(const std::string &inputa, const std::string &inputb)
{
  std::array<int, MAX_CHAR> allChars;
  allChars.fill(0);

  for (auto c : inputa)
    ++allChars[(unsigned char)c];

  for (auto c : inputb)
  {
    --allChars[(unsigned char)c];
    if (allChars[(unsigned char)c] < 0)
      return false;
  }

  return true;
}

Anyway both isPermutation and isPermutation_sort should have this preliminary check:

  if (inputa.length() != inputb.length())
    return false;

Now we are at 0.55s for isPermutation vs 1.1s for isPermutation_sort.

Last but not least consider std::is_permutation:

for (unsigned i(0); i < stop; ++i)
{
  const std::string &s1(bag[rand(engine)]), &s2(bag[rand(engine)]);

  counter += std::is_permutation(s1.begin(), s1.end(), s2.begin());
}

(0.6s)

EDIT

As observed in BeyelerStudios' comment Mersenne-Twister is too much in this case.

You can change the engine to a simpler one.:

static std::linear_congruential_engine<std::uint_fast32_t, 48271, 0, 2147483647> engine;

This further lowers the timings. Luckily the relative speeds remain the same.

Just to be sure I've also checked with a non random access scheme obtaining the same relative results.

running speed of permutation function using different methods results in unexpected results

Tags:

c++

performance

optimization

performance-testing

permutation

ArmenB

2 Answers

Hal

manlio

Recent Activity

Donate For Us

running speed of permutation function using different methods results in unexpected results

Tags:

c++

performance

optimization

performance-testing

permutation

ArmenB

2 Answers

Hal

manlio

Related questions

Recent Activity

Donate For Us