Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster form for hamming distance in c++ (potentially taking advantage of standard library)?

I have two int vectors like a[100], b[100].
The simple way to calculate their hamming distance is:

std::vector<int> a(100);
std::vector<int> b(100);

double dist = 0;    
for(int i = 0; i < 100; i++){
    if(a[i] != b[i])
        dist++;
}
dist /= a.size();

I would like to ask that is there a faster way to do this calculation in C++ or how to use STL to do the same job?

like image 476
sflee Avatar asked Jan 13 '14 13:01

sflee


People also ask

What is Hamming distance in C?

Hamming distance is a metric for comparing two binary data strings. While comparing two binary strings of equal length, Hamming distance is the number of bit positions in which the two bits are different.

What is Hamming distance explain with suitable example?

The Hamming distance involves counting up which set of corresponding digits or places are different, and which are the same. For example, take the text string “hello world” and contrast it with another text string, “herra poald.” There are five places along the corresponding strings where the letters are different.

What would be the Hamming distance for the given strings?

The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it is the number of substitutions required to transform one string into another. Given two strings of equal length, compute the Hamming distance.

What does Hamming distance tell us?

The Hamming distance is a metric (in the mathematical sense) used in error correction theory to measure the distance between two codewords. In detail, the Hamming distance measures the number of different bits in two strings of the same length.


2 Answers

You asked for a faster way. This is a embarrassingly parallel problem, so, with C++ you can take advantage of that in two ways: thread parallelism, and vectorization through optimization.

//The following flags allow cpu specific vectorization optimizations on *my cpu*
//clang++ -march=corei7-avx hd.cpp -o hd -Ofast -pthread -std=c++1y
//g++ -march=corei7-avx hd.cpp -o hd -Ofast -pthread -std=c++1y

#include <vector>
#include <thread>
#include <future>
#include <numeric>

template<class T, class I1, class I2>
T hamming_distance(size_t size, I1 b1, I2 b2) {
    return std::inner_product(b1, b1 + size, b2, T{},
            std::plus<T>(), std::not_equal_to<T>());
}

template<class T, class I1, class I2>
T parallel_hamming_distance(size_t threads, size_t size, I1 b1, I2 b2) {
    if(size < 1000)
       return hamming_distance<T, I1, I2>(size, b1, b2);

    if(threads > size)
        threads = size;

    const size_t whole_part = size / threads;
    const size_t remainder = size - threads * whole_part;

    std::vector<std::future<T>> bag;
    bag.reserve(threads + (remainder > 0 ? 1 : 0));

    for(size_t i = 0; i < threads; ++i)
        bag.emplace_back(std::async(std::launch::async,
                            hamming_distance<T, I1, I2>,
                            whole_part,
                            b1 + i * whole_part,
                            b2 + i * whole_part));
    if(remainder > 0)
        bag.emplace_back(std::async(std::launch::async,
                            hamming_distance<T, I1, I2>,
                            remainder,
                            b1 + threads * whole_part,
                            b2 + threads * whole_part));

    T hamming_distance = 0;
    for(auto &f : bag) hamming_distance += f.get();
    return hamming_distance;
}

#include <ratio>
#include <random>
#include <chrono>
#include <iostream>
#include <cinttypes>

int main() {
    using namespace std;
    using namespace chrono;

    random_device rd;
    mt19937 gen(rd());
    uniform_int_distribution<> random_0_9(0, 9);

    const auto size = 100 * mega::num;
    vector<int32_t> v1(size);
    vector<int32_t> v2(size);

    for(auto &x : v1) x = random_0_9(gen);
    for(auto &x : v2) x = random_0_9(gen);

    cout << "naive hamming distance: ";
    const auto naive_start = high_resolution_clock::now();
    cout << hamming_distance<int32_t>(v1.size(), begin(v1), begin(v2)) << endl;
    const auto naive_elapsed = high_resolution_clock::now() - naive_start;

    const auto n = thread::hardware_concurrency();

    cout << "parallel hamming distance: ";
    const auto parallel_start = high_resolution_clock::now();
    cout << parallel_hamming_distance<int32_t>(
                                                    n,
                                                    v1.size(),
                                                    begin(v1),
                                                    begin(v2)
                                              )
         << endl;
    const auto parallel_elapsed = high_resolution_clock::now() - parallel_start;

    auto count_microseconds =
        [](const high_resolution_clock::duration &elapsed) {
            return duration_cast<microseconds>(elapsed).count();
        };

    cout << "naive delay:    " << count_microseconds(naive_elapsed) << endl;
    cout << "parallel delay: " << count_microseconds(parallel_elapsed) << endl;
}

notice that I'm not taking the division over the vector size

Results for my machine (which shows it didn't get much for a machine which only 2 physical cores...):

$ clang++ -march=corei7-avx hd.cpp -o hd -Ofast -pthread -std=c++1y -stdlib=libc++ -lcxxrt -ldl
$ ./hd
naive hamming distance: 89995190
parallel hamming distance: 89995190
naive delay:    52758
parallel delay: 47227

$ clang++ hd.cpp -o hd -O3 -pthread -std=c++1y -stdlib=libc++ -lcxxrt -ldl
$ ./hd
naive hamming distance: 90001042
parallel hamming distance: 90001042
naive delay:    53851
parallel delay: 46887

$ g++ -march=corei7-avx hd.cpp -o hd -Ofast -pthread -std=c++1y -Wl,--no-as-needed
$ ./hd
naive hamming distance: 90001825
parallel hamming distance: 90001825
naive delay:    55229
parallel delay: 49355

$ g++ hd.cpp -o hd -O3 -pthread -std=c++1y -Wl,--no-as-needed
$ ./hd
naive hamming distance: 89996171
parallel hamming distance: 89996171
naive delay:    54189
parallel delay: 44928

Also I see no effect from auto vectorization, may have to check the assembly...

For a sample about vectorization and compiler options check this blog post of mine.

like image 175
pepper_chico Avatar answered Sep 27 '22 21:09

pepper_chico


There is a very simple way to optimize this.

int disti = 0;    
for(int i = 0; i < n; i++) disti += (a[i] != b[i]);
double dist = 1.0*disti/a.size();

This skips the branch and uses the virtue that a conditional test returns 1 or 0. Additionally, it is auto-vectorized in GCC (-ftree-vectorizer-verbose=1 to check) while the version in the question is not.

Edit:

I went ahead and tested this out with the function in the question which I called hamming_distance the simple fix I suggested which I call hamming_distance_fix and a version which uses the fix as well as OpenMP which I call hamming_distance_fix_omp. Here are the times

hamming_distance          1.71 seconds
hamming_distance_fix      0.38 seconds  //SIMD
hamming_distance_fix_omp  0.12 seconds  //SIMD + MIMD

Here is the code. I did not use much syntactic candy but it should be very easy to convert this to use STL and so forth...You can see the results here http://coliru.stacked-crooked.com/a/31293bc88cff4794

//g++-4.8 -std=c++11 -O3 -fopenmp -msse2 -Wall -pedantic -pthread main.cpp && ./a.out
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

double hamming_distance(int* a, int*b, int n) {
    double dist = 0;
    for(int i=0; i<n; i++) {
        if (a[i] != b[i]) dist++;
    }
    return dist/n;
}
double hamming_distance_fix(int* a, int* b, int n) {
    int disti = 0;
    for(int i=0; i<n; i++) {
       disti += (a[i] != b[i]);
    }
    return 1.0*disti/n;
}

double hamming_distance_fix_omp(int* a, int* b, int n) {
    int disti = 0;
    #pragma omp parallel for reduction(+:disti)
    for(int i=0; i<n; i++) {
       disti += (a[i] != b[i]);
    }
    return 1.0*disti/n;
}

int main() {
    const int n = 1<<16;
    const int repeat = 10000;
    int *a = new int[n];
    int *b = new int[n];
    for(int i=0; i<n; i++) 
    { 
        a[i] = rand()%10;
        b[i] = rand()%10;
    }

    double dtime, dist;
    dtime = omp_get_wtime();
    for(int i=0; i<repeat; i++) dist = hamming_distance(a,b,n);
    dtime = omp_get_wtime() - dtime;
    printf("dist %f, time (s) %f\n", dist, dtime);

    dtime = omp_get_wtime();
    for(int i=0; i<repeat; i++) dist = hamming_distance_fix(a,b,n);
    dtime = omp_get_wtime() - dtime;
    printf("dist %f, time (s) %f\n", dist, dtime);

    dtime = omp_get_wtime();
    for(int i=0; i<repeat; i++) dist = hamming_distance_fix_omp(a,b,n);
    dtime = omp_get_wtime() - dtime;
    printf("dist %f, time (s) %f\n", dist, dtime);  
}
like image 32
Z boson Avatar answered Sep 27 '22 22:09

Z boson