Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient way to calculate hamming distance in ruby?

In ruby, what is the most efficient way to calculate the bit difference between two unsigned integers (e.g. the hamming distance)?

Eg, I have integer a = 2323409845 and b = 1782647144.

Their binary representations are:

a = 10001010011111000110101110110101
b = 01101010010000010000100101101000

The bit difference between the a & b is 17..

I can do a logical XOR on them, but that will give me a different integer != 17, I would then have to iterate through the binary representation of the result and tally the # of 1s.

What is the most efficient way to calculate the bit difference?

Now, does the answer change for calculating the bit difference of sequences of many ints? E.g. given 2 sequences of unsigned integers:

x = {2323409845,641760420,509499086....}
y = {uint,uint,uint...}

What is the most efficient way to calculate the bit difference between the two sequences?

Would you iterate through the sequence, or is there a faster way to calculate the difference across the entire sequence at once?

like image 731
ch3rryc0ke Avatar asked Jun 18 '11 09:06

ch3rryc0ke


People also ask

How do you calculate Hamming distance?

To calculate the Hamming distance, you simply count the number of bits where two same-length messages differ. An example of Hamming distance 1 is the distance between 1101 and 1001 . If you increase the distance to 2 , we can give as an example 1001 and 1010 .

How do you find the Hamming distance between two vectors?

The Hamming distance between two vectors is simply the sum of corresponding elements that differ between the vectors. The Hamming distance between the two vectors would be 2, since this is the total number of corresponding elements that have different values.

What is normalized Hamming distance?

Normalized Hamming distance is the ratio of the Hamming distance to the length of the string. A lower value of Normalized Hamming distance suggests the two strings are more similar. The Levenshtein distance can be computed to find the distance between the two strings and also find the number of edits.


2 Answers

You can make use of the optimized String functions in Ruby to do the bit counting, instead of pure arithmetic. It turns out to be about 6 times faster with some quick benchmarking.

def h2(a, b)
  (a^b).to_s(2).count("1")
end

h1 is the normal way to calculate, while h2 converts the xor into a string, and counts the number of "1"s

Benchmark:

ruby-1.9.2-p180:001:0>> def h1(a, b)
ruby-1.9.2-p180:002:1*> ret = 0
ruby-1.9.2-p180:003:1*> xor = a ^ b
ruby-1.9.2-p180:004:1*> until xor == 0
ruby-1.9.2-p180:005:2*> ret += 1
ruby-1.9.2-p180:006:2*> xor &= xor - 1
ruby-1.9.2-p180:007:2*> end
ruby-1.9.2-p180:008:1*> ret
ruby-1.9.2-p180:009:1*> end
# => nil
ruby-1.9.2-p180:010:0>> def h2(a, b)
ruby-1.9.2-p180:011:1*> (a^b).to_s(2).count("1")
ruby-1.9.2-p180:012:1*> end
# => nil
ruby-1.9.2-p180:013:0>> h1(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:014:0>> h2(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:015:0>> quickbench(10**5) { h1(2323409845, 1782647144) }
Rehearsal ------------------------------------
   2.060000   0.000000   2.060000 (  1.944690)
--------------------------- total: 2.060000sec

       user     system      total        real
   1.990000   0.000000   1.990000 (  1.958056)
# => nil
ruby-1.9.2-p180:016:0>> quickbench(10**5) { h2(2323409845, 1782647144) }
Rehearsal ------------------------------------
   0.340000   0.000000   0.340000 (  0.333673)
--------------------------- total: 0.340000sec

       user     system      total        real
   0.320000   0.000000   0.320000 (  0.326854)
# => nil
ruby-1.9.2-p180:017:0>> 
like image 120
Dogbert Avatar answered Sep 18 '22 09:09

Dogbert


Per the suggestion of mu is too short, I wrote a simple C extension to use __builtin_popcount , and using benchmark verified that it is at least 3X faster than ruby's optimized string functions..

I looked at the following two tutorials:

  • Extending Ruby With C
  • Ruby Extension in C in 5 min

In my program:

require './FastPopcount/fastpopcount.so'
include FastPopcount

def hamming(a,b)
  popcount(a^b)
end

Then in the dir containing my program, I create a folder "PopCount" with the following files.

extconf.rb:

# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'fastpopcount'

# The destination
dir_config(extension_name)

# Do the work
create_makefile(extension_name)

popcount.c:

// Include the Ruby headers and goodies
#include "ruby.h"

// Defining a space for information and references about the module to be stored internally
VALUE FastPopcount = Qnil;

// Prototype for the initialization method - Ruby calls this, not you
void Init_fastpopcount();

// Prototype for our method 'popcount' - methods are prefixed by 'method_' here
VALUE method_popcount(int argc, VALUE *argv, VALUE self);

// The initialization method for this module
void Init_fastpopcount() {
    FastPopcount = rb_define_module("FastPopcount");
    rb_define_method(FastPopcount, "popcount", method_popcount, 1); 
}

// Our 'popcount' method.. it uses the builtin popcount
VALUE method_popcount(int argc, VALUE *argv, VALUE self) {
    return INT2NUM(__builtin_popcount(NUM2UINT(argv)));
}

Then in the popcount directory run:

ruby extconf.rb make

Then run the program, and there you have it....fastest way to do hamming distance in ruby.

like image 24
ch3rryc0ke Avatar answered Sep 21 '22 09:09

ch3rryc0ke