Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ruby: count the number of 1's in a binary number

I have a binary number (52 bits) represented as a string "01100011...."

What would be the quickest way to count the number of 1's?

"01100011....".count("1") 

obviously works but is quite time consuming if this operation needs to be done thousands of times.

ok, some more info. I am trying to create bit vectors for words as follows

def bit_vec(str)
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    bv = ""
    alphabet.each_char do |a|
        if str.include?(a)
            bv += "1"
        else
            bv += "0"
        end
    end
        bv
end

The bit_vec method gets called about 170K times. I store the bit vectors in a hash and use them to find similar words for a given word by XOR'ing the bit vectors and counting the number of 1's (more 1's == less similarity). If the count method does not use String#scan the what else could use it?

I know Ruby is slower than say C or Java. I am just looking to improve the algorithm the best I can. I am not looking for raw speed.

Maybe the include? method is the bottleneck?

like image 709
Maulin Avatar asked Oct 28 '09 19:10

Maulin


People also ask

How many 1s are there in the binary form of the result of expression?

∴ The total number of 1's is 8.

How do you find the representation of a binary number?

To convert integer to binary, start with the integer in question and divide it by 2 keeping notice of the quotient and the remainder. Continue dividing the quotient by 2 until you get a quotient of zero. Then just write out the remainders in the reverse order.


1 Answers

Note that the problem of counting 1-bits is referred to as a "population count".

At least in Ruby, stick with handling these as a string via the count method unless you have a compelling reason to use integers.

count:

Benchmark: 78.60s for 10,000,000 iterations (127,225.63 iterations per second)

For integer math,

If you don't care about values above 2**32,

def popcount(x)
  m1 = 0x55555555
  m2 = 0x33333333
  m4 = 0x0f0f0f0f
  x -= (x >> 1) & m1
  x = (x & m2) + ((x >> 2) & m2)
  x = (x + (x >> 4)) & m4
  x += x >> 8
  return (x + (x >> 16)) & 0x3f
end

Benchmark: 105.73s for 10,000,000 iterations (94,579.03 iterations per second)

If you do care about values above 2**32,

def popcount(x)
  b = 0
  while x > 0
    x &= x - 1
    b += 1
  end
  return b
end

Benchmark: 365.59s for 10,000,000 iterations (27,353.27 iterations per second)

Addenda:

Your code:

Benchmark: 78.25s for 1,000,000 iterations (12,779.56 iterations per second)

This code:

def bit_vec(str)
  # alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
  bv = "0" * 52
  str.each_char do |c|
    ord = c.ord
    next unless (ord >= 65 && ord <= 90) || (ord >= 97 && ord <= 122)
    index = ord - 65
    index -= 6 if index > 25
    bv[index] = "1"
    break if bv == "1111111111111111111111111111111111111111111111111111"
  end
  bv
end

Note: You said that you were dealing with a 52-bit number, so I assumed that you cared about both upper and lower case letters (26 + 26 = 52). I opted to check for uppercase first because that's how they appear in pretty much every character set ever, making the calculations a little easier.

Benchmark: 24.86s for 1,000,000 iterations (40,231.60 iterations per second)

3.14x speed-up.

like image 75
Bob Aman Avatar answered Sep 19 '22 06:09

Bob Aman