I have a binary number (52 bits) represented as a string "01100011...."
What would be the quickest way to count the number of 1's?
"01100011....".count("1")
obviously works but is quite time consuming if this operation needs to be done thousands of times.
ok, some more info. I am trying to create bit vectors for words as follows
def bit_vec(str)
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
bv = ""
alphabet.each_char do |a|
if str.include?(a)
bv += "1"
else
bv += "0"
end
end
bv
end
The bit_vec method gets called about 170K times. I store the bit vectors in a hash and use them to find similar words for a given word by XOR'ing the bit vectors and counting the number of 1's (more 1's == less similarity). If the count method does not use String#scan the what else could use it?
I know Ruby is slower than say C or Java. I am just looking to improve the algorithm the best I can. I am not looking for raw speed.
Maybe the include? method is the bottleneck?
∴ The total number of 1's is 8.
To convert integer to binary, start with the integer in question and divide it by 2 keeping notice of the quotient and the remainder. Continue dividing the quotient by 2 until you get a quotient of zero. Then just write out the remainders in the reverse order.
Note that the problem of counting 1-bits is referred to as a "population count".
At least in Ruby, stick with handling these as a string via the count
method unless you have a compelling reason to use integers.
count
:
Benchmark: 78.60s for 10,000,000 iterations (127,225.63 iterations per second)
For integer math,
If you don't care about values above 2**32
,
def popcount(x)
m1 = 0x55555555
m2 = 0x33333333
m4 = 0x0f0f0f0f
x -= (x >> 1) & m1
x = (x & m2) + ((x >> 2) & m2)
x = (x + (x >> 4)) & m4
x += x >> 8
return (x + (x >> 16)) & 0x3f
end
Benchmark: 105.73s for 10,000,000 iterations (94,579.03 iterations per second)
If you do care about values above 2**32
,
def popcount(x)
b = 0
while x > 0
x &= x - 1
b += 1
end
return b
end
Benchmark: 365.59s for 10,000,000 iterations (27,353.27 iterations per second)
Addenda:
Your code:
Benchmark: 78.25s for 1,000,000 iterations (12,779.56 iterations per second)
This code:
def bit_vec(str)
# alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
bv = "0" * 52
str.each_char do |c|
ord = c.ord
next unless (ord >= 65 && ord <= 90) || (ord >= 97 && ord <= 122)
index = ord - 65
index -= 6 if index > 25
bv[index] = "1"
break if bv == "1111111111111111111111111111111111111111111111111111"
end
bv
end
Note: You said that you were dealing with a 52-bit number, so I assumed that you cared about both upper and lower case letters (26 + 26 = 52). I opted to check for uppercase first because that's how they appear in pretty much every character set ever, making the calculations a little easier.
Benchmark: 24.86s for 1,000,000 iterations (40,231.60 iterations per second)
3.14x speed-up.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With