Base-2 (Binary) Representation Using Python

Tags:

python

Building on "How Do You Express Binary Literals in Python", I was thinking about sensible, intuitive ways to do that Programming 101 chestnut of displaying integers in base-2 form. This is the best I came up with, but I'd like to replace it with a better algorithm, or at least one with screaming-fast performance.

def num_bin(N, places=8):
    def bit_at_p(N, p):
        ''' find the bit at place p for number N '''
        two_p = 1 << p   # 2 ** p, using bitshift, will have exactly one
                         # bit set, at place p
        x = N & two_p    # bitwise AND, will be one where *both* numbers
                         # have a 1 at that bit.  this can only happen
                         # at position p.  will yield two_p if N has a 1 at
                         # bit p
        return int(x > 0)

    # bits are generated from place 0 upward, so the resulting string
    # is least-significant bit first
    bits = (bit_at_p(N, x) for x in xrange(places))
    return "".join(str(x) for x in bits)

    # or, more concisely
    # return "".join([str(int((N & 1 << x)>0)) for x in xrange(places)])
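
To make the output order concrete, here is a minimal sketch of the one-line form ported to Python 3 (range in place of xrange, which is an assumption about the reader's interpreter). It shows that the digits come out least-significant bit first; reversing the string gives the conventional order.

```python
def num_bin(N, places=8):
    # Python 3 port of the one-liner: test each bit from place 0 upward
    return "".join(str(int((N & 1 << x) > 0)) for x in range(places))

print(num_bin(5))         # 10100000  (place 0 is the leftmost character)
print(num_bin(5)[::-1])   # 00000101  (most-significant bit first)
```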

463

asked Oct 09 '08 13:10

Gregg Lind

1 Answer

For best efficiency, you generally want to process more than a single bit at a time. You can use a simple method to get a fixed-width binary representation, e.g.:

def _bin(x, width):
    return ''.join(str((x>>i)&1) for i in xrange(width-1,-1,-1))

_bin(x, 8) will now give a zero-padded representation of x's lower 8 bits. This can be used to build a lookup table, allowing your converter to process 8 bits at a time (or more, if you want to devote the memory to it).

_conv_table = [_bin(x,8) for x in range(256)]

Then you can use this in your real function, stripping off leading zeroes when returning it. I've also added handling for signed numbers, as without it you will get an infinite loop: right-shifting a negative integer settles at -1 and never reaches 0, because negative integers conceptually have an infinite number of set sign bits.

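def bin(x):
    if x == 0:
        return '0'  # Special case: don't strip leading zero if no other digits
    elif x < 0:
        sign = '-'
        x = -x      # work on the magnitude; the sign is re-attached at the end
    else:
        sign = ''
    l = []
    while x:
        l.append(_conv_table[x & 0xff])  # look up the lowest 8 bits
        x >>= 8                          # then move on to the next byte
    # chunks were collected low byte first, so reverse before joining
    return sign + ''.join(reversed(l)).lstrip("0")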
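
A self-contained Python 3 sketch of the same idea follows. The name to_binary is hypothetical, chosen here only to avoid shadowing the builtin bin(), and the table is built with format(x, '08b') rather than _bin purely to keep the example short. The logic otherwise mirrors the function above.

```python
# 8-bit lookup table: index -> zero-padded bit string
_conv_table = [format(x, '08b') for x in range(256)]

def to_binary(x):
    # hypothetical name; same byte-at-a-time algorithm as above
    if x == 0:
        return '0'
    sign = '-' if x < 0 else ''
    x = abs(x)
    chunks = []
    while x:
        chunks.append(_conv_table[x & 0xff])  # lowest byte first
        x >>= 8
    # reverse so the highest byte comes first, then drop the padding
    return sign + ''.join(reversed(chunks)).lstrip('0')

print(to_binary(5))      # 101
print(to_binary(-5))     # -101
print(to_binary(1000))   # 1111101000
```

Comparing the result against format(n, 'b') over a range of positive and negative values is an easy sanity check.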

[Edit] Changed code to handle signed integers.
[Edit2] Here are some timing figures for the various solutions. bin is the function above, constantin_bin is from Constantin's answer, and num_bin is the original version. Out of curiosity, I also tried a 16-bit lookup table variant of the above (bin16 below) and Python 3's builtin bin() function. All timings were for 100,000 runs using an 01010101 bit pattern.

Num Bits:              8       16       32       64      128      256
---------------------------------------------------------------------
bin                0.544    0.586    0.744    1.942    1.854    3.357 
bin16              0.542    0.494    0.592    0.773    1.150    1.886
constantin_bin     2.238    3.803    7.794   17.869   34.636   94.799
num_bin            3.712    5.693   12.086   32.566   67.523  128.565
Python3's bin      0.079    0.045    0.062    0.069    0.212    0.201

As you can see, when processing long values, using large chunks really pays off, but nothing beats the low-level C code of Python 3's builtin (which bizarrely seems consistently faster at 256 bits than at 128!). Using a 16-bit lookup table improves things, but probably isn't worth it unless you really need it, as it uses up a large chunk of memory and can introduce a small but noticeable startup delay to precompute the table.
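
For anyone wanting to try this kind of comparison, here is a rough Python 3 sketch using timeit. The helper names (make_converter, pattern, time_converter, report) are hypothetical, and building the test value by repeating '01' is only an assumption about what an 01010101 bit pattern means. It is not the harness that produced the table above, and absolute numbers will differ by machine and interpreter.

```python
import timeit

def make_converter(chunk_bits):
    # build a lookup table that covers chunk_bits bits per step (8 or 16 here)
    table = [format(i, '0%db' % chunk_bits) for i in range(1 << chunk_bits)]
    mask = (1 << chunk_bits) - 1

    def convert(x):
        if x == 0:
            return '0'
        sign = '-' if x < 0 else ''
        x = abs(x)
        chunks = []
        while x:
            chunks.append(table[x & mask])  # lowest chunk first
            x >>= chunk_bits
        return sign + ''.join(reversed(chunks)).lstrip('0')
    return convert

def pattern(num_bits):
    # assumed test value: '01' repeated until num_bits bits are filled
    return int('01' * (num_bits // 2), 2)

def time_converter(func, num_bits, runs=100000):
    # total seconds for `runs` calls on one fixed value
    value = pattern(num_bits)
    return timeit.timeit(lambda: func(value), number=runs)

def report(runs=100000):
    bin8 = make_converter(8)
    bin16 = make_converter(16)  # 65536-entry table, slower to build
    for bits in (8, 16, 32, 64, 128, 256):
        print(bits,
              round(time_converter(bin8, bits, runs), 3),
              round(time_converter(bin16, bits, runs), 3),
              round(time_converter(bin, bits, runs), 3))  # builtin bin()

report(runs=1000)  # small run count to keep the demo quick
```

Taking the chunk size as a parameter makes it easy to weigh the table's memory and startup cost against the per-call savings.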

152

answered Sep 19 '22 01:09

Brian
