Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Two's Complement Binary in Python?

Integers in Python are stored in two's complement, correct?

Although:

>>> x = 5 >>> bin(x) 0b101 

And:

>>> x = -5 >>> bin(x) -0b101 

That's pretty lame. How do I get python to give me the numbers in REAL binary bits, and without the 0b infront of it? So:

>>> x = 5 >>> bin(x) 0101 >>> y = -5 >>> bin(y) 1011 
like image 937
Thor Correia Avatar asked Oct 18 '12 02:10

Thor Correia


People also ask

How do you get 2's complement of a binary number in Python?

The way to compute the Two's complement is by taking the One's complement and adding 1 . 1111 1100 would then be -12 in Two's complement. The advantage over One's complement is that there isn't a duplicate representation of 0 with 0000 0000 and 1111 1111 , hence allowing one more value to be represented -128 .

How do you do two's complement in binary?

To get 2's complement of binary number is 1's complement of given number plus 1 to the least significant bit (LSB). For example 2's complement of binary number 10010 is (01101) + 1 = 01110.


2 Answers

It works best if you provide a mask. That way you specify how far to sign extend.

>>> bin(-27 & 0b1111111111111111) '0b1111111111100101' 

Or perhaps more generally:

def bindigits(n, bits):     s = bin(n & int("1"*bits, 2))[2:]     return ("{0:0>%s}" % (bits)).format(s)  >>> print bindigits(-31337, 24) 111111111000010110010111 

In basic theory, the actual width of the number is a function of the size of the storage. If it's a 32-bit number, then a negative number has a 1 in the MSB of a set of 32. If it's a 64-bit value, then there are 64 bits to display.

But in Python, integer precision is limited only to the constraints of your hardware. On my computer, this actually works, but it consumes 9GB of RAM just to store the value of x. Anything higher and I get a MemoryError. If I had more RAM, I could store larger numbers.

>>> x = 1 << (1 << 36) 

So with that in mind, what binary number represents -1? Python is well-capable of interpreting literally millions (and even billions) of bits of precision, as the previous example shows. In 2's complement, the sign bit extends all the way to the left, but in Python there is no pre-defined number of bits; there are as many as you need.

But then you run into ambiguity: does binary 1 represent 1, or -1? Well, it could be either. Does 111 represent 7 or -1? Again, it could be either. So does 111111111 represent 511, or -1... well, both, depending on your precision.

Python needs a way to represent these numbers in binary so that there's no ambiguity of their meaning. The 0b prefix just says "this number is in binary". Just like 0x means "this number is in hex". So if I say 0b1111, how do I know if the user wants -1 or 15? There are two options:

Option A: The sign bit
You could declare that all numbers are signed, and the left-most bit is the sign bit. That means 0b1 is -1, while 0b01 is 1. That also means that 0b111 is also -1, while 0b0111 is 7. In the end, this is probably more confusing than helpful particularly because most binary arithmetic is going to be unsigned anyway, and people are more likely to run into mistakes by accidentally marking a number as negative because they didn't include an explicit sign bit.

Option B: The sign indication
With this option, binary numbers are represented unsigned, and negative numbers have a "-" prefix, just like they do in decimal. This is (a) more consistent with decimal, (b) more compatible with the way binary values are most likely going to be used. You lose the ability to specify a negative number using its two's complement representation, but remember that two's complement is a storage implementation detail, not a proper indication of the underlying value itself. It shouldn't have to be something that the user has to understand.

In the end, Option B makes the most sense. There's less confusion and the user isn't required to understand the storage details.

like image 61
tylerl Avatar answered Sep 21 '22 13:09

tylerl


To properly interpret a binary sequence as two's complement, there needs to a length associated with the sequence. When you are working low-level types that correspond directly to CPU registers, there is an implicit length. Since Python integers can have an arbitrary length, there really isn't an internal two's complement format. Since there isn't a length associated with a number, there is no way to distinguish between positive and negative numbers. To remove the ambiguity, bin() includes a minus sign when formatting a negative number.

Python's arbitrary length integer type actually uses a sign-magnitude internal format. The logical operations (bit shifting, and, or, etc.) are designed to mimic two's complement format. This is typical of multiple precision libraries.

like image 42
casevh Avatar answered Sep 24 '22 13:09

casevh