
Convert binary string to list of integers using Python

Tags: python

I am new to Python. Here's what I am trying to do:

  1. Slice a long binary string into 3 digit-long chunks.
  2. Store each "chunk" into a list called row.
  3. Convert each binary chunk into a number (0-7).
  4. Store the converted list of numbers into a new list called numbers.

Here is what I have so far:

def traverse(R):
        x = 0
        while x < (len(R) - 3):
            row = R[x] + R[x+1] + R[x+2]
            ???

Thanks for your help! It is greatly appreciated.

AME asked Sep 06 '09 21:09



2 Answers

Something like this should do it:

s = "110101001"
numbers = [int(s[i:i+3], 2) for i in range(0, len(s), 3)]
print(numbers)

The output is:

[6, 5, 1]

Breaking this down step by step, first:

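>>> list(range(0, len(s), 3))
[0, 3, 6]

range(0, len(s), 3) produces the integers starting at 0 and increasing in steps of 3, stopping before len(s). (In Python 3, range() is lazy, so list() is used here to display the values.)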

>>> [s[i:i+3] for i in range(0, len(s), 3)]
['110', '101', '001']

This is a list comprehension that evaluates s[i:i+3] for each i in the above range. The s[i:i+3] is a slice that selects a substring. Finally:

>>> [int(s[i:i+3], 2) for i in range(0, len(s), 3)]
[6, 5, 1]

The call int(..., 2) parses its first argument as a base-2 numeral (the second argument is the base) and returns the corresponding integer.
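To make the role of the base argument concrete, here is a small sketch. The helper name chunk_to_int is hypothetical and only wraps the built-in int().

```python
def chunk_to_int(chunk):
    """Interpret a string of '0'/'1' characters as a base-2 number."""
    return int(chunk, 2)

# Leading zeros are allowed and do not change the value.
print(chunk_to_int("110"))  # 6
print(chunk_to_int("001"))  # 1

# A character that is not a valid base-2 digit raises ValueError.
try:
    chunk_to_int("102")
except ValueError as exc:
    print("rejected:", exc)
```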

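Note that the above code does not check the input. If the length of the string is not a multiple of 3, the final chunk is simply shorter and is converted as-is (for example, a trailing '01' becomes 1), and any character other than '0' or '1' raises a ValueError.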
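One possible way to handle a length that is not a multiple of 3 is sketched below. It assumes the caller wants either a strict error or zeros added on the left of the whole string; the function name to_numbers and its width and pad parameters are hypothetical, not part of the answer above.

```python
def to_numbers(s, width=3, pad=False):
    """Split s into width-character chunks and convert each from base 2.

    If pad is True, zeros are added on the left so the length becomes a
    multiple of width; otherwise a ragged length raises ValueError.
    """
    remainder = len(s) % width
    if remainder:
        if not pad:
            raise ValueError(
                "length %d is not a multiple of %d" % (len(s), width))
        s = "0" * (width - remainder) + s
    return [int(s[i:i + width], 2) for i in range(0, len(s), width)]

print(to_numbers("110101001"))        # [6, 5, 1]
print(to_numbers("10101", pad=True))  # [2, 5]
```

Padding on the left shifts where the chunk boundaries fall, so it only makes sense if the string is meant to be read as one number with its leading zeros dropped. Whether to pad or reject depends on where the data comes from.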

Greg Hewgill answered Nov 16 '22 18:11



I'll assume that by "binary string" you actually mean a normal string (i.e. text) whose items are all '0' or '1'.

So for points 1 and 2,

row = [thestring[i:i+3] for i in range(0, len(thestring), 3)]

Of course, the last item will be only 1 or 2 characters long if len(thestring) is not an exact multiple of 3; that's inevitable;-).
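A quick sketch of that slicing behaviour, using a hypothetical helper named chunks and made-up input strings:

```python
def chunks(thestring, size=3):
    """Return consecutive slices of thestring, each at most size long."""
    return [thestring[i:i + size] for i in range(0, len(thestring), size)]

print(chunks("110101001"))  # ['110', '101', '001']
print(chunks("11010100"))   # ['110', '101', '00']
print(chunks("1101010"))    # ['110', '101', '0']
```

Slicing past the end of a string does not raise an error; it just returns whatever characters remain, which is why the last item can be short.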

For points 3 and 4, I'd suggest building an auxiliary lookup dictionary once and keeping it around:

aux = {}
for x in range(8):
  s = format(x, 'b')
  aux[s] = x
  aux[('00'+s)[-3:]] = x

so that points 3 and 4 just become:

numbers = [aux[x] for x in row]

this dict lookup should be much faster than converting each entry on the fly.

Edit: it's been suggested I explain why I am making two entries into aux for each value of x. The point is that s may be of any length from 1 to 3 characters, and for the short lengths I do want two entries -- one with s as it is (because, as I mentioned, the last item in row may well be shorter than 3...), and one with it left-padded to a length of 3 with 0s.

The sub-expression ('00'+s)[-3:] computes "s left-padded with '0's to a length of 3" by taking the last 3 characters (that's the [-3:] slicing part) of the string obtained by placing zeros to the left of s (that's the '00'+s part). If s is already 3 characters long, the whole subexpression will equal s so the assignment to that entry of aux is useless but harmless, so I find it simpler to not even bother checking (prepending an if len(s)<3: would be fine too, matter of taste;-).
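The padding expression can be checked in isolation. In this sketch the helper name pad3 is hypothetical; str.zfill and the '03b' format specification are standard-library alternatives that give the same three-character result.

```python
def pad3(s):
    """Left-pad s with '0' to three characters, as ('00' + s)[-3:] does."""
    return ('00' + s)[-3:]

for x in (1, 2, 5):
    s = format(x, 'b')
    # All three spellings produce the same zero-padded string.
    print(repr(s), '->', repr(pad3(s)), repr(s.zfill(3)), repr(format(x, '03b')))
```

Note that pad3 only behaves as left-padding for inputs of 1 to 3 characters, which is all that format(x, 'b') produces for x in range(8).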

There are other approaches (e.g. formatting x again if needed) but this is hardly the crux of the code (it executes just 8 times to build up the auxiliary "lookup table", after all;-), so I didn't pay it enough attention.

...nor did I unit-test it, so it has a bug in one obscure corner case. Can you see it...?

Suppose row has '01' as the last entry: THAT key, after my code above has built aux, will not be present in aux (both '1' and '001' WILL be, but that's scanty consolation;-). In the code above I use the original s, '1', and the length-three padded version, '001', but the intermediate length-two padded version, oops, got overlooked;-).
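The gap is easy to confirm by rebuilding that first table and checking which two-character keys it actually holds:

```python
# The first version of the lookup table, rebuilt as shown earlier.
aux = {}
for x in range(8):
    s = format(x, 'b')
    aux[s] = x
    aux[('00' + s)[-3:]] = x

print('1' in aux, '001' in aux)  # True True
print('01' in aux)               # False, so aux['01'] raises KeyError

# Only the two-character keys that need no padding made it in.
print(sorted(k for k in aux if len(k) == 2))  # ['10', '11']
```

So '00' is missing as well as '01'; both are two-character forms that only appear after padding.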

So, here's a RIGHT way to do it...:

aux = {}
for x in range(8):
  s = format(x, 'b')
  aux[s] = x
  while len(s) < 3:
    s = '0' + s
    aux[s] = x

...no doubt simpler and more obvious, but, even more importantly, CORRECT;-).
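Putting the corrected table together with the chunking step gives something like the sketch below. The function names build_aux and convert, and the width parameter, are hypothetical wrappers added for illustration; the loop body is the corrected code above.

```python
def build_aux(width=3):
    """Map every binary string of 1..width digits to its integer value."""
    aux = {}
    for x in range(2 ** width):
        s = format(x, 'b')
        aux[s] = x
        # Register each zero-padded form up to the full width.
        while len(s) < width:
            s = '0' + s
            aux[s] = x
    return aux

def convert(thestring, aux, width=3):
    """Chunk thestring and look each chunk up in the prebuilt table."""
    row = [thestring[i:i + width] for i in range(0, len(thestring), width)]
    return [aux[chunk] for chunk in row]

aux = build_aux()
print(convert("110101001", aux))  # [6, 5, 1]
print(convert("11010101", aux))   # [6, 5, 1], the trailing '01' is now found
```

For width 3 the table holds 14 keys (2 one-character, 4 two-character, and 8 three-character strings), so every possible chunk, short or full, has an entry.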

Alex Martelli answered Nov 16 '22 17:11
