Convert Bitstring (String of 1 and 0s) to numpy array

I have a pandas DataFrame with one column that contains strings of bits, e.g. '100100101'. I want to convert each of these strings into a numpy array.

How can I do that?

EDIT:

Using

features = df.bit.apply(lambda x: np.array(list(map(int,list(x)))))
#...
model.fit(features, lables)

leads to an error on model.fit:

ValueError: setting an array element with a sequence.

The solution that works for my case, which I arrived at thanks to the accepted answer:

featureList = []
for bitString in input_table['Bitstring'].values:
    # list(...) is needed on Python 3, where map returns an iterator
    bits = np.array(list(map(int, bitString)))
    featureList.append(bits)
features = np.array(featureList)
#....
model.fit(features, lables)
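
If every bitstring in the column has the same length, the loop above could probably be replaced by one vectorised conversion. The following is only a sketch under that assumption: `bitstrings_to_matrix` is a hypothetical helper name, and the two sample strings are made up for illustration.

```python
import numpy as np

def bitstrings_to_matrix(bitstrings):
    """Hypothetical helper: equal-length '0'/'1' strings -> 2-D uint8 array."""
    bitstrings = list(bitstrings)
    if not bitstrings:
        return np.empty((0, 0), dtype=np.uint8)
    width = len(bitstrings[0])
    if any(len(s) != width for s in bitstrings):
        raise ValueError("all bitstrings must have the same length")
    # Decode every character at once: ASCII '0' is 48 and '1' is 49.
    joined = "".join(bitstrings).encode("ascii")
    flat = np.frombuffer(joined, dtype=np.uint8) - ord("0")
    # Any other character ends up outside {0, 1} after the subtraction.
    if flat.max(initial=0) > 1:
        raise ValueError("bitstrings may only contain '0' and '1'")
    return flat.reshape(len(bitstrings), width)

# Made-up sample data standing in for input_table['Bitstring'].values
features = bitstrings_to_matrix(["100100101", "000011110"])
print(features.shape)  # (2, 9)
```

The result has one row per bitstring and one column per bit, which is the 2-D layout the loop above also produces.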
beginner_ asked Mar 17 '15 05:03


2 Answers

For a string s = "100100101", you can convert it to a numpy array at least two different ways.

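The first uses numpy's fromstring function, which reads the raw bytes of the string. It is a bit awkward, because you have to specify the datatype and subtract out the "base" value of the elements. (Binary-mode fromstring has been deprecated since NumPy 1.14; np.frombuffer on the encoded string is the equivalent used here.)

import numpy as np

s = "100100101"
# Originally np.fromstring(s, 'u1'); frombuffer needs bytes, hence encode()
a = np.frombuffer(s.encode('ascii'), dtype='u1') - ord('0')

print(a)  # [1 0 0 1 0 0 1 0 1]

Where 'u1' is the datatype (unsigned 8-bit integer) and ord('0') is subtracted because each character is read as its ASCII code: '0' is 48 and '1' is 49.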

The second way is to convert each character of the string to an integer (strings are iterable), then pass the resulting list to np.array:

import numpy as np

s = "100100101"
# list(...) is needed on Python 3, where map returns an iterator
b = np.array(list(map(int, s)))

print(b)  # [1 0 0 1 0 0 1 0 1]

Then

# To see it's a numpy array:
print(type(a))  # <class 'numpy.ndarray'>
print(a[0])     # 1
print(a[1])     # 0
# ...

Note that the second approach scales significantly worse than the first as the length of the input string s increases. For small strings, it's close, but consider the timeit results for strings of 90 characters (I just used s * 10):

fromstring:  2.154540959 s
map/array:  49.283392424 s

(This uses the default timeit.repeat arguments and takes the minimum of 3 runs, each run timing 1M string->array conversions.)
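
A rough way to rerun this comparison is sketched below. It is not the script that produced the figures above: the function names `via_buffer` and `via_map` are made up, np.frombuffer stands in for the deprecated binary-mode fromstring, and the repetition count is reduced (an assumption, so the script finishes quickly). Absolute times will differ by machine and NumPy version.

```python
import timeit
import numpy as np

s = "100100101" * 10  # 90-character bitstring, as in the comparison above

def via_buffer():
    # Read the ASCII bytes directly and subtract the code of '0'
    return np.frombuffer(s.encode("ascii"), dtype="u1") - ord("0")

def via_map():
    # Convert character by character in Python, then build the array
    return np.array(list(map(int, s)))

# Assumption: far fewer iterations than the 1M used for the figures above
n = 10_000
t_buffer = min(timeit.repeat(via_buffer, number=n, repeat=3))
t_map = min(timeit.repeat(via_map, number=n, repeat=3))
print(f"buffer:    {t_buffer:.4f} s")
print(f"map/array: {t_map:.4f} s")
```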

jedwards answered Sep 20 '22 04:09


One pandas method would be to call apply on the df column to perform the conversion:

In [84]:

df = pd.DataFrame({'bit':['100100101']})
t = df.bit.apply(lambda x: np.array(list(map(int,list(x)))))
t[0]
Out[84]:
array([1, 0, 0, 1, 0, 0, 1, 0, 1])
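
The Series returned by apply holds one array per row (object dtype), which is the likely cause of the ValueError reported in the question when it is passed straight to model.fit. A minimal sketch of stacking it into a single 2-D array, assuming all bitstrings have the same length (the second row is made-up sample data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"bit": ["100100101", "000011110"]})
t = df.bit.apply(lambda x: np.array(list(map(int, x))))

# t is an object-dtype Series of 1-D arrays; stack them row by row
features = np.stack(t.to_list())
print(t.dtype)         # object
print(features.shape)  # (2, 9)
```

np.stack raises an error if the arrays differ in length, so ragged input would need padding or filtering first.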
EdChum answered Sep 21 '22 04:09