I have a pandas DataFrame with one column that contains a string of bits, e.g. '100100101'. I want to convert this string into a NumPy array.
How can I do that?
EDIT:
Using
features = df.bit.apply(lambda x: np.array(list(map(int,list(x)))))
#...
model.fit(features, lables)
leads to an error on model.fit:
ValueError: setting an array element with a sequence.
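The solution that works for my case, which I came up with thanks to the accepted answer:
featureList = []
for bitString in input_table['Bitstring'].values:
    # list() is needed on Python 3, where map() returns an iterator
    bits = np.array(list(map(int, bitString)))
    featureList.append(bits)
# A list of equal-length 1-D arrays becomes one 2-D array
features = np.array(featureList)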
#....
model.fit(features, lables)
The [:, :] stands for everything from the beginning to the end, just like for lists. The difference is that the first : stands for the first dimension and the second : for the second dimension.
a = numpy.zeros((3, 3))
In [132]: a
Out[132]:
array([[ 0., 0., 0.],
       [ 0., 0., 0.],
       [ 0., 0., 0.]])
Starting from numpy 1.4, if one needs arrays of strings, it is recommended to use arrays of dtype object_, string_ or unicode_, and use the free functions in the numpy.char module for fast vectorized string operations.
Let’s look at a few ways to convert a numpy array to a string. We will see how to do it in both Numpy and Python-specific ways. The easiest way to convert a Numpy array to a string is to use the Numpy array2string dedicated function.
A simple way to convert binary string data like yours is to use the built-in int() function and tell it the number is in base 2 (binary) instead of the default base 10 (decimal). This returns an integer value.
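As a minimal sketch of the int() approach described above, using the bit string from the question (the variable names are illustrative only):

```python
s = "100100101"

# The second argument tells int() to parse the digits as base 2
value = int(s, 2)
print(value)       # 293

# bin() goes the other way and shows the same bit pattern
print(bin(value))  # 0b100100101
```

Note that this yields a single Python integer, not an array of individual bits.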
I have a string representation of binary integers and I need bytes with the exact same bit structure to send over a socket. For example, if I have a string of length 16, 0000111100001010, then I need 2 bytes with the same bit structure. In this case, the first byte should have an int value of 15 and the second a value of 10.
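One way this could be sketched is with int() and the built-in int.to_bytes method, assuming the string length is a multiple of 8 and the first 8 characters should become the first byte (big-endian order). The variable names are illustrative only:

```python
bits = "0000111100001010"

# Assumption: the string holds a whole number of bytes
assert len(bits) % 8 == 0

# Parse as base 2, then serialize with the most significant byte first
data = int(bits, 2).to_bytes(len(bits) // 8, byteorder="big")
print(list(data))  # [15, 10]
```

Passing the length explicitly keeps leading zero bytes, which a plain integer would otherwise drop.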
First, if you have the bit string as a literal value, just make it a base-2 int literal, instead of a string literal:
For a string s = "100100101", you can convert it to a numpy array in at least two different ways.
The first is by using numpy's fromstring function. It is a bit awkward, because you have to specify the datatype and subtract out the "base" value of the elements.
import numpy as np
s = "100100101"
# Note: binary-mode np.fromstring is deprecated in recent NumPy releases
a = np.fromstring(s, 'u1') - ord('0')
print(a) # [1 0 0 1 0 0 1 0 1]
Where 'u1' is the datatype (unsigned 8-bit integer), so each character is read as its ASCII code (48 for '0', 49 for '1'), and ord('0') is used to subtract that "base" value from each element.
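On recent NumPy versions, where the binary mode of fromstring is deprecated, the same idea can be sketched with np.frombuffer instead. This is a swapped-in alternative rather than the code from the answer, and it assumes the string contains only ASCII characters:

```python
import numpy as np

s = "100100101"

# Encode the text to raw ASCII bytes, view them as uint8 values,
# then subtract ord('0') == 48 so that '0' -> 0 and '1' -> 1
a = np.frombuffer(s.encode("ascii"), dtype=np.uint8) - ord("0")
print(a)  # [1 0 0 1 0 0 1 0 1]
```

The subtraction creates a new, writable array, so the read-only buffer returned by frombuffer is not an issue here.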
The second way is by converting each character of the string to an integer (since strings are iterable), then passing the resulting list into np.array:
import numpy as np
s = "100100101"
# list() is needed on Python 3, where map() returns an iterator
b = np.array(list(map(int, s)))
print(b) # [1 0 0 1 0 0 1 0 1]
Then:
# To see it's a numpy array:
print(type(a)) # <class 'numpy.ndarray'>
print(a[0]) # 1
print(a[1]) # 0
# ...
Note that the second approach scales significantly worse than the first as the length of the input string s increases. For small strings, it's close, but consider the timeit results for strings of 90 characters (I just used s * 10):
fromstring: 2.154540959 s
map/array: 49.283392424 s
(This uses the default timeit.repeat arguments and takes the minimum of 3 runs, each run timing 1M string->array conversions.)
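A comparison along these lines could be reproduced with a sketch like the one below. It is not the original benchmark: it assumes np.frombuffer as a stand-in for fromstring, uses far fewer iterations than 1M so it finishes quickly, and the helper names via_frombuffer and via_map are made up for illustration. Absolute timings depend on the machine and library versions.

```python
import timeit

import numpy as np

s = "100100101" * 10  # 90 characters, as in the text


def via_frombuffer():
    # Buffer-based conversion (stand-in for the fromstring approach)
    return np.frombuffer(s.encode("ascii"), dtype=np.uint8) - ord("0")


def via_map():
    # Per-character conversion through a Python list
    return np.array(list(map(int, s)))


number = 2_000  # assumption: much smaller than the 1M runs quoted above
t_buf = min(timeit.repeat(via_frombuffer, number=number, repeat=3))
t_map = min(timeit.repeat(via_map, number=number, repeat=3))
print(f"frombuffer: {t_buf:.4f} s")
print(f"map/array:  {t_map:.4f} s")
```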
One pandas method would be to call apply on the df column to perform the conversion:
In [84]:
df = pd.DataFrame({'bit':['100100101']})
t = df.bit.apply(lambda x: np.array(list(map(int,list(x)))))
t[0]
Out[84]:
array([1, 0, 0, 1, 0, 0, 1, 0, 1])
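The result of apply is a Series whose elements are separate arrays, which is likely why passing it straight to model.fit raises "setting an array element with a sequence". A minimal sketch of one way to get a regular 2-D array, assuming all bit strings have the same length (the second row below is made up for illustration):

```python
import numpy as np
import pandas as pd

# Second bit string is a made-up example row
df = pd.DataFrame({"bit": ["100100101", "011011010"]})

# Each element of t is its own 1-D array
t = df["bit"].apply(lambda x: np.array(list(map(int, x))))

# Stack the per-row arrays into a single (n_rows, n_bits) array
features = np.stack(t.tolist())
print(features.shape)  # (2, 9)
print(features[0])     # [1 0 0 1 0 0 1 0 1]
```

An array of this shape, one row per sample and one column per bit, is the usual 2-D layout for a feature matrix.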