Convert a Bitstring (String of 1s and 0s) to a numpy Array

Tags:

python

pandas

numpy

bitstring

I have a pandas DataFrame with one column that contains strings of bits, e.g. '100100101'. I want to convert each string into a numpy array.

How can I do that?

EDIT:

Using

features = df.bit.apply(lambda x: np.array(list(map(int,list(x)))))
#...
model.fit(features, labels)

leads to an error on model.fit:

ValueError: setting an array element with a sequence.
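The error comes from the shape of what apply returns: a Series holding one numpy array per row (a 1-D object-dtype container), not a 2-D numeric matrix. A minimal reproduction, assuming (the question does not say) that the model converts its input with something like np.asarray(..., dtype=float), roughly as scikit-learn's input validation does; the two-row frame is made-up sample data:

```python
import numpy as np
import pandas as pd

# Made-up sample data using the question's column name
df = pd.DataFrame({"bit": ["100100101", "011011010"]})

# apply produces one numpy array per row, held in an object-dtype Series
features = df.bit.apply(lambda x: np.array(list(map(int, x))))
print(features.dtype)  # object
print(features.shape)  # (2,): one entry per row, not (2, 9)

# Assumed stand-in for the model's input conversion: each element is itself
# a sequence, so it cannot be stored in a single float slot
error_message = None
try:
    np.asarray(features, dtype=float)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```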

The solution I came up with for my case, based on the accepted answer:

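import numpy as np

featureList = []
for bitString in input_table['Bitstring'].values:
    # list(...) is needed on Python 3, where map returns an iterator
    bits = np.array(list(map(int, bitString)))
    featureList.append(bits)
features = np.array(featureList)  # 2-D (rows, bits) if all bitstrings have the same length
#....
model.fit(features, labels)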
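If every bitstring in the column has the same length, the loop above can be replaced by one vectorised conversion: join the strings, read the bytes as uint8, reshape to one row per string and shift the character codes down to 0/1. This is a sketch under those assumptions (non-empty column, equal-length ASCII strings of '0' and '1'); `bitstrings_to_matrix` is a hypothetical helper name, and the sample frame is made up.

```python
import numpy as np
import pandas as pd

def bitstrings_to_matrix(column: pd.Series) -> np.ndarray:
    """Hypothetical helper: non-empty column of equal-length '0'/'1' strings -> 2-D uint8 array."""
    strings = column.tolist()
    width = len(strings[0])
    if any(len(s) != width for s in strings):
        raise ValueError("all bitstrings must have the same length")
    # One contiguous buffer of ASCII codes ('0' -> 48, '1' -> 49)
    raw = np.frombuffer("".join(strings).encode("ascii"), dtype=np.uint8)
    # One row per bitstring; subtracting ord('0') maps the codes to 0 and 1
    return raw.reshape(len(strings), width) - ord("0")

input_table = pd.DataFrame({"Bitstring": ["100100101", "011011010"]})
features = bitstrings_to_matrix(input_table["Bitstring"])
print(features.shape)  # (2, 9)
```

The result is already a 2-D numeric array, so it can be passed to model.fit without any per-row stacking.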


asked Mar 17 '15 05:03

beginner_

2 Answers

For a string s = "100100101", there are at least two ways to convert it to a numpy array.

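The first uses numpy's fromstring function (its binary mode is deprecated in current NumPy, where np.frombuffer on the encoded bytes does the same job). It is a bit awkward, because you have to specify the datatype and subtract the "base" value from each element.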

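import numpy as np

s = "100100101"
# Older NumPy: np.fromstring(s, 'u1'); that binary mode is deprecated, so read the ASCII bytes instead
a = np.frombuffer(s.encode('ascii'), 'u1') - ord('0')

print(a)  # [1 0 0 1 0 0 1 0 1]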

Here 'u1' is the datatype (unsigned 8-bit integer, one element per character), and subtracting ord('0'), which is 48, turns the character codes of '0' and '1' (48 and 49) into 0 and 1.
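The subtraction only works because '0' and '1' have consecutive character codes; any other character (a space, a '2') would silently become some other number. A minimal sketch of a checked version, using np.frombuffer in place of the deprecated fromstring binary mode; `bits_to_array` is a hypothetical helper name, not a numpy function.

```python
import numpy as np

def bits_to_array(s: str) -> np.ndarray:
    """Hypothetical helper: convert a '0'/'1' string to a uint8 array of 0s and 1s."""
    # ASCII codes of each character; '0' is 48 and '1' is 49
    codes = np.frombuffer(s.encode("ascii"), dtype=np.uint8)
    bits = codes - ord("0")
    # uint8 arithmetic wraps around, so characters below '0' also end up > 1
    if np.any(bits > 1):
        raise ValueError(f"not a bitstring: {s!r}")
    return bits

print(bits_to_array("100100101"))  # [1 0 0 1 0 0 1 0 1]
```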

The second way converts each character to an integer (strings are iterable) and passes the resulting list to np.array:

import numpy as np

s = "100100101"
# On Python 3, map returns an iterator, so materialize it as a list first
b = np.array(list(map(int, s)))

print(b)  # [1 0 0 1 0 0 1 0 1]

Either way, the result is a regular numpy array:

# To see it's a numpy array:
print(type(a))  # <class 'numpy.ndarray'>
print(a[0])     # 1
print(a[1])     # 0
# ...

Note that the second approach scales significantly worse than the first as the input string s gets longer. For short strings the difference is small, but consider the timeit results for 90-character strings (I just used s * 10):

fromstring:  2.154540959 s
map/array:  49.283392424 s

(These use the default timeit.repeat arguments of the time: the minimum of 3 runs, each run timing 1,000,000 string-to-array conversions.)
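A rough way to repeat the comparison on Python 3 with current NumPy, as a sketch: it uses np.frombuffer instead of the deprecated fromstring binary mode and far fewer iterations than the original 1M, so absolute times will differ by machine and version and only the relative ordering is meaningful.

```python
import timeit

import numpy as np

s = "100100101" * 10  # 90 characters, as in the comparison above

def via_buffer():
    # Vectorised: reinterpret the ASCII bytes, then shift '0'/'1' down to 0/1
    return np.frombuffer(s.encode("ascii"), dtype=np.uint8) - ord("0")

def via_map():
    # Per-character: call int() on every character, then build the array
    return np.array(list(map(int, s)))

# Both approaches must agree before timing them
assert via_buffer().tolist() == via_map().tolist()

for name, func in [("frombuffer", via_buffer), ("map/array", via_map)]:
    # Best of 3 runs, each timing 10,000 conversions
    best = min(timeit.repeat(func, repeat=3, number=10_000))
    print(f"{name}: {best:.4f} s")
```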

answered Sep 20 '22 04:09

jedwards

One pandas approach is to call apply on the DataFrame column to perform the conversion:

In [84]:

df = pd.DataFrame({'bit':['100100101']})
t = df.bit.apply(lambda x: np.array(list(map(int,list(x)))))
t[0]
Out[84]:
array([1, 0, 0, 1, 0, 0, 1, 0, 1])
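As the question's edit shows, the Series that apply returns holds one array per row, which model.fit could not use directly. Stacking the rows produces the 2-D (rows, bits) array instead. A sketch assuming all bitstrings have the same length; the second sample row is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"bit": ["100100101", "011011010"]})
t = df.bit.apply(lambda x: np.array(list(map(int, x))))

# np.stack joins the per-row arrays along a new first axis;
# it raises ValueError if the bitstrings differ in length
features = np.stack(t.to_numpy())
print(features.shape)  # (2, 9)
```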

answered Sep 21 '22 04:09

EdChum
