
How to convert string like '001100' to numpy.array([0,0,1,1,0,0]) quickly?

I have a string consisting of 0s and 1s, like '00101', and I want to convert it to a numpy array: numpy.array([0,0,1,0,1]).

I am using a for loop like this:

import numpy as np
X = np.zeros((1,5),int)
S = '00101'
for i in xrange(5):
    X[0][i] = int(S[i])

But I have many strings, each 1024 characters long, so this approach is very slow. Is there a better way to do this?

stigmj asked Aug 22 '15 10:08




1 Answer

map should be a bit faster than a list comp:

import numpy as np

# Python 2, where map returns a list; on Python 3 use list(map(int, '00101'))
arr = np.array(map(int,'00101'))

Some timings on a string of 1024 chars show that it is:

In [12]: timeit np.array([int(c) for c in s])
1000 loops, best of 3: 422 µs per loop

In [13]: timeit np.array(map(int,s))
1000 loops, best of 3: 389 µs per loop

Just calling list on s and passing dtype=int is faster:

In [20]: timeit np.array(list(s), dtype=int)
1000 loops, best of 3: 329 µs per loop

Using fromiter and passing dtype=int is faster again:

In [21]: timeit  np.fromiter(s,dtype=int)
1000 loops, best of 3: 289 µs per loop

Borrowing from this answer, using fromstring with int8 as the dtype is the fastest. Subtracting 48, the ASCII code of '0', turns the raw byte values into the digits:

In [54]: timeit  np.fromstring(s, 'int8') - 48
100000 loops, best of 3: 4.54 µs per loop
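
On Python 3 with a recent NumPy, the same byte-level trick can be sketched with np.frombuffer, since the binary mode of np.fromstring is deprecated there. This is only a minimal sketch and was not part of the timings here; the helper name digits_to_array is made up for illustration, and it assumes the string contains nothing but ASCII digits.

```python
import numpy as np

def digits_to_array(s):
    """Convert a string of ASCII digits to an integer array (hypothetical helper)."""
    # Encode to bytes; each ASCII digit becomes one byte with a value in 48..57.
    raw = np.frombuffer(s.encode("ascii"), dtype=np.uint8)
    # Subtract ord('0') == 48 to get the digit values, then widen to the default int.
    return (raw - 48).astype(int)

print(digits_to_array("00101"))  # [0 0 1 0 1]
```

The subtraction creates a new array, so the read-only buffer returned by np.frombuffer is never modified in place.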

Even with rebinding the name and converting the dtype, it is still by far the fastest:

In [71]: %%timeit
   ....: arr = np.fromstring(s, 'int8') - 48
   ....: arr = arr.astype(int)
   ....: 
100000 loops, best of 3: 6.23 µs per loop

It is even considerably faster than Ashwini's join:

In [76]: timeit  np.fromstring(' '.join(s), sep=' ', dtype=int)
10000 loops, best of 3: 62.6 µs per loop

As @Unutbu pointed out in the comments, np.fromstring(s, 'int8') - 48 is not limited to ones and zeros but will work for all strings composed of ASCII digits.
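
Since the question mentions many strings of the same length, one possible extension is to decode them all in a single call and reshape the result. The sketch below assumes Python 3, equal-length strings, and ASCII digits only; strings_to_matrix is a hypothetical helper name, and it uses np.frombuffer rather than the fromstring call timed in this answer.

```python
import numpy as np

def strings_to_matrix(strings):
    """Stack equal-length digit strings into a 2-D integer array (hypothetical helper)."""
    width = len(strings[0])
    # Assumption: every string has the same length and contains only ASCII digits.
    if any(len(s) != width for s in strings):
        raise ValueError("all strings must have the same length")
    # Join once, view all bytes in a single call, then reshape to one row per string.
    raw = np.frombuffer("".join(strings).encode("ascii"), dtype=np.uint8)
    return (raw - 48).astype(int).reshape(len(strings), width)

X = strings_to_matrix(["00101", "11100", "01010"])
print(X.shape)  # (3, 5)
```

Whether this beats converting the strings one at a time depends on the data, so it is worth timing on the real input.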

Padraic Cunningham answered Sep 29 '22 16:09