
Efficient way to convert delimiter separated string to numpy array

Tags: python, numpy

I have a string as follows:

1|234|4456|789

I have to convert it into a NumPy array. I would like to know the most efficient way, since I will be calling this function more than 50 million times!

Sree Aurovindh asked Mar 22 '12 03:03



2 Answers

@jterrace wins one (1) internet.

In the measurements below the example code has been shortened to allow the tests to fit on one line without scrolling where possible.

For those not familiar with timeit, the -s flag lets you specify setup code that is executed only once, before the timed statement runs.
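For reference, the same behaviour is available from inside Python through the setup argument of timeit.timeit. This is only a minimal sketch, assuming NumPy is installed; the loop count of 10000 is an arbitrary choice, and the timing will differ from machine to machine.

```python
import timeit

# setup runs once (like -s on the command line); stmt runs `number` times.
setup = "import numpy; s = '1|2'"
stmt = "numpy.fromstring(s, dtype=int, sep='|')"

number = 10000  # arbitrary loop count chosen for this sketch
total_seconds = timeit.timeit(stmt, setup=setup, number=number)

# Convert the total into microseconds per loop, the unit timeit reports.
per_loop_usec = total_seconds / number * 1e6
print("%.2f usec per loop" % per_loop_usec)
```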


The fastest and least-cluttered way is to use numpy.fromstring as jterrace suggested:

python -mtimeit -s"import numpy;s='1|2'" "numpy.fromstring(s,dtype=int,sep='|')"
100000 loops, best of 3: 1.85 usec per loop

The following three examples use string.split in combination with another tool.

string.split with numpy.fromiter

python -mtimeit -s"import numpy;s='1|2'" "numpy.fromiter(s.split('|'),dtype=int)"
100000 loops, best of 3: 2.24 usec per loop

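string.split with int() cast via generator-expression

python -mtimeit -s"import numpy;s='1|2'" "numpy.array(int(x) for x in s.split('|'))"
100000 loops, best of 3: 3.12 usec per loop

Note: numpy.array does not consume a generator. This call returns a 0-d object array wrapping the generator, so the timing above does not reflect a real conversion. A list comprehension, numpy.array([int(x) for x in s.split('|')]), is needed to get an integer array.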

string.split with NumPy array of type int

python -mtimeit -s"import numpy;s='1|2'" "numpy.array(s.split('|'),dtype=int)"
100000 loops, best of 3: 9.22 usec per loop
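Before comparing timings it may be worth checking what each variant actually returns. The sketch below is not part of the measurements; it assumes the question's example string, and numpy.fromiter is fed already-converted ints here, which is a slight variation on the command above.

```python
import numpy

s = "1|234|4456|789"
expected = [1, 234, 4456, 789]

# Three approaches that yield a 1-D integer array.
from_string = numpy.fromstring(s, dtype=int, sep="|")
from_iter = numpy.fromiter((int(x) for x in s.split("|")), dtype=int)
from_list = numpy.array(s.split("|"), dtype=int)

for result in (from_string, from_iter, from_list):
    assert result.tolist() == expected

# numpy.array does not iterate over a generator: it stores the generator
# object itself in a 0-d array of dtype object.
from_gen = numpy.array(int(x) for x in s.split("|"))
print(from_gen.ndim, from_gen.dtype)

# A list comprehension gives the intended 1-D integer array instead.
from_listcomp = numpy.array([int(x) for x in s.split("|")])
print(from_listcomp.tolist())
```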
mechanical_meat answered Oct 13 '22 00:10


The fastest way is to use the numpy.fromstring function:

>>> import numpy
>>> data = "1|234|4456|789"
>>> numpy.fromstring(data, dtype=int, sep="|")
array([   1,  234, 4456,  789])
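If the conversion is needed in many places, it could be wrapped in a small helper. This is just a sketch: the name parse_delimited_ints and its sep parameter are hypothetical and chosen for illustration only.

```python
import numpy


def parse_delimited_ints(text, sep="|"):
    """Parse a delimiter-separated string of integers into a 1-D array.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # Passing a non-empty sep selects the text mode of numpy.fromstring.
    return numpy.fromstring(text, dtype=int, sep=sep)


values = parse_delimited_ints("1|234|4456|789")
print(values)
```

With tens of millions of calls, the extra function call may itself be measurable, so it is worth timing the wrapper against the inline call before adopting it.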
jterrace answered Oct 13 '22 00:10