br is the name of a list of strings that goes like this:
['14 0.000000 -- (long term 0.000000)\n',
'19 0.000000 -- (long term 0.000000)\n',
'22 0.000000 -- (long term 0.000000)\n',
...
I am interested in the first two columns, which I would like to convert to a numpy array. So far, I've come up with the following solution:
x = N.array ([0., 0.])
for i in br:
    x = N.vstack ( (x, N.array (map (float, i.split ()[:2]))) )
This results in a 2-D array:
array([[ 0., 0.],
[ 14., 0.],
[ 19., 0.],
[ 22., 0.],
...
However, since br is rather big (~10^5 entries), this procedure takes some time.
I was wondering, is there a way to accomplish the same result, but in less time?
This is dramatically faster for me:
import numpy as N
br = ['14 0.000000 -- (long term 0.000000)\n']*50000
aa = N.zeros((len(br), 2))
for i,line in enumerate(br):
    al, strs = aa[i], line.split(None, 2)[:2]
    al[0], al[1] = float(strs[0]), float(strs[1])
Changes:
If the strings come from a file, you can try to preprocess them (with awk, for example) and then load them with numpy.genfromtxt or numpy.loadtxt. If you can't do anything about the way you get this list, you have several possibilities:
Edit: maybe this approach is slightly faster:
def conv(mystr):
    # Convert the first two whitespace-separated fields to floats.
    return list(map(float, mystr.split()[:2]))

br_float = list(map(conv, br))
x = N.array(br_float)
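If the data does come from a file, the loader route mentioned above might look roughly like the sketch below. It assumes numpy.loadtxt with its usecols parameter, and it uses an io.StringIO object holding three sample lines from the question as a stand-in for the real file, so the file contents here are illustrative only.

```python
import io

import numpy as np

# Stand-in for a real file on disk; the lines are copied from the question.
sample = io.StringIO(
    "14 0.000000 -- (long term 0.000000)\n"
    "19 0.000000 -- (long term 0.000000)\n"
    "22 0.000000 -- (long term 0.000000)\n"
)

# usecols=(0, 1) asks loadtxt to convert only the first two
# whitespace-separated columns; the trailing text is not converted.
x = np.loadtxt(sample, usecols=(0, 1))
print(x.shape)  # (3, 2)
```

With usecols set, no awk preprocessing is needed for input shaped like this; whether it beats a hand-written loop on ~10^5 lines is something to time on your own data.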
Changing
map (float, i.split()[:2])
to
map (float, i.split(' ',2)[:2])
might result in a slight speedup. Since you only care about the first two space-separated items in each line, there is no need to split the entire line. The 2 in i.split(' ',2) tells split to make a maximum of 2 splits. For example,
In [11]: x='14 0.000000 -- (long term 0.000000)\n'
In [12]: x.split()
Out[12]: ['14', '0.000000', '--', '(long', 'term', '0.000000)']
In [13]: x.split(' ',2)
Out[13]: ['14', '0.000000', '-- (long term 0.000000)\n']
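Putting the limited split together with a single array construction, one possible sketch is shown below. It is not taken from any of the answers as written: it assumes a list comprehension is acceptable, and it uses split(None, 2), which splits on any run of whitespace, rather than split(' ', 2), which would behave differently if fields were separated by more than one space. The three-element br is a shortened stand-in for the real list.

```python
import numpy as np

# Shortened stand-in for the real list of ~10^5 strings.
br = [
    "14 0.000000 -- (long term 0.000000)\n",
    "19 0.000000 -- (long term 0.000000)\n",
    "22 0.000000 -- (long term 0.000000)\n",
]

# Split each line at most twice and keep the first two fields as strings.
fields = [line.split(None, 2)[:2] for line in br]

# Build the array once; dtype=float makes NumPy parse the strings.
x = np.array(fields, dtype=float)
print(x.shape)  # (3, 2)
```

Unlike the vstack loop in the question, this creates the array a single time instead of copying it on every iteration, and it does not prepend a [0., 0.] row.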