I am using NumPy to store data in matrices. Coming from an R background, I am used to having an extremely simple way to apply a function over the rows, the columns, or both of a matrix.
Is there something similar for the Python/NumPy combination? It's not a problem to write my own little implementation, but it seems to me that most of the versions I come up with will be significantly less efficient and more memory-intensive than any existing implementation.
I would like to avoid copying from the NumPy matrix to a local variable, etc. Is that possible?
The functions I am trying to implement are mainly simple comparisons (e.g. how many elements of a certain column are smaller than a number x, or how many of them have an absolute value larger than y).
You can use np.apply_along_axis:

    np.apply_along_axis(function, 1, array)

The first argument is the function, and the second argument is the axis along which the function is applied. With axis 1, the function receives each row in turn as a 1-D array; pass 0 instead to apply it to each column.
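As a minimal sketch of how this looks for the kind of counting described in the question: `count_below`, the sample array and the threshold 5 are made up for illustration. Extra positional arguments after the array are passed through to the function.

```python
import numpy as np

def count_below(values, threshold):
    """Count the entries of a 1-D slice that are smaller than threshold."""
    return int((values < threshold).sum())

# Hypothetical sample data
a = np.array([[1, 5, 9],
              [2, 6, 7],
              [3, 4, 8]])

# axis=0: the function receives each column as a 1-D array
per_column = np.apply_along_axis(count_below, 0, a, 5)
# axis=1: the function receives each row as a 1-D array
per_row = np.apply_along_axis(count_below, 1, a, 5)

print(per_column)  # [3 1 0]
print(per_row)     # [1 1 2]
```

For a comparison this simple, `(a < 5).sum(axis=0)` gives the same per-column counts without calling a Python function once per slice.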
The following code applies a function that multiplies each value by 2 and then adds 5 to every element of a NumPy array:

    import numpy as np

    # create NumPy array
    data = np.array([1, 3, 4, 4, 7, 8, 13, 15])

    # define function
    my_function = lambda x: x*2 + 5

    # apply function to NumPy array
    my_function(data)
    # array([ 7, 11, 13, 13, 19, 21, 31, 35])
There are two ways to create matrices in numpy. The most common one is to use the numpy ndarray class, which gives two-dimensional numpy arrays (ndarray objects). The other one is to use the numpy matrix class, which gives matrix objects. The dot product of both ndarray and matrix objects can be obtained using np.dot().
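A small sketch of the two options, with made-up values. One difference to keep in mind is what `*` means for each type.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])    # two-dimensional ndarray
m = np.matrix([[1, 2], [3, 4]])   # matrix object

# np.dot gives the matrix product for both types
dot_arrays = np.dot(a, a)
dot_matrices = np.dot(m, m)

# The * operator differs between the two classes
elementwise = a * a       # ndarray: element-wise product
matrix_product = m * m    # matrix: matrix product

print(dot_arrays)    # [[ 7 10]
                     #  [15 22]]
print(elementwise)   # [[ 1  4]
                     #  [ 9 16]]
```

The NumPy documentation recommends plain ndarrays over the matrix class for new code, so the ndarray form is probably the safer default.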
Almost all numpy functions operate on whole arrays, and/or can be told to operate on a particular axis (row or column).
As long as you can define your function in terms of numpy functions acting on numpy arrays or array slices, your function will automatically operate on whole arrays, rows or columns.
It may be more helpful to ask about how to implement a particular function to get more concrete advice.
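For the comparisons mentioned in the question, a sketch along these lines may be enough. The array `data` and the thresholds `x` and `y` are invented for illustration.

```python
import numpy as np

# Hypothetical sample data and thresholds
data = np.array([[ 1.5, -4.0,  0.2],
                 [-3.0,  2.5, -0.1],
                 [ 0.5, -6.0,  7.0]])
x = 0.0
y = 2.0

# Basic slicing returns a view of the column, not a copy
column = data[:, 0]

# How many elements of column 0 are smaller than x
below_x_col0 = (column < x).sum()

# The same count for every column at once
below_x_per_col = (data < x).sum(axis=0)

# How many elements of each row have an absolute value larger than y
large_abs_per_row = (np.abs(data) > y).sum(axis=1)

print(below_x_col0)       # 1
print(below_x_per_col)    # [1 2 1]
print(large_abs_per_row)  # [1 2 2]
```

The comparison itself still allocates a temporary boolean array, but the data is never copied out of `data` element by element, and the loop runs in compiled code.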
Numpy provides np.vectorize and np.frompyfunc to turn Python functions which operate on numbers into functions that operate on numpy arrays.
For example,
    import numpy as np

    def myfunc(a, b):
        # return the larger of two scalars
        if a > b:
            return a
        else:
            return b

    vecfunc = np.vectorize(myfunc)
    result = vecfunc([[1, 2, 3], [5, 6, 9]], [7, 4, 5])
    print(result)
    # [[7 4 5]
    #  [7 6 9]]
(The elements of the first array get replaced by the corresponding element of the second array when the second is bigger.)
But don't get too excited; np.vectorize and np.frompyfunc are just syntactic sugar. They don't actually make your code any faster. If your underlying Python function is operating on one value at a time, then np.vectorize will feed it one item at a time, and the whole operation is going to be pretty slow (compared to using a numpy function which calls some underlying C or Fortran implementation).
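To make the contrast concrete, here is a sketch that computes the same element-wise maximum three ways. The input arrays repeat the values from the example above; choosing `np.maximum` and `np.where` as the native replacements is a suggestion, not part of the original answer.

```python
import numpy as np

def myfunc(a, b):
    # scalar function: return the larger of two values
    return a if a > b else b

first = np.array([[1, 2, 3], [5, 6, 9]])
second = np.array([7, 4, 5])

# Calls myfunc once per element pair, in Python
via_vectorize = np.vectorize(myfunc)(first, second)

# Native NumPy equivalents: the loop runs in compiled code
via_maximum = np.maximum(first, second)
via_where = np.where(first > second, first, second)

print(np.array_equal(via_vectorize, via_maximum))  # True
print(np.array_equal(via_vectorize, via_where))    # True
```

On large arrays the native versions are typically much faster, which you can check on your own data with `timeit`.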
To count how many elements of column x are smaller than a number y, you could use an expression such as:

    (array['x'] < y).sum()
For example:
    import numpy as np

    # np.int has been removed from recent NumPy releases; the builtin int is the drop-in replacement
    array = np.arange(6).view([('x', int), ('y', int)])
    print(array)
    # [(0, 1) (2, 3) (4, 5)]

    print(array['x'])
    # [0 2 4]

    print(array['x'] < 3)
    # [ True  True False]

    print((array['x'] < 3).sum())
    # 2