Given a structured numpy array, I want to remove certain columns by name without copying the array. I know I can do this:
names = list(a.dtype.names)
if name_to_remove in names:
    names.remove(name_to_remove)
a = a[names]
But this creates a temporary copy of the array, which I want to avoid because the array I am dealing with might be very large.
Is there a good way to do this?
Using the NumPy function np.delete(), you can delete any row or column from a NumPy array (ndarray). Specify the axis (dimension) and the position (row number, column number, etc.). You can also select multiple rows or columns with a slice or a list. Note that np.delete() returns a new array; it does not modify the original in place.
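As a minimal sketch of the np.delete() usage described above (the array `arr` and the chosen positions are made up for illustration):

```python
import numpy as np

# Illustrative 3x4 array; the values are arbitrary.
arr = np.arange(12).reshape(3, 4)

# Delete row 1 (axis=0).
without_row = np.delete(arr, 1, axis=0)

# Delete columns 0 and 2 (axis=1) using a list of positions.
without_cols = np.delete(arr, [0, 2], axis=1)

# Delete columns 1 and 2 using a slice.
without_slice = np.delete(arr, slice(1, 3), axis=1)

print(without_row)
print(without_cols)
print(without_slice)
```

Each call builds a new array, so `arr` itself keeps its original shape.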
NumPy arrays often contain NaN (not-a-number) values that need to be removed before further processing. You can drop every column that contains a NaN value by combining np.isnan() with the bitwise NOT operator (~).
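One way this could look, assuming a small 2-D float array (the name `data` and its values are invented for the example):

```python
import numpy as np

# Illustrative data: columns 1 and 2 each contain a NaN.
data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, np.nan]])

# True for every column that contains at least one NaN.
has_nan = np.isnan(data).any(axis=0)

# ~ inverts the mask, so only the NaN-free columns are selected.
clean = data[:, ~has_nan]
print(clean)
```

Boolean-mask indexing like this produces a copy, not a view of `data`.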
To delete multiple elements from a NumPy array by index position, pass the array and a list of the index positions to np.delete(). For example, passing the list [1, 2, 3] deletes the elements at index positions 1, 2 and 3. The function returns a copy of the passed array with those elements removed.
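A short sketch of the same idea on a 1-D array (the name `values` and its contents are placeholders):

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50, 60])

# Remove the elements at index positions 1, 2 and 3.
trimmed = np.delete(values, [1, 2, 3])

print(trimmed)
print(values)  # the original array is left untouched
```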
You can use chained [][] indexing to select an element from a NumPy array, for example the element at row index 1 and column index 2. Alternatively, you can pass the row index and column index as a comma-separated pair inside a single pair of brackets.
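Both indexing styles, sketched on a made-up 3x3 array named `grid`:

```python
import numpy as np

grid = np.array([[11, 12, 13],
                 [21, 22, 23],
                 [31, 32, 33]])

# Chained indexing: first select row 1, then element 2 of that row.
chained = grid[1][2]

# Single indexing operation with a (row, column) pair.
direct = grid[1, 2]

print(chained, direct)
```

The two forms select the same element; the comma-separated form does it in one indexing step instead of two.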
You can create a new data type containing just the fields that you want, with the same field offsets and the same itemsize as the original array's data type, and then use this new data type to create a view of the original array. The dtype function handles arguments with many formats; the relevant one is described in the section of the documentation called "Specifying and constructing data types". Scroll down to the subsection that begins with
{'names': ..., 'formats': ..., 'offsets': ..., 'titles': ..., 'itemsize': ...}
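To make the dictionary form concrete, here is a small sketch. The field names, offsets and the array `a` are chosen just for this illustration; the point is that the new dtype names a subset of the fields but keeps their original byte offsets and the original itemsize.

```python
import numpy as np

# Illustrative full record layout: 'x' at byte 0, 'y' at 8, 'i' at 16.
full = np.dtype([('x', '<f8'), ('y', '<f8'), ('i', '<i8')])

# Keep only 'x' and 'i' at their original offsets, and keep the
# original itemsize (24 bytes) so the records still line up.
partial = np.dtype({'names': ['x', 'i'],
                    'formats': ['<f8', '<i8'],
                    'offsets': [0, 16],
                    'itemsize': full.itemsize})

a = np.zeros(3, dtype=full)
v = a.view(partial)   # reinterprets the same buffer, no data is copied

v['i'] = 7            # writing through the view...
print(a['i'])         # ...is visible in the original array
```

Because the itemsize is unchanged, the bytes of the omitted field 'y' are simply skipped over in each record.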
Here are a couple of convenience functions that use this idea.
import numpy as np

def view_fields(a, names):
    """
    `a` must be a numpy structured array.
    `names` is the collection of field names to keep.
    Returns a view of the array `a` (not a copy).
    """
    dt = a.dtype
    formats = [dt.fields[name][0] for name in names]
    offsets = [dt.fields[name][1] for name in names]
    itemsize = a.dtype.itemsize
    newdt = np.dtype(dict(names=names,
                          formats=formats,
                          offsets=offsets,
                          itemsize=itemsize))
    b = a.view(newdt)
    return b

def remove_fields(a, names):
    """
    `a` must be a numpy structured array.
    `names` is the collection of field names to remove.
    Returns a view of the array `a` (not a copy).
    """
    dt = a.dtype
    keep_names = [name for name in dt.names if name not in names]
    return view_fields(a, keep_names)
For example,
In [297]: a
Out[297]:
array([(10.0, 13.5, 1248, -2), (20.0, 0.0, 0, 0), (30.0, 0.0, 0, 0),
       (40.0, 0.0, 0, 0), (50.0, 0.0, 0, 999)],
      dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])

In [298]: b = remove_fields(a, ['i', 'j'])

In [299]: b
Out[299]:
array([(10.0, 13.5), (20.0, 0.0), (30.0, 0.0), (40.0, 0.0), (50.0, 0.0)],
      dtype={'names':['x','y'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':32})
Verify that b is a view (not a copy) of a by changing b[0]['x']...
In [300]: b[0]['x'] = 3.14
and seeing that a is also changed:
In [301]: a[0]
Out[301]: (3.14, 13.5, 1248, -2)
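Another way to check this, without modifying any data, might be np.shares_memory. The sketch below restates view_fields in compact form only so that the block runs on its own; the zero-filled array is a stand-in for real data.

```python
import numpy as np

def view_fields(a, names):
    # Compact restatement of the helper above, so this block is self-contained.
    names = list(names)
    dt = a.dtype
    newdt = np.dtype(dict(names=names,
                          formats=[dt.fields[n][0] for n in names],
                          offsets=[dt.fields[n][1] for n in names],
                          itemsize=dt.itemsize))
    return a.view(newdt)

# Stand-in array with the same field layout as the example above.
a = np.zeros(5, dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])
b = view_fields(a, ['x', 'y'])

# True when the two arrays overlap in memory, i.e. b is not a copy.
print(np.shares_memory(a, b))

# The view exposes only the kept fields but keeps the 32-byte record size.
print(b.dtype.names, b.dtype.itemsize)
```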