
Efficient way to drop a column from a Numpy array?

If I have a very large numpy array with one useless column, how could I drop it without creating a copy of the original array?

np.delete(my_np_array, 0, axis=1)

The above code will return a copy of the array without the zero-th column. But instead I would like to simply delete that column from my_np_array since I don't need it. For very large datasets, the memory management becomes important and copying may not be an option.

Krishan Gupta asked Dec 14 '13 07:12




2 Answers

If memory is the main concern, you can move columns around within the array so that the unneeded column ends up at the very end, then use ndarray.resize, which modifies the array in place, to shrink it and discard that last column.

NumPy's API offers no way to simply remove the first column of an array in place. I suspect this is because of how an ndarray is laid out in memory: multidimensional indices are mapped to byte offsets within a single block of contiguous memory.
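
To illustrate the layout argument, here is a small sketch; the array `a` and its shape are made up for the example. In a row-major array the elements of the first column are interleaved with the rest of the buffer, so a column slice can only step over them with strides, not release them.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)  # row-major (C order) by default
print(a.strides)                 # (32, 8): 32 bytes to the next row, 8 to the next column

v = a[:, 1:]                     # "drops" column 0, but only as a view
print(v.shape)                   # (3, 3)
print(v.strides)                 # (32, 8): still steps over full 4-element rows
print(np.shares_memory(v, a))    # True: the original buffer is still in use
print(v.flags['C_CONTIGUOUS'])   # False: the skipped elements leave gaps
```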

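The following example copies the last column into the first and then drops the last (now redundant) one, letting resize reallocate the buffer to the smaller size. The obsolete column is thus removed from memory, at the cost of changing your column order. Note that resize truncates the array's buffer in memory order, so this drops the last column only if A is stored in column-major (Fortran) order and owns its data; on a default row-major (C order) array the same call would cut off the tail of the buffer and scramble the rows.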

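import numpy as np

# Example array; resize drops trailing columns only for column-major (order='F') data.
A = np.array(np.arange(25).reshape(5, 5), order='F')
D1, D2 = A.shape
A[:, 0] = A[:, D2-1]  # overwrite the unwanted first column with the last one
A.resize((D1, D2-1), refcheck=False)  # shrink in place, discarding the last column
print(A.shape)
# => (5, 4), since the shape was initially (5, 5)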
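
For the default row-major layout, one possible workaround is to compact each row toward the front of the buffer before shrinking. The sketch below is not part of the original answer: `drop_first_column_inplace` is a hypothetical helper, and it assumes a C-contiguous array that owns its data and has no other live views (with refcheck=False, any such view would be left pointing at freed memory).

```python
import numpy as np

def drop_first_column_inplace(a):
    """Hypothetical helper: drop column 0 of a C-contiguous 2-D array in place."""
    assert a.ndim == 2 and a.flags['C_CONTIGUOUS'] and a.flags['OWNDATA']
    rows, cols = a.shape
    flat = a.reshape(-1)  # 1-D view onto the same buffer (no copy for contiguous data)
    for i in range(rows):
        # Shift row i (minus its first element) to its final position.
        # The destination always lies before the source, so rows not yet
        # processed are never overwritten; .copy() is a row-sized temporary.
        flat[i * (cols - 1):(i + 1) * (cols - 1)] = flat[i * cols + 1:(i + 1) * cols].copy()
    del flat  # drop the temporary view before the buffer is reallocated
    a.resize((rows, cols - 1), refcheck=False)
    return a

A = np.arange(20).reshape(4, 5).copy()  # .copy() so that A owns its data
expected = A[:, 1:].copy()              # reference result, for checking only
drop_first_column_inplace(A)
print(A.shape)                          # (4, 4)
print(np.array_equal(A, expected))      # True
```

Unlike the column swap above, this keeps the remaining columns in their original order, at the cost of a Python-level loop over the rows. Whether the freed tail of the buffer is actually returned to the operating system depends on the allocator.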

matehat answered Oct 20 '22 00:10


If you use slicing, NumPy won't make a copy; it returns a view onto the same data. In other words,

import numpy

a = numpy.array([1, 2, 3, 4, 5])
b = a[1:]  # view of elements from the second to the last, NOT a copy
b[0] = 12  # change the first element of `b`, i.e. the second of `a`
print(a)

will print [ 1 12  3  4  5], showing that the change made through b is visible in a.

If you need to delete an element in the middle, however, a single slice won't work, because a view has to be describable by one starting offset and a constant stride.
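
Applied to the column case from the question, a minimal sketch (the array contents are made up): slicing off the first column yields a view without copying, but that view keeps the whole original buffer alive, so no memory is released. Removing an inner column with np.delete produces a copy.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

first_dropped = a[:, 1:]                     # basic slice: a view, no copy
print(np.shares_memory(first_dropped, a))    # True

middle_dropped = np.delete(a, 2, axis=1)     # removing an inner column copies
print(np.shares_memory(middle_dropped, a))   # False
print(middle_dropped.shape)                  # (3, 3)
```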

6502 answered Oct 19 '22 23:10