Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy - the best way to remove the last element from 1 dimensional array?

What is the most efficient way to remove the last element from a numpy 1 dimensional array? (like pop for list)

like image 660
Meni Avatar asked Oct 04 '15 11:10

Meni


1 Answers

NumPy arrays have a fixed size, so you cannot remove an element in-place. For example using del doesn't work:

>>> import numpy as np >>> arr = np.arange(5) >>> del arr[-1] ValueError: cannot delete array elements 

Note that the index -1 represents the last element. That's because negative indices in Python (and NumPy) are counted from the end, so -1 is the last, -2 is the one before last and -len is actually the first element. That's just for your information in case you didn't know.

Python lists are variable sized so it's easy to add or remove elements.

So if you want to remove an element you need to create a new array or view.

Creating a new view

You can create a new view containing all elements except the last one using the slice notation:

>>> arr = np.arange(5) >>> arr array([0, 1, 2, 3, 4])  >>> arr[:-1]  # all but the last element array([0, 1, 2, 3]) >>> arr[:-2]  # all but the last two elements array([0, 1, 2]) >>> arr[1:]   # all but the first element array([1, 2, 3, 4]) >>> arr[1:-1] # all but the first and last element array([1, 2, 3]) 

However a view shares the data with the original array, so if one is modified so is the other:

>>> sub = arr[:-1] >>> sub array([0, 1, 2, 3]) >>> sub[0] = 100 >>> sub array([100,   1,   2,   3]) >>> arr array([100,   1,   2,   3,   4]) 

Creating a new array

1. Copy the view

If you don't like this memory sharing you have to create a new array, in this case it's probably simplest to create a view and then copy (for example using the copy() method of arrays) it:

>>> arr = np.arange(5) >>> arr array([0, 1, 2, 3, 4]) >>> sub_arr = arr[:-1].copy() >>> sub_arr array([0, 1, 2, 3]) >>> sub_arr[0] = 100 >>> sub_arr array([100,   1,   2,   3]) >>> arr array([0, 1, 2, 3, 4]) 

2. Using integer array indexing [docs]

However, you can also use integer array indexing to remove the last element and get a new array. This integer array indexing will always (not 100% sure there) create a copy and not a view:

>>> arr = np.arange(5) >>> arr array([0, 1, 2, 3, 4]) >>> indices_to_keep = [0, 1, 2, 3] >>> sub_arr = arr[indices_to_keep] >>> sub_arr array([0, 1, 2, 3]) >>> sub_arr[0] = 100 >>> sub_arr array([100,   1,   2,   3]) >>> arr array([0, 1, 2, 3, 4]) 

This integer array indexing can be useful to remove arbitrary elements from an array (which can be tricky or impossible when you want a view):

>>> arr = np.arange(5, 10) >>> arr array([5, 6, 7, 8, 9]) >>> arr[[0, 1, 3, 4]]  # keep first, second, fourth and fifth element array([5, 6, 8, 9]) 

If you want a generalized function that removes the last element using integer array indexing:

def remove_last_element(arr):     return arr[np.arange(arr.size - 1)] 

3. Using boolean array indexing [docs]

There is also boolean indexing that could be used, for example:

>>> arr = np.arange(5, 10) >>> arr array([5, 6, 7, 8, 9]) >>> keep = [True, True, True, True, False] >>> arr[keep] array([5, 6, 7, 8]) 

This also creates a copy! And a generalized approach could look like this:

def remove_last_element(arr):     if not arr.size:         raise IndexError('cannot remove last element of empty array')     keep = np.ones(arr.shape, dtype=bool)     keep[-1] = False     return arr[keep] 

If you would like more information on NumPys indexing the documentation on "Indexing" is quite good and covers a lot of cases.

4. Using np.delete()

Normally I wouldn't recommend the NumPy functions that "seem" like they are modifying the array in-place (like np.append and np.insert) but do return copies because these are generally needlessly slow and misleading. You should avoid them whenever possible, that's why it's the last point in my answer. However in this case it's actually a perfect fit so I have to mention it:

>>> arr = np.arange(10, 20) >>> arr array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]) >>> np.delete(arr, -1) array([10, 11, 12, 13, 14, 15, 16, 17, 18]) 

5.) Using np.resize()

NumPy has another method that sounds like it does an in-place operation but it really returns a new array:

>>> arr = np.arange(5) >>> arr array([0, 1, 2, 3, 4]) >>> np.resize(arr, arr.size - 1) array([0, 1, 2, 3]) 

To remove the last element I simply provided a new shape that is 1 smaller than before, which effectively removes the last element.

Modifying the array inplace

Yes, I've written previously that you cannot modify an array in place. But I said that because in most cases it's not possible or only by disabling some (completely useful) safety checks. I'm not sure about the internals but depending on the old size and the new size it could be possible that this includes an (internal-only) copy operation so it might be slower than creating a view.

Using np.ndarray.resize()

If the array doesn't share its memory with any other array, then it's possible to resize the array in place:

>>> arr = np.arange(5, 10) >>> arr.resize(4) >>> arr array([5, 6, 7, 8]) 

However that will throw ValueErrors in case it's actually referenced by another array as well:

>>> arr = np.arange(5) >>> view = arr[1:] >>> arr.resize(4) ValueError: cannot resize an array that references or is referenced by another array in this way.  Use the resize function 

You can disable that safety-check by setting refcheck=False but that shouldn't be done lightly because you make yourself vulnerable for segmentation faults and memory corruption in case the other reference tries to access the removed elements! This refcheck argument should be treated as an expert-only option!

Summary

Creating a view is really fast and doesn't take much additional memory, so whenever possible you should try to work as much with views as possible. However depending on the use-cases it's not so easy to remove arbitrary elements using basic slicing. While it's easy to remove the first n elements and/or last n elements or remove every x element (the step argument for slicing) this is all you can do with it.

But in your case of removing the last element of a one-dimensional array I would recommend:

arr[:-1]          # if you want a view arr[:-1].copy()   # if you want a new array 

because these most clearly express the intent and everyone with Python/NumPy experience will recognize that.

Timings

Based on the timing framework from this answer:

# Setup import numpy as np  def view(arr):     return arr[:-1]  def array_copy_view(arr):     return arr[:-1].copy()  def array_int_index(arr):     return arr[np.arange(arr.size - 1)]  def array_bool_index(arr):     if not arr.size:         raise IndexError('cannot remove last element of empty array')     keep = np.ones(arr.shape, dtype=bool)     keep[-1] = False     return arr[keep]  def array_delete(arr):     return np.delete(arr, -1)  def array_resize(arr):     return np.resize(arr, arr.size - 1)  # Timing setup timings = {view: [],             array_copy_view: [], array_int_index: [], array_bool_index: [],             array_delete: [], array_resize: []} sizes = [2**i for i in range(1, 20, 2)]  # Timing for size in sizes:     print(size)     func_input = np.random.random(size=size)     for func in timings:         print(func.__name__.ljust(20), ' ', end='')         res = %timeit -o func(func_input)   # if you use IPython, otherwise use the "timeit" module         timings[func].append(res)  # Plotting %matplotlib notebook  import matplotlib.pyplot as plt import numpy as np  fig = plt.figure(1) ax = plt.subplot(111)  for func in timings:     ax.plot(sizes,              [time.best for time in timings[func]],              label=func.__name__) ax.set_xscale('log') ax.set_yscale('log') ax.set_xlabel('size') ax.set_ylabel('time [seconds]') ax.grid(which='both') ax.legend() plt.tight_layout() 

I get the following timings as log-log plot to cover all the details, lower time still means faster, but the range between two ticks represents one order of magnitude instead of a fixed amount. In case you're interested in the specific values, I copied them into this gist:

enter image description here

According to these timings those two approaches are also the fastest. (Python 3.6 and NumPy 1.14.0)

like image 189
MSeifert Avatar answered Oct 04 '22 14:10

MSeifert