Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy Finding multiple occurrence in an array by index

Given the following array:

array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1,  2,  2, -1, -1,  1, -1]
 indexes  0   1   2   3   4   5  6  7   8  9  10  11  12  13  14  15  16  17

I need to find the indexes where the same number appears. In this example this would return a list of lists like this:

list(list(), list(16), list(12, 13), list(6, 7, 9), list() etc...)
     0       1     \   2             3              4
     ^              \ 
      \              \ the index in the array at which "1" appears
       \ 
        \ the numbers in the array

how would one do this in numpy?

the number 1 appears at index 16
the number 2 appears at indexes 12, 13
etc.

NOTES based on comments:

  • -1 can be ignored, i'm only interested in the rest

  • array has ~50 elements with values up to int(500)

  • this function will be called 6000+ times.

like image 875
Andrei M. Avatar asked Sep 17 '25 12:09

Andrei M.


1 Answers

For a solution in O(n) time*, use a dictionary to collect the indices:

# collect positions per value for each item
d = {}
for i, x in enumerate(array):
    d.setdefault(x, []).append(i)

# sort the output (optional)
out = {k: d[k] for k in sorted(d)}

Output:

{-1: [0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17],
  1: [16],
  2: [12, 13],
  3: [6, 7, 9]}

* + O(k*log(k)) where k is the number of unique values if you need a sorted output

For a list of lists:

out = [d.get(k, []) for k in range(min(d), max(d)+1)]

# or for only positive values
out = [d.get(k, []) for k in range(1, max(d)+1)]

Output:

[[0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17],
 [],
 [16],
 [12, 13],
 [6, 7, 9]]

alternative

Or a very simple approach if you pre initialize the output:

out = [[] for i in range(max(array)+1)]
for i, x in enumerate(array):
    out[x].append(i)

comparison of all approaches

pure python is the fastest here.

The initial array was generated using np.random.randint(0, k, size=n).tolist() where n is the length of the array and k the maximum value in the array.

With k=4:

enter image description here

With k=100:

We can now see the quadratic behavior of @TalhaTayyab/@Stitt/@PaulS approaches.

enter image description here

With k=10_000:

We can note that numpy is a bit faster for relatively small arrays (when the values are likely unique).

enter image description here

like image 56
mozway Avatar answered Sep 19 '25 03:09

mozway