Given the following array:
array = [-1, -1, -1, -1, -1, -1, 3, 3, -1, 3, -1, -1, 2, 2, -1, -1, 1, -1]
indexes 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
I need to find the indexes where the same number appears. In this example this would return a list of lists like this:
list(list(), list(16), list(12, 13), list(6, 7, 9), list() etc...)
0 1 \ 2 3 4
^ \
\ \ the index in the array at which "1" appears
\
\ the numbers in the array
how would one do this in numpy?
the number 1 appears at index 16
the number 2 appears at indexes 12, 13
etc.
NOTES based on comments:
-1 can be ignored, i'm only interested in the rest
array has ~50 elements with values up to int(500)
this function will be called 6000+ times.
For a solution in O(n)
time*, use a dictionary to collect the indices:
# collect positions per value for each item
d = {}
for i, x in enumerate(array):
d.setdefault(x, []).append(i)
# sort the output (optional)
out = {k: d[k] for k in sorted(d)}
Output:
{-1: [0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17],
1: [16],
2: [12, 13],
3: [6, 7, 9]}
* + O(k*log(k))
where k
is the number of unique values if you need a sorted output
For a list of lists:
out = [d.get(k, []) for k in range(min(d), max(d)+1)]
# or for only positive values
out = [d.get(k, []) for k in range(1, max(d)+1)]
Output:
[[0, 1, 2, 3, 4, 5, 8, 10, 11, 14, 15, 17],
[],
[16],
[12, 13],
[6, 7, 9]]
Or a very simple approach if you pre initialize the output:
out = [[] for i in range(max(array)+1)]
for i, x in enumerate(array):
out[x].append(i)
pure python is the fastest here.
The initial array was generated using np.random.randint(0, k, size=n).tolist()
where n
is the length of the array and k
the maximum value in the array.
With k=4
:
With k=100
:
We can now see the quadratic behavior of @TalhaTayyab/@Stitt/@PaulS approaches.
With k=10_000
:
We can note that numpy is a bit faster for relatively small arrays (when the values are likely unique).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With