Let's assume I have a large 2D numpy array, e.g. 1000x1000 elements. I also have two 1D integer arrays of length L, and a 1D float array of the same length. If I simply want to assign the floats to different positions in the original array according to the integer arrays, I could write:
mat = np.zeros((1000,1000))
int1 = np.random.randint(0,999,size=(50000,))
int2 = np.random.randint(0,999,size=(50000,))
f = np.random.rand(50000)
mat[int1,int2] = f
But if there are collisions, i.e. multiple floats mapping to a single location, all but the last one are overwritten. Is there a way to aggregate the collisions somehow, e.g. take the mean or median of all the floats falling at the same location? I would like to take advantage of vectorization and hopefully avoid interpreter loops.
Thanks!
Building on hpaulj's suggestion, here's how to get the mean value in case of collisions:
import numpy as np

mat = np.zeros((2, 2))
int1 = np.zeros(2, dtype=int)
int2 = np.zeros(2, dtype=int)
f = np.array([0.0, 1.0])

# unbuffered addition: repeated indices all contribute to the sum
np.add.at(mat, (int1, int2), f)

# count how many values landed on each location
n = np.zeros((2, 2))
np.add.at(n, (int1, int2), 1)

# divide only at the hit locations, so empty cells are never touched
mat[int1, int2] /= n[int1, int2]
print(mat)
[[0.5 0. ]
 [0.  0. ]]
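At the full size from the question, the same mean aggregation can also be done with np.bincount on flattened indices; a sketch, assuming the 1000x1000 shape (the variable names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
int1 = rng.integers(0, 1000, size=50000)
int2 = rng.integers(0, 1000, size=50000)
f = rng.random(50000)

# flatten the 2D coordinates into one linear index per value
idx = int1 * 1000 + int2

# per-cell sum of values and per-cell hit count, over all 10^6 cells
sums = np.bincount(idx, weights=f, minlength=1000 * 1000)
counts = np.bincount(idx, minlength=1000 * 1000)

# mean where at least one value landed, zero elsewhere
mat = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0).reshape(1000, 1000)
```

Like the np.add.at version this only handles the mean, not the median, but it avoids the unbuffered-ufunc call entirely.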
You can manipulate your data in pandas and then assign.
Starting from
mat = np.zeros((1000,1000))
a = np.random.randint(0,999,size=(50000,))
b = np.random.randint(0,999,size=(50000,))
c = np.random.rand(50000)
You can define a function
import pandas as pd

def get_aggregated_collisions(a, b, c):
    df = pd.DataFrame({'x': a, 'y': b, 'v': c})
    # build a hashable (x, y) coordinate to group by
    df['coord'] = df[['x', 'y']].apply(tuple, axis=1)
    d = df.groupby('coord').agg({'v': 'mean', 'x': 'first', 'y': 'first'}).to_dict('list')
    return d
and then
d = get_aggregated_collisions(a,b,c)
mat[d['x'], d['y']] = d['v']
The whole operation (including generating the matrices, np.random etc.) ran quite ok:
1.05 s ± 30.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The idea behind making a tuple of coordinates was to have a hashable key to group values by their coordinates. Maybe there is an even smarter way to do this :) always open to suggestions.
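One possible simplification, since pandas can group on several columns at once: skip the tuple column and group on the coordinate columns directly. A sketch of that variant (the function name is made up for illustration):

```python
import numpy as np
import pandas as pd

def get_aggregated_collisions_multi(a, b, c):
    # group directly on the two coordinate columns; no tuple column needed
    df = pd.DataFrame({'x': a, 'y': b, 'v': c})
    g = df.groupby(['x', 'y'], as_index=False)['v'].mean()
    return g['x'].to_numpy(), g['y'].to_numpy(), g['v'].to_numpy()

rng = np.random.default_rng(0)
a = rng.integers(0, 1000, size=50000)
b = rng.integers(0, 1000, size=50000)
c = rng.random(50000)

mat = np.zeros((1000, 1000))
x, y, v = get_aggregated_collisions_multi(a, b, c)
mat[x, y] = v
```

Changing 'mean' to 'median' in the groupby gives the median aggregation the question also asked about.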