Representing a ragged array in numpy by padding

Tags:

I have a 1-dimensional numpy array scores of scores associated with some objects. These objects belong to some disjoint groups, and all the scores of the items in the first group are first, followed by the scores of the items in the second group, etc.

I'd like to create a 2-dimensional array where each row corresponds to a group, and each entry is the score of one of its items. If all the groups are of the same size I can just do:

scores.reshape((numGroups, groupSize))

Unfortunately, my groups may be of varying size. I understand that numpy doesn't support ragged arrays, but it is fine for me if the resulting array simply pads each row with a specified value to make all rows the same length.

To make this concrete, suppose I have set A with 3 items, B with 2 items, and C with four items.

scores = numpy.array([f(a[0]), f(a[1]), f(a[2]), f(b[0]), f(b[1]), 
                       f(c[0]), f(c[1]), f(c[2]), f(c[3])])
rowStarts = numpy.array([0, 3, 5])
paddingValue = -1.0
scoresByGroup = groupIntoRows(scores, rowStarts, paddingValue)

The desired value of scoresByGroup would be:

 [[f(a[0]), f(a[1]), f(a[2]), -1.0], 
    [f(b[0]), f(b[1]), -1.0, -1.0]
    [f(c[0]), f(c[1]), f(c[2]), f(c[3])]]

Is there some numpy function or composition of functions I can use to create groupIntoRows?

Background:

This operation will be used in calculating the loss for a minibatch for a gradient descent algorithm in Theano, so that's why I need to keep it as a composition of numpy functions if possible, rather than falling back on native Python.
It's fine to assume there is some known maximum row size
The original objects being scored are vectors and the scoring function is a matrix multiplication, which is why we flatten things out in the first place. It would be possible to pad everything to the maximum item set size before doing the matrix multiplication, but the biggest set is over ten times bigger than the average set size, so this is undesirable for speed reasons.

381

asked May 02 '13 19:05

Ryan Gabbard

1 Answers

Try this:

scores = np.random.rand(9)
row_starts = np.array([0, 3, 5])
row_ends = np.concatenate((row_starts, [len(scores)]))
lens = np.diff(row_ends)
pad_len = np.max(lens) - lens
where_to_pad = np.repeat(row_ends[1:], pad_len)
padding_value = -1.0
padded_scores = np.insert(scores, where_to_pad,
                          padding_value).reshape(-1, np.max(lens))

>>> padded_scores
array([[ 0.05878244,  0.40804443,  0.35640463, -1.        ],
       [ 0.39365072,  0.85313545, -1.        , -1.        ],
       [ 0.133687  ,  0.73651147,  0.98531828,  0.78940163]])

182

answered Sep 27 '22 23:09

Jaime

Related questions
                            
                                Extended ASCII Character Conversion
                            
                                Create single list with multiple instances of objects from second list in Python
                            
                                Jinja2: saying 'Render this macro inside another macro or template'
                            
                                Given a string that's a weekday name, how to figure out weekday as decimal (and the next date it occurs)?
                            
                                Run python script with dot slash ./
                            
                                ImportError: cannot import name Ghost
                            
                                How can I get the offset where a Python regular expression search found a match?
                            
                                pyHook for Python 3.3
                            
                                Sum tuples if identical values
                            
                                Django error: cannot import name current_datetime
                            
                                Redis disconnecting and reconnecting on Twisted periodically
                            
                                Preparing _tkinter and sqlite3 for Python installation (no admin rights)
                            
                                Web scraping with python and sqlite. How to store scraped data effectively?
                            
                                How do I check if a twisted.internet.protocol instance has disconnected
                            
                                Processing some file before starting app and reacting for every change
                            
                                scikit-image save image to a bytestring
                            
                                How can I force python to use dumbdbm module to create a new database?
                            
                                Understanding characters received from Arduino
                            
                                PySNMP : ImportError: No module named pyasn1.compat.octets
                            
                                Speed up random matrix computation

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Representing a ragged array in numpy by padding

Tags:

python

numpy

Ryan Gabbard

People also ask

1 Answers

Jaime

Recent Activity

Donate For Us