Numpy reductions over successive non-contiguous slices

Say I have two numpy arrays, A of shape (d, f) and I of shape (d,) containing indices in 0..n, e.g.

I = np.array([0, 0, 1, 0, 2, 1])
A = np.arange(12).reshape(6, 2)

I am looking for a fast way to compute reductions, in particular sum, mean and max, over all the slices A[I == i, :]; a slow version would be

results = np.zeros((I.max() + 1, A.shape[1]))
for i in np.unique(I):
    results[i, :] = np.mean(A[I == i, :], axis=0)

which gives in this case (label 0 groups rows 0, 1 and 3; label 1 groups rows 2 and 5; label 2 is row 4 alone)

results = [[ 2.66666667,  3.66666667],
           [ 7.        ,  8.        ],
           [ 8.        ,  9.        ]]

EDIT: I did some timings based on Divakar's answer and a previous poster's (deleted) pandas-based answer.

Timing code:

from __future__ import division, print_function
import numpy as np, pandas as pd
from time import time

np.random.seed(0)
d = 500000
f = 500
n = 500
I = np.hstack((np.arange(n), np.random.randint(n, size=(d - n,))))
np.random.shuffle(I)
A = np.random.rand(d, f)

def reduce_naive(A, I, op="avg"):
    target_dtype = (float if op=="avg" else A.dtype)  # np.float alias removed in newer NumPy
    results = np.zeros((I.max() + 1, A.shape[1]), dtype=target_dtype)
    npop = {"avg": np.mean, "sum": np.sum, "max": np.max}.get(op)
    for i in np.unique(I):
        results[i, :] = npop(A[I == i, :], axis=0)
    return results

def reduce_reduceat(A, I, op="avg"):
    sidx = I.argsort()
    sI = I[sidx]
    sortedA = A[sidx]
    idx = np.r_[ 0, np.flatnonzero(sI[1:] != sI[:-1])+1 ]
    if op == "max":
        return np.maximum.reduceat(sortedA, idx, axis=0)
    sums = np.add.reduceat(sortedA, idx, axis=0)
    if op == "sum":
        return sums
    if op == "avg":
        count = np.r_[idx[1:] - idx[:-1], A.shape[0] - idx[-1]]
        return sums/count.astype(float)[:,None]

def reduce_bincount(A, I, op="avg"):
    ids = (I[:,None] + (I.max()+1)*np.arange(A.shape[1])).ravel()
    sums = np.bincount(ids, A.ravel()).reshape(A.shape[1],-1).T
    if op == "sum":
        return sums
    if op == "avg":
        return sums/np.bincount(ids).reshape(A.shape[1],-1).T

def reduce_pandas(A, I, op="avg"):
    group = pd.concat([pd.DataFrame(A), pd.DataFrame(I, columns=("i",))
                     ], axis=1
                    ).groupby('i')
    if op == "sum":
        return group.sum().values
    if op == "avg":
        return group.mean().values
    if op == "max":
        return group.max().values

def reduce_hybrid(A, I, op="avg"):
    sidx = I.argsort()
    sI = I[sidx]
    sortedA = A[sidx]

    idx = np.r_[ 0, np.flatnonzero(sI[1:] != sI[:-1])+1 ]
    unq_sI = sI[idx]    

    m = I.max()+1
    N = A.shape[1]

    target_dtype = (float if op=="avg" else A.dtype)
    out = np.zeros((m,N),dtype=target_dtype)
    ss_idx = np.r_[idx,A.shape[0]]

    npop = {"avg": np.mean, "sum": np.sum, "max": np.max}.get(op)
    for i in range(len(idx)):
        out[unq_sI[i]] = npop(sortedA[ss_idx[i]:ss_idx[i+1]], axis=0)
    return out

for op in ("sum", "avg", "max"):
    for name, method in (("naive   ", reduce_naive), 
                         ("reduceat", reduce_reduceat),
                         ("pandas  ", reduce_pandas),
                         ("bincount", reduce_bincount),
                         ("hybrid  ", reduce_hybrid)
                         ("numba   ", reduce_numba)
                        ):    
        if op == "max" and name == "bincount":
            continue
        # if name.strip() != "naive":
        #     assert np.allclose(method(A, I, op), reduce_naive(A, I, op))
        times = []
        for tries in range(3):
            time0 = time()
            method(A, I, op)
            times.append(time() - time0)
        print(name, op, "{:.2f}".format(np.min(times)))
    print()

Timings:

naive    sum 1.10
reduceat sum 4.62
pandas   sum 5.29
bincount sum 1.54
hybrid   sum 0.62
numba    sum 0.31

naive    avg 1.12
reduceat avg 4.45
pandas   avg 5.23
bincount avg 2.43
hybrid   avg 0.61
numba    avg 0.33

naive    max 1.19
reduceat max 3.18
pandas   max 5.24
hybrid   max 0.72
numba    max 0.34

(I chose d and n as typical values for my use case; the code for the Numba versions is in my answer below.)

Asked Mar 17 '17 by Maxim

2 Answers

Approach #1 : Using NumPy ufunc reduceat

We have ufuncs for those three reduction operations, and luckily we also have ufunc.reduceat to perform those reductions restricted to particular intervals along an axis. So, using those, we would have those three operations computed like so -

# Gives us sorted array based on input indices I and indices at which the
# sorted array should be interval-limited for reduceat operations to be
# applied later on using those results
def sorted_array_intervals(A, I):
    # Compute sort indices for I. To be later used for sorting A based on it.
    sidx = I.argsort()
    sI = I[sidx]
    sortedA = A[sidx]

    # Get indices at which intervals change. Also, get count in each interval
    idx = np.r_[ 0, np.flatnonzero(sI[1:] != sI[:-1])+1 ]
    return sortedA, idx

# Groupby sum reduction using the interval indices 
# to perform interval-limited ufunc reductions
def groupby_sum(A, I):
    sortedA, idx = sorted_array_intervals(A,I)
    return np.add.reduceat(sortedA, idx, axis=0)

# Groupby mean reduction
def groupby_mean(A, I):
    sortedA, idx = sorted_array_intervals(A,I)
    sums = np.add.reduceat(sortedA, idx, axis=0)
    count = np.r_[idx[1:] - idx[:-1], A.shape[0] - idx[-1]]
    return sums/count.astype(float)[:,None]

# Groupby max reduction
def groupby_max(A, I):
    sortedA, idx = sorted_array_intervals(A,I)
    return np.maximum.reduceat(sortedA, idx, axis=0)

Thus, if we need all those operations, we could reuse a single call to sorted_array_intervals, like so -

def groupby_sum_mean_max(A, I):
    sortedA, idx = sorted_array_intervals(A,I)
    sums = np.add.reduceat(sortedA, idx, axis=0)
    count = np.r_[idx[1:] - idx[:-1], A.shape[0] - idx[-1]]
    avgs = sums/count.astype(float)[:,None]
    maxs = np.maximum.reduceat(sortedA, idx, axis=0)
    return sums, avgs, maxs
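
As a quick sanity check (a small illustrative sketch, using the toy arrays from the question and the functions above), np.add.reduceat sums each slice between consecutive interval starts, and groupby_mean reproduces the output of the slow loop:

np.add.reduceat(np.arange(8), [0, 4, 6])
# -> array([ 6,  9, 13]), i.e. sums of the slices [0:4], [4:6], [6:]

# Toy arrays from the question
I = np.array([0, 0, 1, 0, 2, 1])
A = np.arange(12).reshape(6, 2)
groupby_mean(A, I)
# -> array([[2.66666667, 3.66666667],
#           [7.        , 8.        ],
#           [8.        , 9.        ]])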

Approach #1-B : Hybrid version (sort + slice + reduce)

Here's a hybrid version that takes help from sorted_array_intervals to get the sorted array and the indices at which the intervals change into the next group, but at the last stage uses slicing to reduce each interval, iterating over the groups. The slicing helps here as we are working with views.

The implementation would look something like this -

def reduce_hybrid(A, I, op="avg"):
    sidx = I.argsort()
    sI = I[sidx]
    sortedA = A[sidx]

    # Get indices at which intervals change. Also, get count in each interval
    idx = np.r_[ 0, np.flatnonzero(sI[1:] != sI[:-1])+1 ]
    unq_sI = sI[idx]    

    m = I.max()+1
    N = A.shape[1]

    target_dtype = (float if op=="avg" else A.dtype)
    out = np.zeros((m,N),dtype=target_dtype)
    ss_idx = np.r_[idx,A.shape[0]]

    npop = {"avg": np.mean, "sum": np.sum, "max": np.max}.get(op)
    for i in range(len(idx)):
        out[unq_sI[i]] = npop(sortedA[ss_idx[i]:ss_idx[i+1]], axis=0)
    return out
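
A quick equivalence check against the naive loop (a small sketch; assumes reduce_naive and the benchmark arrays A, I from the question are defined):

for op in ("sum", "avg", "max"):
    # hybrid and naive should agree up to floating-point rounding
    assert np.allclose(reduce_hybrid(A, I, op), reduce_naive(A, I, op))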

Runtime test (Using the setup from the benchmarks posted in the question) -

In [432]: d = 500000
     ...: f = 500
     ...: n = 500
     ...: I = np.hstack((np.arange(n), np.random.randint(n, size=(d - n,))))
     ...: np.random.shuffle(I)
     ...: A = np.random.rand(d, f)
     ...: 

In [433]: %timeit reduce_naive(A, I, op="sum")
     ...: %timeit reduce_hybrid(A, I, op="sum")
     ...: 
1 loops, best of 3: 1.03 s per loop
1 loops, best of 3: 549 ms per loop

In [434]: %timeit reduce_naive(A, I, op="avg")
     ...: %timeit reduce_hybrid(A, I, op="avg")
     ...: 
1 loops, best of 3: 1.04 s per loop
1 loops, best of 3: 550 ms per loop

In [435]: %timeit reduce_naive(A, I, op="max")
     ...: %timeit reduce_hybrid(A, I, op="max")
     ...: 
1 loops, best of 3: 1.14 s per loop
1 loops, best of 3: 631 ms per loop

Approach #2 : Using NumPy bincount

Here's another approach using np.bincount that does bin-based summing. So, with it, we could compute the sum and average values and also avoid sorting in the process, like so -

ids = (I[:,None] + (I.max()+1)*np.arange(A.shape[1])).ravel()
sums = np.bincount(ids, A.ravel()).reshape(A.shape[1],-1).T
avgs = sums/np.bincount(ids).reshape(A.shape[1],-1).T
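
Note that bincount only accumulates (weighted) counts, so it cannot perform the max reduction; that case is skipped in the question's benchmark. On the toy arrays from the question, the same ids construction recovers the per-group sums (a small illustrative sketch):

I = np.array([0, 0, 1, 0, 2, 1])
A = np.arange(12).reshape(6, 2)
# One bin per (group, column) pair, then reshape back to (groups, columns)
ids = (I[:, None] + (I.max() + 1) * np.arange(A.shape[1])).ravel()
np.bincount(ids, A.ravel()).reshape(A.shape[1], -1).T
# -> array([[ 8., 11.],
#           [14., 16.],
#           [ 8.,  9.]])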

Answered by Divakar


Using the Python/NumPy JIT compiler Numba, I was able to get shorter timings with just-in-time compilation of the intuitive linear algorithm:

from numba import jit

@jit
def reducenb_avg(A, I):
    d, f = A.shape
    n = I.max() + 1
    result = np.zeros((n, f), np.float64)
    count = np.zeros((n, 1), int)
    for i in range(d):
        result[I[i], :] += A[i]
        count[I[i], 0] += 1
    return result/count

@jit
def reducenb_sum(A, I):
    d, f = A.shape
    n = I.max() + 1
    result = np.zeros((n, f), A.dtype)
    for i in range(d):
        result[I[i], :] += A[i]
    return result

@jit
def reducenb_max(A, I):
    d, f = A.shape
    n = I.max() + 1
    result = -np.inf * np.ones((n, f))
    for i in range(d):
        result[I[i], :] = np.maximum(A[i], result[I[i], :])
    return result

def reduce_numba(A, I, op="avg"):
    return {"sum": reducenb_sum, "avg": reducenb_avg, "max": reducenb_max}.get(op)(A, I)

On the benchmark problem, these finish in ~0.32 s, about half the time of the pure-NumPy sorting-based methods.
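
The first call to each jitted function pays the compilation cost; since the benchmark in the question reports the minimum of three runs, that cost is excluded from the numbers above. A minimal warm-up-and-verify sketch (assumes reduce_naive and the benchmark arrays A, I are defined):

# First call triggers JIT compilation; later calls reuse the compiled code
out = reduce_numba(A, I, op="avg")
assert np.allclose(out, reduce_naive(A, I, op="avg"))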

Answered by Maxim