
sum zeros and ones by another vector in python

I have the following data array m:

import numpy as np
a = [[1],[0],[1],[0],[0]]
b = [[0],[0],[1],[0],[1]]
c = d = [[1],[0],[1],[0],[0]]
m = np.hstack((a,b,c,d))
array([[1, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 1, 1, 1],
       [0, 0, 0, 0],
       [0, 1, 0, 0]])

I have the following vector prior:

prior = [0.1,0.2,0.3,0.4]

I now want to create a new vector of length 5, where each row of m is summed according to this scheme:

if 1 then add 1/prior

if 0 then add 0.1*1/prior

so for the first row in m we would get

(1/0.1)+(0.1*1/0.2)+(1/0.3)+(1/0.4) = 16.33

the second row is

(0.1*1/0.1)+(0.1*1/0.2)+(0.1*1/0.3)+(0.1*1/0.4) = 2.083

m should be the basis, and numpy may be used (perhaps .sum(axis=1))?


I'm also interested in a solution where m can take more than two different integers. For example, I want a third rule that adds 0.2*1/prior wherever m==2.

spore234 asked Feb 10 '23 00:02


2 Answers

Since you are already using numpy, I would recommend numpy.where and numpy.sum. Note that this works only if you convert prior to a numpy array, since a plain list does not support element-wise division.

p = np.asarray(prior)
np.sum(np.where(m, 1./p, 0.1/p), axis=1)
# array([ 16.33333333,   2.08333333,  20.83333333,   2.08333333,   6.58333333])


np.where usually expects an array of bools as its condition. However, when you give it an array of integers, the number 0 is interpreted as False and everything else as True.
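
To make the truthiness rule concrete, here is a minimal sketch using the m and prior from the question. The names picked, explicit and row_sums are introduced here for illustration only. It suggests that passing the integer array directly gives the same result as an explicit boolean mask, as long as m contains only 0 and 1.

```python
import numpy as np

m = np.array([[1, 0, 1, 1],
              [0, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0],
              [0, 1, 0, 0]])
p = np.asarray([0.1, 0.2, 0.3, 0.4])

# Integer condition: 0 counts as False, any non-zero value as True
picked = np.where(m, 1. / p, 0.1 / p)

# Explicit boolean condition, equivalent here because m holds only 0 and 1
explicit = np.where(m == 1, 1. / p, 0.1 / p)

# Sum the selected values along each row
row_sums = picked.sum(axis=1)
```

Keep in mind that with the integer shortcut a 2 in m would be treated like a 1, since any non-zero value counts as True.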


If you want to add a third rule for the occurrence of 2 in m, I would use np.choose instead of np.where. To add 0.2/p wherever m is 2, you can do:

p = np.asarray(prior)
p_vec = np.vstack((0.1/p,1./p,0.2/p))
np.choose(m,p_vec).sum(axis=1)

The idea is to first create a stacked array p_vec whose rows are 0.1/p, 1./p and 0.2/p. np.choose then picks, for each element of m, the entry from the row that matches its value.

This can easily be extended to the integers 3, 4, ...: just add the corresponding rows to p_vec.
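
As a minimal sketch of such an extension, the example below adds a fourth row for m == 3. The input m and the 0.3 factor are made up for illustration and do not come from the question.

```python
import numpy as np

# Example input containing the integers 0..3 (made up for illustration)
m = np.array([[1, 0, 2, 3],
              [0, 0, 0, 0],
              [3, 2, 1, 0]])
p = np.asarray([0.1, 0.2, 0.3, 0.4])

# Row k of p_vec holds the values to add wherever m == k;
# the 0.3 factor for m == 3 is an assumed value, not from the question
p_vec = np.vstack((0.1 / p, 1. / p, 0.2 / p, 0.3 / p))

# choose() picks p_vec[m[i, j], j] for every element, then each row is summed
out = np.choose(m, p_vec).sum(axis=1)
```

Every value that occurs in m needs a matching row in p_vec, otherwise np.choose raises an error in its default mode.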

plonser answered Feb 12 '23 12:02


Approach #1: Vectorized approach with boolean indexing -

# Calculate the reciprocal of prior as a numpy array
prior_reci = 1/np.asarray(prior)

# Mask of ones (1s) in array, m
mask = m==1

# Scale by prior_reci where m==1 and by 0.1*prior_reci elsewhere,
# then sum along each row
out = (mask*prior_reci + ~mask*(0.1*prior_reci)).sum(1)

Sample run -

In [58]: m
array([[1, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 1, 1, 1],
       [0, 0, 0, 0],
       [0, 1, 0, 0]])

In [59]: prior
Out[59]: [0.1, 0.2, 0.3, 0.4]

In [60]: prior_reci = 1/np.asarray(prior)
    ...: mask = m==1

In [61]: (mask*prior_reci + ~mask*(0.1*prior_reci)).sum(1)
Out[61]: array([ 16.33333333,   2.08333333,  20.83333333,   2.08333333,   6.58333333])

Approach #2: Using matrix-multiplication with np.dot -

# Calculate the reciprocal of prior as a numpy array
prior_reci = 1/np.asarray(prior)

# For a binary m, np.dot(m,prior_reci) sums prior_reci over the
# positions where m==1 in each row.
# Likewise, np.dot(1-m,0.1*prior_reci) sums the scaled values
# 0.1*prior_reci over the positions where m==0.
# Adding the two gives the required row sums.
out = np.dot(m,prior_reci) + np.dot(1-m,0.1*prior_reci)

Sample run -

In [77]: m
array([[1, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 1, 1, 1],
       [0, 0, 0, 0],
       [0, 1, 0, 0]])

In [78]: prior
Out[78]: [0.1, 0.2, 0.3, 0.4]

In [79]: prior_reci = 1/np.asarray(prior)

In [80]: np.dot(m,prior_reci) + np.dot(1-m,0.1*prior_reci)
Out[80]: array([ 16.33333333,   2.08333333,  20.83333333,   2.08333333,   6.58333333])

Runtime tests to compare the two approaches listed above -

In [102]: # Parameters
     ...: H = 1000
     ...: W = 1000
     ...: # Create inputs
     ...: m = np.random.randint(0,2,(H,W))
     ...: prior = np.random.rand(W).tolist()

In [103]: %%timeit
     ...: prior_reci1 = 1/np.asarray(prior)
     ...: mask = m==1
     ...: out1 = (mask*prior_reci1 + ~mask*(0.1*prior_reci1)).sum(1)
100 loops, best of 3: 11.1 ms per loop

In [104]: %%timeit
     ...: prior_reci2 = 1/np.asarray(prior)
     ...: out2 = np.dot(m,prior_reci2) + np.dot(1-m,0.1*prior_reci2)
100 loops, best of 3: 6 ms per loop
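
The timings above are only comparable if both approaches compute the same thing. A quick check could look like the sketch below. The seed, the array sizes and the offset that keeps prior away from zero are arbitrary choices made here, not part of the original test.

```python
import numpy as np

rng = np.random.RandomState(0)          # fixed seed, chosen arbitrarily
H, W = 200, 300
m = rng.randint(0, 2, (H, W))           # binary input, as in the question
prior = (rng.rand(W) + 0.1).tolist()    # offset avoids dividing by values near 0

prior_reci = 1 / np.asarray(prior)

# Approach #1: boolean mask
mask = m == 1
out1 = (mask * prior_reci + ~mask * (0.1 * prior_reci)).sum(1)

# Approach #2: matrix multiplication (valid because m holds only 0 and 1)
out2 = np.dot(m, prior_reci) + np.dot(1 - m, 0.1 * prior_reci)

# True if both approaches agree up to floating-point tolerance
same = np.allclose(out1, out2)
```

The two results should only be expected to agree for a binary m, because the 1-m term stops being a mask once other integers appear.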

A generic solution that handles multiple conditional checks can be written in a vectorized manner with np.einsum -

# Define scalars that are to be matched against input 2D array, m
matches = [0,1,2,3,4] # Edit this to accommodate more matching conditions

# Define multiplying factors for the reciprocal version of prior
prior_multfactors = [0.1,1,0.2,0.3,0.4] # Edit this corresponding to matches 
                                  # for different multiplying factors

# Thus, for the given matches and prior_multfactors, it means:
# when m==0, then do: 0.1/prior
# when m==1, then do: 1/prior
# when m==2, then do: 0.2/prior
# when m==3, then do: 0.3/prior
# when m==4, then do: 0.4/prior

# Define prior list
prior = [0.1,0.2,0.3,0.4]

# Calculate the reciprocal of prior as a numpy array
prior_reci = 1/np.asarray(prior)

# Compare every element of m against each value in matches,
# producing a 3D boolean mask of shape (len(matches), rows, cols)
mask = m==np.asarray(matches)[:,None,None]

# Get the scaling factors for each match across each prior_reci value
scales = np.asarray(prior_multfactors)[:,None]*prior_reci

# Use einsum to sum the matching scales over all matches and columns, per row
out = np.einsum('ijk,ik->j',mask,scales)

Sample run -

In [203]: m
array([[1, 0, 1, 1],
       [0, 0, 0, 0],
       [4, 2, 3, 1],
       [0, 0, 0, 0],
       [0, 4, 2, 0]])

In [204]: matches, prior_multfactors
Out[204]: ([0, 1, 2, 3, 4], [0.1, 1, 0.2, 0.3, 0.4])

In [205]: prior
Out[205]: [0.1, 0.2, 0.3, 0.4]

In [206]: prior_reci = 1/np.asarray(prior)
     ...: mask = m==np.asarray(matches)[:,None,None]
     ...: scales = np.asarray(prior_multfactors)[:,None]*prior_reci

In [207]: np.einsum('ijk,ik->j',mask,scales)
Out[207]: array([ 16.33333333,   2.08333333,   8.5       ,   2.08333333,   3.91666667])
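
As a cross-check on the einsum result, the same sums can be reproduced with a plain lookup table. This is a different technique from the einsum approach above, and it assumes that matches is exactly 0, 1, ..., n-1 so that the values in m can be used directly as indices. The names factor_lut and out_lut are introduced here for illustration.

```python
import numpy as np

m = np.array([[1, 0, 1, 1],
              [0, 0, 0, 0],
              [4, 2, 3, 1],
              [0, 0, 0, 0],
              [0, 4, 2, 0]])
prior = [0.1, 0.2, 0.3, 0.4]
prior_multfactors = [0.1, 1, 0.2, 0.3, 0.4]  # factors for m == 0, 1, 2, 3, 4

prior_reci = 1 / np.asarray(prior)

# Index the factor table with m to get one factor per element of m
factor_lut = np.asarray(prior_multfactors)[m]

# Scale each column by its reciprocal prior and sum along the rows
out_lut = (factor_lut * prior_reci).sum(axis=1)
```

For this sample input, out_lut should match the einsum output shown above. The lookup avoids building the 3D mask, at the cost of requiring contiguous integer codes in m.
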
Divakar answered Feb 12 '23 13:02
