Python pairwise comparison of elements in an array or list

Let me elaborate on my question with a simple example. I have a=[a1,a2,a3,a4], where every ai is a numerical value.

What I want is the set of pairwise comparisons within 'a', such as I(a1>=a2), I(a1>=a3), I(a1>=a4), ..., I(a4>=a1), I(a4>=a2), I(a4>=a3), where I is an indicator function. So I used the following code.

res=[x>=y for x in a for y in a]

But it also gives comparison results such as I(a1>=a1), ..., I(a4>=a4), which are always one. To get rid of these nuisance terms, I convert res into a NumPy array and take the off-diagonal elements.

res1=numpy.array(res)

This gives the result I want, but I think there should be a more efficient or simpler way to do the pairwise comparison and extract the off-diagonal elements. Do you have any ideas? Thanks in advance.
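The snippet above stops at the conversion to an array, and the off-diagonal step is not shown in the post. A minimal sketch of how that step might look, assuming the flat list is reshaped to n-by-n and the diagonal is dropped with a boolean mask (the sample values and the name off_diag are illustrative, not from the original):

```python
import numpy as np

a = [3, 7, 5, 8]  # example values, assumed for illustration
n = len(a)

# All n*n comparisons, including the diagonal ones (x compared with itself)
res = [x >= y for x in a for y in a]

# Reshape the flat list into an n-by-n matrix: row i, column j holds a[i] >= a[j]
res1 = np.array(res).reshape(n, n)

# Keep only the off-diagonal entries, in row-major order
off_diag = res1[~np.eye(n, dtype=bool)]
print(off_diag)
```

This leaves n*(n-1) results, one for each ordered pair of distinct positions.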

Sue asked Sep 13 '16 21:09




1 Answer

You could use NumPy broadcasting -

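import numpy as np

a = np.asarray(a)  # broadcasting needs an array; a plain list does not support a[:,None]

# Get the mask of comparisons in a vectorized manner using broadcasting
mask = a[:,None] >= a

# Select the elements other than diagonal ones
out = mask[~np.eye(a.size,dtype=bool)]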
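To see why the comparison above yields a full matrix, it may help to look at the shapes involved. The following is a small sketch with assumed sample values; the names col and loop_mask are introduced only for illustration.

```python
import numpy as np

a = np.array([3, 7, 5, 8])  # sample values, assumed for illustration

col = a[:, None]            # shape (4, 1): each value sits in its own row
print(col.shape, a.shape)

# Comparing shape (4, 1) with shape (4,) broadcasts both to (4, 4),
# so mask[i, j] is a[i] >= a[j]
mask = col >= a
print(mask.shape)

# The same matrix built with explicit loops, for comparison
loop_mask = np.array([[x >= y for y in a] for x in a])
print(np.array_equal(mask, loop_mask))
```

The broadcast result and the nested-loop result should agree element for element; the broadcast version simply avoids the Python-level loops.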

If you would rather set the diagonal elements of mask to False and use mask itself as the output, do it like so -

mask[np.eye(a.size,dtype=bool)] = 0
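As an alternative to indexing with an identity mask, NumPy's np.fill_diagonal can overwrite the diagonal in place. This is a swapped-in variant rather than the method used in this answer, shown here as a sketch with assumed sample values:

```python
import numpy as np

a = np.array([3, 7, 5, 8])  # sample values, assumed for illustration
mask = a[:, None] >= a

# np.fill_diagonal modifies the array in place and returns None
np.fill_diagonal(mask, False)
print(mask)
```

This avoids building the temporary n-by-n identity array, which may matter for large inputs.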

Sample run -

In [56]: a
Out[56]: array([3, 7, 5, 8])

In [57]: mask = a[:,None] >= a

In [58]: mask
Out[58]: 
array([[ True, False, False, False],
       [ True,  True,  True, False],
       [ True, False,  True, False],
       [ True,  True,  True,  True]], dtype=bool)

In [59]: mask[~np.eye(a.size,dtype=bool)] # Selecting non-diag elems
Out[59]: 
array([False, False, False,  True,  True, False,  True, False, False,
        True,  True,  True], dtype=bool)

In [60]: mask[np.eye(a.size,dtype=bool)] = 0 # Setting diag elems as False

In [61]: mask
Out[61]: 
array([[False, False, False, False],
       [ True, False,  True, False],
       [ True, False, False, False],
       [ True,  True,  True, False]], dtype=bool)

Runtime test

Why use NumPy broadcasting? Performance! Let's see how the two approaches compare on a large dataset -

In [34]: def pairwise_comp(A): # Using NumPy broadcasting    
    ...:     a = np.asarray(A) # Convert to array if not already so
    ...:     mask = a[:,None] >= a
    ...:     out = mask[~np.eye(a.size,dtype=bool)]
    ...:     return out
    ...: 

In [35]: a = np.random.randint(0,9,(1000)).tolist() # Input list

In [36]: %timeit [x >= y for i,x in enumerate(a) for j,y in enumerate(a) if i != j]
1 loop, best of 3: 185 ms per loop # @Sixhobbits's loopy soln

In [37]: %timeit pairwise_comp(a)
100 loops, best of 3: 5.76 ms per loop
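The timings above come from an IPython session. To repeat the comparison as a plain script, something like the following sketch could be used. It relies on the standard timeit module; the function name pairwise_loop, the seed, and the number of repetitions are arbitrary choices made for illustration, and the measured times will differ by machine and NumPy version.

```python
import timeit
import numpy as np

def pairwise_loop(values):
    # Pure-Python version: skip pairs where both indices are the same
    return [x >= y for i, x in enumerate(values)
            for j, y in enumerate(values) if i != j]

def pairwise_comp(values):
    # Broadcasting version from the answer
    arr = np.asarray(values)
    mask = arr[:, None] >= arr
    return mask[~np.eye(arr.size, dtype=bool)]

rng = np.random.default_rng(0)  # seed chosen arbitrarily
data = rng.integers(0, 9, 1000).tolist()

# Both versions should agree before their timings are compared
assert pairwise_loop(data) == pairwise_comp(data).tolist()

loop_time = timeit.timeit(lambda: pairwise_loop(data), number=3)
numpy_time = timeit.timeit(lambda: pairwise_comp(data), number=3)
print(f"loop: {loop_time:.3f} s, broadcasting: {numpy_time:.3f} s (3 runs each)")
```

Checking that both functions return the same values first guards against timing two computations that are not actually equivalent.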
Divakar answered Sep 20 '22 15:09
