Find two pairs of pairs that sum to the same value

I have random 2d arrays which I make using

import numpy as np
from itertools import combinations
n = 50
A = np.random.randint(2, size=(n,n))

I would like to determine whether the matrix contains two distinct pairs of rows that sum to the same row vector. I am looking for a fast way to do this; my current method simply tries all possibilities.

for pair in combinations(combinations(range(n), 2), 2):
    if np.array_equal(A[pair[0][0]] + A[pair[0][1]], A[pair[1][0]] + A[pair[1][1]]):
        print("Pair found", pair)

A method that worked for n = 100 would be really great.

asked Jan 14 '14 by marshall



2 Answers

Here is a pure numpy solution. No extensive timings, but I have to push n up to 500 before I can see my cursor blink once before it completes. It is memory intensive, though, and will fail due to memory requirements for much larger n. Either way, my intuition is that the odds of finding such a pair of pairs decrease geometrically for larger n anyway.

import numpy as np

n = 100
# uint8 so the in-place base-3 packing below can cast safely
A = np.random.randint(2, size=(n, n)).astype(np.uint8)

def base3(a):
    """
    pack the last axis of an array in base 3
    (40 base-3 digits fit in one uint64)
    """
    S = np.array_split(a, a.shape[-1] // 40 + 1, axis=-1)
    R = np.zeros(shape=a.shape[:-1] + (len(S),), dtype=np.uint64)
    for i in range(len(S)):
        s = S[i]
        r = R[..., i]
        for j in range(s.shape[-1]):
            r *= 3
            r += s[..., j]
    return R

def unique_count(a):
    """returns the unique elements of a and how often each occurs"""
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int64)
    np.add.at(count, inverse, 1)
    return unique, count

def voidview(arr):
    """view the last axis of an array as a single void scalar per row;
    can be used as a faster form of lexsort"""
    return np.ascontiguousarray(arr).view(
        np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))).reshape(arr.shape[:-1])

def has_pairs_of_pairs(A):
    # optional: pack the rows in base 3 (a handful of uint64 words per row)
    A = base3(A)
    # precompute sums over a lower-triangular set of all row combinations;
    # tril_indices returns (row, col) index arrays, so this is A[i] + A[j] for all i > j
    rowsums = sum(A[I] for I in np.tril_indices(len(A), -1))
    # count the number of times each summed row occurs by sorting
    # note that this is not quite O(n log n), since the cost of handling each row is also a function of n
    unique, count = unique_count(voidview(rowsums))
    # report whether any pair of pairs exists;
    # computing their indices is left as an exercise for the reader
    return np.any(count > 1)

from time import perf_counter
t = perf_counter()
for i in range(100):
    print(has_pairs_of_pairs(A))
print(perf_counter() - t)

Edit: included base-3 packing; now n = 2000 is feasible, taking about 2 GB of memory and a few seconds of processing.
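
A minimal illustration of why the packing is sound (my own example, using the base3 function above): since the matrix entries are 0 or 1, every digit of a pairwise row sum is at most 2, so adding two packed rows never carries between base-3 digits, and equal packed sums correspond exactly to equal row sums.

x = np.array([[1, 0, 1], [1, 1, 0]], dtype=np.uint8)
packed = base3(x)
print(packed[:, 0])            # [10 12], i.e. 1*9 + 0*3 + 1 and 1*9 + 1*3 + 0
print(packed[0] + packed[1])   # [22] = 2*9 + 1*3 + 1, the packed form of the row sum [2, 1, 1]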

Edit: added some timings; n=100 takes only 5ms per call on my i7m.
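
As a quick sanity check (my own small test, not part of the original answer; it assumes the code above has been run), one can plant two distinct pairs of rows with equal sums, exploiting the fact that the elementwise minimum plus the elementwise maximum of two binary rows equals their sum; the matrix B and the row indices 3, 10, 20 and 30 are arbitrary choices:

B = np.random.randint(2, size=(100, 100)).astype(np.uint8)
B[3] = B[10] & B[20]            # elementwise min of rows 10 and 20
B[30] = B[10] | B[20]           # elementwise max, so B[3] + B[30] == B[10] + B[20]
print(has_pairs_of_pairs(B))    # True: pairs (3, 30) and (10, 20) collide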

answered Oct 01 '22 by Eelco Hoogendoorn

Based on the code in your question, and assuming you are looking for two pairs of rows that sum to the same row vector, you could do something like this:

from itertools import combinations

def findMatchSets(A):
    B = A.transpose()
    pairs = tuple(combinations(range(len(A[0])), 2))
    # stratify the pairs of rows by their sum in the first column (0, 1 or 2)
    matchSets = [[i for i in pairs if B[0][i[0]] + B[0][i[1]] == z] for z in range(3)]
    for c in range(1, len(A[0])):
        # refine each block by the pair sums in column c, keeping only blocks with more than one member
        matchSets = [[i for i in block if B[c][i[0]] + B[c][i[1]] == z] for z in range(3) for block in matchSets]
        matchSets = [block for block in matchSets if len(block) > 1]
        if not matchSets:
            return []
    return matchSets

This basically stratifies the candidate pairs of rows into equivalence sets that sum to the same value after one column has been taken into account, then two columns, then three, and so on, until it either reaches the last column or no equivalence set with more than one member is left (i.e. there is no such pair of pairs). This will work fine for 100x100 arrays, largely because the chance of two pairs of rows summing to the same row vector is vanishingly small when n is large (n*(n-1)/2 candidate pairs compared to 3^n possible sum vectors).
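
To illustrate the output format (a small example of my own, not from the answer; it assumes findMatchSets and the imports above), a 4x4 matrix in which rows 0 + 1 and rows 2 + 3 both sum to [1, 1, 1, 1] comes back as a single equivalence block:

import numpy as np

M = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])

print(findMatchSets(M))   # [[(0, 1), (2, 3)]]: rows 0+1 and 2+3 share the same sum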

UPDATE

Here is updated code that searches for matching pairs of n-row subsets, as requested. The default is n=2, as in the original question:

def findMatchSets(A, n=2):
    B = A.transpose()
    pairs = tuple(combinations(range(len(A[0])), n))
    matchSets = [[i for i in pairs if sum(B[0][i[j]] for j in range(n)) == z] for z in range(n + 1)]
    for c in range(1, len(A[0])):
        matchSets = [[i for i in block if sum(B[c][i[j]] for j in range(n)) == z] for z in range(n + 1) for block in matchSets]
        matchSets = [block for block in matchSets if len(block) > 1]
        if not matchSets:
            return []
    return matchSets
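
For example (again test data of my own, reusing the imports above), a 6x6 matrix in which rows 0, 1, 2 and rows 3, 4, 5 both sum to the all-ones vector is found with n=3:

M = np.array([[1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1]])

print(findMatchSets(M, n=3))   # [[(0, 1, 2), (3, 4, 5)]]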

answered Oct 01 '22 by richsilv