Find two pairs of pairs that sum to the same value

I have random 2d arrays which I make using

import numpy as np
from itertools import combinations
n = 50
A = np.random.randint(2, size=(n,n))

I would like to determine whether the matrix contains two distinct pairs of rows that sum to the same row vector. I am looking for a fast way to do this; my current method simply tries all possibilities.

for pair in combinations(combinations(range(n), 2), 2):
    if np.array_equal(A[pair[0][0]] + A[pair[0][1]], A[pair[1][0]] + A[pair[1][1]]):
        print("Pair found", pair)

A method that worked for n = 100 would be really great.

asked Jan 14 '14 by marshall



2 Answers

Here is a pure numpy solution. No extensive timings, but I have to push n up to 500 before I can see my cursor blink once before it completes. It is memory intensive, though, and will fail due to memory requirements for much larger n. Either way, my intuition is that the odds of finding such a pair of pairs decrease geometrically for larger n anyway.

import numpy as np

n = 100
# uint8 so the in-place base-3 packing below can cast safely
A = np.random.randint(2, size=(n, n)).astype(np.uint8)

def base3(a):
    """
    pack the last axis of an array in base 3
    (40 base-3 digits fit in one uint64)
    """
    S = np.array_split(a, a.shape[-1] // 40 + 1, axis=-1)
    R = np.zeros(shape=a.shape[:-1] + (len(S),), dtype=np.uint64)
    for i in range(len(S)):
        s = S[i]
        r = R[..., i]
        for j in range(s.shape[-1]):
            r *= 3
            r += s[..., j]
    return R

def unique_count(a):
    """returns the unique elements of a and how often each occurs"""
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int64)
    np.add.at(count, inverse, 1)
    return unique, count

def voidview(arr):
    """view the last axis of an array as a single void scalar per row;
    can be used as a faster form of lexsort"""
    return np.ascontiguousarray(arr).view(
        np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))).reshape(arr.shape[:-1])

def has_pairs_of_pairs(A):
    # optional: pack the rows in base 3 (a handful of uint64 words per row)
    A = base3(A)
    # precompute sums over a lower-triangular set of all row combinations;
    # tril_indices returns (row, col) index arrays, so this is A[i] + A[j] for all i > j
    rowsums = sum(A[I] for I in np.tril_indices(len(A), -1))
    # count the number of times each summed row occurs by sorting
    # note that this is not quite O(n log n), since the cost of handling each row is also a function of n
    unique, count = unique_count(voidview(rowsums))
    # report whether any pair of pairs exists;
    # computing their indices is left as an exercise for the reader
    return np.any(count > 1)

from time import perf_counter
t = perf_counter()
for i in range(100):
    print(has_pairs_of_pairs(A))
print(perf_counter() - t)

Edit: included base-3 packing; now n = 2000 is feasible, taking about 2 GB of memory and a few seconds of processing.
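
A minimal illustration of why the packing is sound (my own example, using the base3 function above): since the matrix entries are 0 or 1, every digit of a pairwise row sum is at most 2, so adding two packed rows never carries between base-3 digits, and equal packed sums correspond exactly to equal row sums.

x = np.array([[1, 0, 1], [1, 1, 0]], dtype=np.uint8)
packed = base3(x)
print(packed[:, 0])            # [10 12], i.e. 1*9 + 0*3 + 1 and 1*9 + 1*3 + 0
print(packed[0] + packed[1])   # [22] = 2*9 + 1*3 + 1, the packed form of the row sum [2, 1, 1]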

Edit: added some timings; n=100 takes only 5ms per call on my i7m.
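
As a quick sanity check (my own small test, not part of the original answer; it assumes the code above has been run), one can plant two distinct pairs of rows with equal sums, exploiting the fact that the elementwise minimum plus the elementwise maximum of two binary rows equals their sum; the matrix B and the row indices 3, 10, 20 and 30 are arbitrary choices:

B = np.random.randint(2, size=(100, 100)).astype(np.uint8)
B[3] = B[10] & B[20]            # elementwise min of rows 10 and 20
B[30] = B[10] | B[20]           # elementwise max, so B[3] + B[30] == B[10] + B[20]
print(has_pairs_of_pairs(B))    # True: pairs (3, 30) and (10, 20) collide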

answered Oct 01 '22 by Eelco Hoogendoorn

Based on the code in your question, and assuming you are looking for two pairs of rows that sum to the same row vector, you could do something like this:

from itertools import combinations

def findMatchSets(A):
    B = A.transpose()
    pairs = tuple(combinations(range(len(A[0])), 2))
    # stratify the pairs of rows by their sum in the first column (0, 1 or 2)
    matchSets = [[i for i in pairs if B[0][i[0]] + B[0][i[1]] == z] for z in range(3)]
    for c in range(1, len(A[0])):
        # refine each block by the pair sums in column c, keeping only blocks with more than one member
        matchSets = [[i for i in block if B[c][i[0]] + B[c][i[1]] == z] for z in range(3) for block in matchSets]
        matchSets = [block for block in matchSets if len(block) > 1]
        if not matchSets:
            return []
    return matchSets

This basically stratifies the candidate pairs of rows into equivalence sets that sum to the same value after one column has been taken into account, then two columns, then three, and so on, until it either reaches the last column or no equivalence set with more than one member is left (i.e. there is no such pair of pairs). This will work fine for 100x100 arrays, largely because the chance of two pairs of rows summing to the same row vector is vanishingly small when n is large (n*(n-1)/2 candidate pairs compared to 3^n possible sum vectors).
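
To illustrate the output format (a small example of my own, not from the answer; it assumes findMatchSets and the imports above), a 4x4 matrix in which rows 0 + 1 and rows 2 + 3 both sum to [1, 1, 1, 1] comes back as a single equivalence block:

import numpy as np

M = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])

print(findMatchSets(M))   # [[(0, 1), (2, 3)]]: rows 0+1 and 2+3 share the same sum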

UPDATE

Here is updated code that searches for matching pairs of n-row subsets, as requested. The default is n=2, as in the original question:

def findMatchSets(A, n=2):
    B = A.transpose()
    pairs = tuple(combinations(range(len(A[0])), n))
    matchSets = [[i for i in pairs if sum(B[0][i[j]] for j in range(n)) == z] for z in range(n + 1)]
    for c in range(1, len(A[0])):
        matchSets = [[i for i in block if sum(B[c][i[j]] for j in range(n)) == z] for z in range(n + 1) for block in matchSets]
        matchSets = [block for block in matchSets if len(block) > 1]
        if not matchSets:
            return []
    return matchSets
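
For example (again test data of my own, reusing the imports above), a 6x6 matrix in which rows 0, 1, 2 and rows 3, 4, 5 both sum to the all-ones vector is found with n=3:

M = np.array([[1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1]])

print(findMatchSets(M, n=3))   # [[(0, 1, 2), (3, 4, 5)]]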

answered Oct 01 '22 by richsilv