Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Checking if several sets of pairs covers a given set of pairs

Suppose we have N arrays of pairs, e.g. for N=3:

A1 = [ [3,2], [4,1], [5,1], [7,1], [7,2], [7,3] ]

A2 = [ [3,1], [3,2], [4,1], [4,2], [4,3], [5,3], [7,2] ]

A3 = [ [4,1], [5,1], [5,2], [7,1] ]

We can assume that the pairs in every array are sorted by the first number, and then by the second number. Also, the same pair won't appear in the same array more than once (same pair can appear in multiple arrays, as you can see above).

The numbers in every pair are arbitrary integers >= 1.

How could I find all the k's that satisfy:

enter image description here

(In simple words, [k,1], [k,2], ... , [k,N] exist in different arrays)

The expected result for the example above is: [5, 7].

Note: Speed is the most critical factor of the algorithm, then memory. If it helps to optimise for speed/memory, assume that N <= 10. The number of pairs in an array can be ~50000.

like image 311
Misha Moroshko Avatar asked May 09 '15 22:05

Misha Moroshko


1 Answers

A few years ago I wrote a backtracking regular expression engine, and I realized when looking at your problem that the same algorithm could be used here. I'm not sure if it's the most efficient solution possible, but it's one option.

[Update: See end of answer for JavaScript port.] I wrote the code in C++ because it's my favorite language, and it wasn't clear from the original revision of your question which language this was intended for (I started writing the code before your second revision; it appeared to me from revision 1 that your question was language-agnostic), but I'm sure it should be possible to port this from C++ to JavaScript; std::vector would become Array/[] and std::map would become a plain old Object/{}, for starters.

C++

#include <cstdio>
#include <cstdlib>
#include <cassert>
#include <climits>
#include <cerrno>
#include <cctype>

#include <vector>
#include <map>

typedef std::pair<int,int> Pair;
typedef std::vector<Pair> Array;
typedef std::vector<Array> ArrayList;

typedef std::vector<int> Solution;

Solution solve(const ArrayList& arrayList);
Array parseArray(char*& arg, int i, char* argFull );
Pair parsePair(char*& arg, int i, char* argFull );
int parseNumber(char*& arg, int i, char* argFull );
void validateArrayList(const ArrayList& arrayList);
void printArrayList(const ArrayList& arrayList);
void printSolution(const Solution& solution);
void skipWhite(char*& a);

int main(int argc, char *argv[] ) {

    // parse out arrays
    ArrayList arrayList;
    for (int i = 1; i < argc; ++i) {
        char* arg = argv[i];
        arrayList.push_back(parseArray(arg,i,arg));
    } // end for
    validateArrayList(arrayList);
    printArrayList(arrayList);

    // solve
    Solution solution = solve(arrayList);
    printSolution(solution);

} // end main()

Solution solve(const ArrayList& arrayList) {

    typedef std::vector<int> ArrayHaveList; // use the 0-based index for ease
    typedef std::vector<ArrayHaveList> SmallNList;
    typedef std::map<int,SmallNList> KMap;

    // 1: collect all possible k's into a map
    // during this collection, will precompute which arrays have which "small n" values for each k
    KMap kmap;
    for (int ai = 0; ai < arrayList.size(); ++ai) {
        const Array& array = arrayList[ai];
        for (int pi = 0; pi < array.size(); ++pi) {
            const Pair& pair = array[pi];
            std::pair<KMap::iterator,bool> insertRet = kmap.insert(std::pair<int,SmallNList>(pair.first,SmallNList()));
            SmallNList& smallNList = insertRet.first->second;
            if (insertRet.second) smallNList.resize(arrayList.size());
            assert(pair.second >= 1 && pair.second <= arrayList.size()); // should already have been validated
            smallNList[pair.second-1].push_back(ai); // don't have to check because same pair cannot appear in same array more than once (already validated this)
        } // end for
    } // end for
    // debug kmap
    std::printf("----- KMap -----\n");
    for (KMap::iterator it = kmap.begin(); it != kmap.end(); ++it) {
        int k = it->first;
        const SmallNList& smallNList = it->second;
        std::printf("%d: [", k );
        for (int ni = 0; ni < smallNList.size(); ++ni) {
            const ArrayHaveList& arrayHaveList = smallNList[ni];
            if (ni > 0) std::printf(",");
            std::printf("%d:[", ni+1 );
            for (int ci = 0; ci < arrayHaveList.size(); ++ci) {
                int ai = arrayHaveList[ci];
                if (ci > 0) std::printf(",");
                std::printf("%d", ai+1 );
            } // end for
            std::printf("]");
        } // end for
        std::printf("]\n");
    } // end for

    // 2: for each k, and then for each small n, go through each possible "candidate" array that has that small n value for that k, and see if we can satisfy all small n values for that k
    std::printf("----- solving algorithm -----\n");
    Solution solution;
    for (KMap::iterator it = kmap.begin(); it != kmap.end(); ++it) {

        int k = it->first;
        const SmallNList& smallNList = it->second;

        // actually, first do a trivial check to see if any small n values have no candidates at all; then this is impossible, and we can reject this k
        bool impossible = false; // assumption
        for (int ni = 0; ni < smallNList.size(); ++ni) { // ni is "small n index", meaning 0-based
            const ArrayHaveList& arrayHaveList = smallNList[ni];
            if (arrayHaveList.size() == 0) {
                impossible = true;
                break;
            } // end if
        } // end for
        if (impossible) {
            std::printf("trivially impossible: k=%d.\n", k );
            continue;
        } // end if

        // now run the main backtracking candidate selection algorithm
        std::vector<bool> burnList; // allows quickly checking if a candidate array is available for a particular [k,n] pair (indexed by ai)
        std::vector<int> currentCandidateIndex; // required for backtracking candidate selections; keeps track of the current selection from the list of candidates for each [k,n] pair (indexed by ni)
        burnList.assign(arrayList.size(), false );
        currentCandidateIndex.assign(arrayList.size(), -1 ); // initialize to -1, so on first iteration of any given ni it will automatically be incremented to the first index, zero
        int ni; // declared outside for-loop for accessibility afterward
        for (ni = 0; ni < smallNList.size(); ++ni) { // ni is "small n index", meaning 0-based representation of small n

            // very important: this loop is the base for the backtracking algorithm, thus, it will be continued (with modified ni) when we need to backtrack, thus, it needs to be general for these cases

            std::printf("-> k=%d n=%d\n", k, ni+1 );

            const ArrayHaveList& arrayHaveList = smallNList[ni];

            // attempt to find a candidate array that is unburnt
            bool gotOne = false; // assumption
            for (int ci = currentCandidateIndex[ni]+1; ci < arrayHaveList.size(); ++ci) { // start ci at previous best candidate index plus one
                int candidateArrayIndex = arrayHaveList[ci];
                if (!burnList[candidateArrayIndex]) { // available
                    // now we can take this array for this [k,n] pair
                    burnList[candidateArrayIndex] = true;
                    currentCandidateIndex[ni] = ci;
                    gotOne = true;
                    break;
                } // end if
            } // end for

            // react to success or failure of finding an available array for this [k,n] pair
            int niSave = ni; // just for debugging
            if (!gotOne) {
                // we were unable to find a candidate for this [k,n] pair that inhabits a currently-available array; thus, must backtrack previous small n values
                if (ni == 0) { // uh-oh; we can't backtrack at all, thus this k is not a solution; break out of ni loop
                    std::printf("[%d,%d] failed, can't backtrack\n", k, ni+1 );
                    break;
                } // end if
                // now we have ni > 0, so we can backtrack to previous ni's
                int nip = ni-1;
                const ArrayHaveList& prevArrayHaveList = smallNList[nip];
                int cip = currentCandidateIndex[nip]; // get currently-used candidate index for this previous small n
                int aip = prevArrayHaveList[cip]; // get actual currently-used array index for this previous small n
                // unconditionally unburn it
                burnList[aip] = false;
                // reset outselves (current candidate index for current iteration ni) back to -1, so when we "return" to this ni from the backtrack, it'll be ready to check again
                // note: we don't have to reset the burn list element, because it can't have been set to true for the current iteration ni; that only happens when we find an available array
                currentCandidateIndex[ni] = -1;
                // reset the iteration var ni to nip-1, so that it will be incremented back into nip
                ni = nip-1;
            } // end if

            // debug burn list, current candidate index, and decision made by the above loop (not in that order)
            // decision
            if (gotOne)
                std::printf("[%d,%d] got in array %d\n", k, niSave+1, arrayHaveList[currentCandidateIndex[niSave]]+1 );
            else
                std::printf("[%d,%d] failed\n", k, niSave+1 );
            // burn list
            std::printf("burn:[");
            for (int bi = 0; bi < burnList.size(); ++bi) {
                if (bi > 0) std::printf(",");
                std::printf("%s", burnList[bi] ? "X" : "O" );
            } // end for
            std::printf("]\n");
            // current candidate index
            std::printf("candidate:[");
            for (int ni2 = 0; ni2 < currentCandidateIndex.size(); ++ni2) {
                if (ni2 > 0) std::printf(",");
                int ci = currentCandidateIndex[ni2];
                if (ci == -1)
                    std::printf("-");
                else
                    std::printf("%d", ci+1 );
            } // end for
            std::printf("]\n");

        } // end for
        // test if we reached the end of all ni's
        if (ni == smallNList.size()) {
            std::printf("*** SUCCESS ***: k=%d has array order [", k );
            for (int ni = 0; ni < currentCandidateIndex.size(); ++ni) {
                const ArrayHaveList& arrayHaveList = smallNList[ni];
                int ci = currentCandidateIndex[ni];
                int ai = arrayHaveList[ci];
                if (ni > 0) std::printf(",");
                std::printf("%d", ai+1 );
            } // end for
            std::printf("]\n");
            solution.push_back(k);
        } else {
            std::printf("*** FAILED ***: k=%d\n", k );
        } // end if

    } // end for

    return solution;

} // end solve()

Array parseArray(char*& arg, int i, char* argFull ) {
    skipWhite(arg);
    if (*arg != '[') { std::fprintf(stderr, "invalid syntax from \"%s\": no leading '[': argument %d \"%s\".\n", arg, i, argFull ); std::exit(1); }
    ++arg;
    skipWhite(arg);
    Array array;
    while (true) {
        if (*arg == ']') {
            ++arg;
            skipWhite(arg);
            if (*arg != '\0') { std::fprintf(stderr, "invalid syntax from \"%s\": trailing characters: argument %d \"%s\".\n", arg, i, argFull ); std::exit(1); }
            break;
        } else if (*arg == '[') {
            array.push_back(parsePair(arg,i,argFull));
            skipWhite(arg);
            // allow single optional comma after any pair
            if (*arg == ',') {
                ++arg;
                skipWhite(arg);
            } // end if
        } else if (*arg == '\0') {
            std::fprintf(stderr, "invalid syntax from \"%s\": unexpected end-of-array: argument %d \"%s\".\n", arg, i, argFull ); std::exit(1);
        } else {
            std::fprintf(stderr, "invalid syntax from \"%s\": invalid character '%c': argument %d \"%s\".\n", arg, *arg, i, argFull ); std::exit(1);
        } // end if
    } // end for
    return array;
} // end parseArray()

Pair parsePair(char*& arg, int i, char* argFull ) {
    assert(*arg == '[');
    ++arg;
    skipWhite(arg);
    int first = parseNumber(arg,i,argFull);
    skipWhite(arg);
    if (*arg != ',') { std::fprintf(stderr, "invalid syntax from \"%s\": non-',' after number: argument %d \"%s\".\n", arg, i, argFull ); std::exit (1); }
    ++arg;
    skipWhite(arg);
    int second = parseNumber(arg,i,argFull);
    skipWhite(arg);
    if (*arg != ']') { std::fprintf(stderr, "invalid syntax from \"%s\": non-']' after number: argument %d \"%s\".\n", arg, i, argFull ); std::exit (1); }
    ++arg;
    return Pair(first,second);
} // end parsePair()

int parseNumber(char*& arg, int i, char* argFull ) {
    char* endptr;
    unsigned long landing = strtoul(arg, &endptr, 10 );
    if (landing == 0 && errno == EINVAL || landing == ULONG_MAX && errno == ERANGE || landing > INT_MAX) {
        std::fprintf(stderr, "invalid syntax from \"%s\": invalid number: argument %d \"%s\".\n", arg, i, argFull );
        std::exit(1);
    } // end if
    arg = endptr;
    return (int)landing;
} // end parseNumber()

void validateArrayList(const ArrayList& arrayList) {
    // note: all numbers have already been forced to be natural numbers during parsing
    // 1: validate that all seconds are within 1..N
    // 2: validate that we don't have duplicate pairs in the same array
    typedef std::vector<bool> SecondSeenList;
    typedef std::map<int,SecondSeenList> FirstMap;
    for (int ai = 0; ai < arrayList.size(); ++ai) {
        const Array& array = arrayList[ai];
        FirstMap firstMap;
        for (int pi = 0; pi < array.size(); ++pi) {
            const Pair& pair = array[pi];
            if (pair.second == 0) { std::fprintf(stderr, "invalid second number %d: less than 1.\n", pair.second ); std::exit(1); }
            if (pair.second > arrayList.size()) { std::fprintf(stderr, "invalid second number %d: greater than N=%d.\n", pair.second, arrayList.size() ); std::exit(1); }
            std::pair<FirstMap::iterator,bool> insertRet = firstMap.insert(std::pair<int,SecondSeenList>(pair.first,SecondSeenList()));
            SecondSeenList& secondSeenList = insertRet.first->second;
            if (insertRet.second) secondSeenList.assign(arrayList.size(), false );
            if (secondSeenList[pair.second-1]) { std::fprintf(stderr, "invalid array %d: duplicate pair [%d,%d].\n", ai+1, pair.first, pair.second ); std::exit(1); }
            secondSeenList[pair.second-1] = true;
        } // end for
    } // end for
} // end validateArrayList()

void printArrayList(const ArrayList& arrayList) {
    std::printf("----- parsed arrays -----\n");
    for (int ai = 0; ai < arrayList.size(); ++ai) {
        const Array& array = arrayList[ai];
        std::printf("A[%d] == [", ai+1 );
        for (int pi = 0; pi < array.size(); ++pi) {
            const Pair& pair = array[pi];
            if (pi > 0) std::printf(",");
            std::printf("[%d,%d]", pair.first, pair.second );
        } // end for
        std::printf("]\n");
    } // end for
} // end printArrayList()

void printSolution(const Solution& solution) {
    std::printf("----- solution -----\n");
    std::printf("[");
    for (int si = 0; si < solution.size(); ++si) {
        int k = solution[si];
        if (si > 0) std::printf(",");
        std::printf("%d", k );
    } // end for
    std::printf("]\n");
} // end printSolution()

void skipWhite(char*& a) {
    while (*a != '\0' && std::isspace(*a))
        ++a;
} // end skipWhite()

Here's a demo with your example data:

ls;
## a.cpp

g++ a.cpp -o a;

ls;
## a.cpp  a.exe*

./a '[ [3,2], [4,1], [5,1], [7,1], [7,2], [7,3] ]' '[ [3,1], [3,2], [4,1], [4,2], [4,3], [5,3], [7,2] ]' '[ [4,1], [5,1], [5,2], [7,1]]';
## ----- parsed arrays -----
## A[1] == [[3,2],[4,1],[5,1],[7,1],[7,2],[7,3]]
## A[2] == [[3,1],[3,2],[4,1],[4,2],[4,3],[5,3],[7,2]]
## A[3] == [[4,1],[5,1],[5,2],[7,1]]
## ----- KMap -----
## 3: [1:[2],2:[1,2],3:[]]
## 4: [1:[1,2,3],2:[2],3:[2]]
## 5: [1:[1,3],2:[3],3:[2]]
## 7: [1:[1,3],2:[1,2],3:[1]]
## ----- solving algorithm -----
## trivially impossible: k=3.
## -> k=4 n=1
## [4,1] got in array 1
## burn:[X,O,O]
## candidate:[1,-,-]
## -> k=4 n=2
## [4,2] got in array 2
## burn:[X,X,O]
## candidate:[1,1,-]
## -> k=4 n=3
## [4,3] failed
## burn:[X,O,O]
## candidate:[1,1,-]
## -> k=4 n=2
## [4,2] failed
## burn:[O,O,O]
## candidate:[1,-,-]
## -> k=4 n=1
## [4,1] got in array 2
## burn:[O,X,O]
## candidate:[2,-,-]
## -> k=4 n=2
## [4,2] failed
## burn:[O,O,O]
## candidate:[2,-,-]
## -> k=4 n=1
## [4,1] got in array 3
## burn:[O,O,X]
## candidate:[3,-,-]
## -> k=4 n=2
## [4,2] got in array 2
## burn:[O,X,X]
## candidate:[3,1,-]
## -> k=4 n=3
## [4,3] failed
## burn:[O,O,X]
## candidate:[3,1,-]
## -> k=4 n=2
## [4,2] failed
## burn:[O,O,O]
## candidate:[3,-,-]
## -> k=4 n=1
## [4,1] failed, can't backtrack
## *** FAILED ***: k=4
## -> k=5 n=1
## [5,1] got in array 1
## burn:[X,O,O]
## candidate:[1,-,-]
## -> k=5 n=2
## [5,2] got in array 3
## burn:[X,O,X]
## candidate:[1,1,-]
## -> k=5 n=3
## [5,3] got in array 2
## burn:[X,X,X]
## candidate:[1,1,1]
## *** SUCCESS ***: k=5 has array order [1,3,2]
## -> k=7 n=1
## [7,1] got in array 1
## burn:[X,O,O]
## candidate:[1,-,-]
## -> k=7 n=2
## [7,2] got in array 2
## burn:[X,X,O]
## candidate:[1,2,-]
## -> k=7 n=3
## [7,3] failed
## burn:[X,O,O]
## candidate:[1,2,-]
## -> k=7 n=2
## [7,2] failed
## burn:[O,O,O]
## candidate:[1,-,-]
## -> k=7 n=1
## [7,1] got in array 3
## burn:[O,O,X]
## candidate:[2,-,-]
## -> k=7 n=2
## [7,2] got in array 1
## burn:[X,O,X]
## candidate:[2,1,-]
## -> k=7 n=3
## [7,3] failed
## burn:[O,O,X]
## candidate:[2,1,-]
## -> k=7 n=2
## [7,2] got in array 2
## burn:[O,X,X]
## candidate:[2,2,-]
## -> k=7 n=3
## [7,3] got in array 1
## burn:[X,X,X]
## candidate:[2,2,1]
## *** SUCCESS ***: k=7 has array order [3,2,1]
## ----- solution -----
## [5,7]

And just to demonstrate the ease with which you can use this code to test different inputs from the shell, here are a few trivial cases:

./a;
## ----- parsed arrays -----
## ----- KMap -----
## ----- solving algorithm -----
## ----- solution -----
## []

./a '[]';
## ----- parsed arrays -----
## A[1] == []
## ----- KMap -----
## ----- solving algorithm -----
## ----- solution -----
## []

./a '[[1,1]]';
## ----- parsed arrays -----
## A[1] == [[1,1]]
## ----- KMap -----
## 1: [1:[1]]
## ----- solving algorithm -----
## -> k=1 n=1
## [1,1] got in array 1
## burn:[X]
## candidate:[1]
## *** SUCCESS ***: k=1 has array order [1]
## ----- solution -----
## [1]

./a '[[1,1],[2,2]]' '[[2,1],[1,2],[3,2],[3,1]]';
## ----- parsed arrays -----
## A[1] == [[1,1],[2,2]]
## A[2] == [[2,1],[1,2],[3,2],[3,1]]
## ----- KMap -----
## 1: [1:[1],2:[2]]
## 2: [1:[2],2:[1]]
## 3: [1:[2],2:[2]]
## ----- solving algorithm -----
## -> k=1 n=1
## [1,1] got in array 1
## burn:[X,O]
## candidate:[1,-]
## -> k=1 n=2
## [1,2] got in array 2
## burn:[X,X]
## candidate:[1,1]
## *** SUCCESS ***: k=1 has array order [1,2]
## -> k=2 n=1
## [2,1] got in array 2
## burn:[O,X]
## candidate:[1,-]
## -> k=2 n=2
## [2,2] got in array 1
## burn:[X,X]
## candidate:[1,1]
## *** SUCCESS ***: k=2 has array order [2,1]
## -> k=3 n=1
## [3,1] got in array 2
## burn:[O,X]
## candidate:[1,-]
## -> k=3 n=2
## [3,2] failed
## burn:[O,O]
## candidate:[1,-]
## -> k=3 n=1
## [3,1] failed, can't backtrack
## *** FAILED ***: k=3
## ----- solution -----
## [1,2]

Explanation

It's rather complex, but I'll do my best to explain how it works.

First of all, there's a lot of "scaffolding" code I wrote just to parse the array data from command-line arguments. Each command-line argument is expected to comprise a single array, delimited by [], with zero or more pairs in [k,n] format inside the outer brackets. The pairs themselves can be separated by commas, but I made that optional. Whitespace is allowed nearly anywhere, except not inside individual numbers, of course.

I do very strict validation, both during parsing, and after parsing. The after-parsing validation consists of enforcing that all n values (that's the second number in each pair) ranges only over 1..N (where N is of course determined automatically from the number of arrays that were provided), and enforcing that there are no duplicate pairs within any individual array, which was a requirement you specified in your question, and is a requirement depended upon by my algorithm.

So, the high-level process is (1) parsing plus basic syntactical validation, (2) after-parsing validation, (3) solving, and (4) printing the solution. I've also included verbose debugging output at various times, as you can see from the demos.

With respect to the actual solving algorithm, the first thing to recognize is that all ks are independent of one another; the first number of each pair effectively isolates its relevance to just that k. Thus, I designed the data structures and algorithm to work separately with each possible k value that could represent a solution to the problem.

The second thing to recognize is that, for a given k, since you need [k,n] to exist in some array for all values of n in 1..N, and you disallow assigning the same array to different n values, and there are exactly N arrays, what you're really doing is "allocating" or "distributing" the arrays among all values of n.

The third thing to recognize is that for every [k,n] pair, that pair could be included in multiple arrays, and so there could be multiple "candidates" for which array is assigned to a given n. This means we need to check for different allocations of the arrays to the possible n values to find if there's any way to allocate all of the arrays to all values of n.

The algorithm works by first scanning all pairs in all arrays and constructing a "candidate list" of arrays that contain a given [k,n]. These candidate lists (ArrayHaveList) are separated into a map keyed by k (KMap), and further separated into a vector whose index is n-1 (SmallNList). Thus, for every k value represented in the input data, there will be N "candidate vectors", each of length zero or more (well, at least one of the vectors will have to have length one or more, because at least one pair must exist for the k value if it is treated at all).

Then, for each k value in turn, I initialize two important vectors: (1) a "burn" vector (burnList) which is used to prevent the same array from being allocated to multiple n values, and (2) a "candidate selection" vector (currentCandidateIndex) which is used to track which array candidate of the set of candidates for each n value is currently being "tried". Each of these will have a length equal to N. The candidate selection vector is the linchpin of the backtracking algorithm; it keeps track of "where we are" in the backtracking process.

Each n value is allocated to an array until we fail to allocate the next n value to an array, because none of its candidate arrays are available (i.e. they're all "burnt"). At this point, we have to backtrack previous n value assignments to try to "free up" an array for the failed n value, if possible. This requires reversing the n loop, which is written to always advance the current candidate selection from whatever it was prior to the current iteration. This allows it to work both during the first iteration on a given n value and on subsequent backtracking iterations.

The process ends when either we must backtrack from n=1, which is impossible, because there's no previous n value to backtrack, in which case the k value fails the test, or we finish the n loop having successfully allocated all n values to an array.

At all times we must take care to keep the burn list correct (this means assigning true when a particular array is assigned to an n value, and resetting the previous n value's assignment to false when the following n value failed and must backtrack) and the candidate selection vector correct (this means capturing the index of the array candidate that is assigned when it is assigned, and resetting it (for the n in the current iteration) to -1 when we are preparing to backtrack the previous n, which is necessary so that we will "start from scratch" on the current n when we return to it after backtracking the previous n).

I also included a trivial check to see if any n value has zero candidates for a given k, in which case it would be wasteful to run the entire backtracking algorithm for that k, because the code would repeatedly fail on the zero-candidate n and backtrack through the entire multiplicity of all candidates of all previous n values.

JavaScript

Well, that was slightly annoying. I ported the code from C++ to JavaScript, and it appears to be working:

... Bah! Stack Overflow won't let me submit an answer greater than 30,000 characters, so I can't include my JavaScript source as a code snippet. We're going to have to depend on JSFiddle. See here:

http://jsfiddle.net/kobyde5z/

like image 98
bgoldst Avatar answered Nov 14 '22 23:11

bgoldst