How to extract all possible matching arrays of objects from one array of objects?

I have an array of objects, e.g.

var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"}
];

Let's say I am only interested in objects whose keys correspond to var input = ["ab", "bc"]. It means that I want to extract all possible subarrays with result[i].length == 2 in the following way:

var result = [
    [{"ab": "i"}, {"bc": "x"}],
    [{"ab": "4"}, {"bc": "x"}] // or [{"bc": "x"}, {"ab": "4"}]
];

That is, the order of objects within the subarrays is absolutely not important: I am only interested in the fact that each subarray contains two objects, {"ab": ...} and {"bc": ...}.

If I were interested in var input = ["a","a","ab"], the result should look like this:

var result = [
    [{"a": "x"}, {"a": "nm"}, {"ab": "i"}],
    [{"a": "x"}, {"a": "nm"}, {"ab": "4"}]
];

I cannot find a way to achieve the desired result (assuming that input.length may be much greater than 2 or 3; even 15–20 may not be enough) without a factorial number of computations, which is physically impossible. Is there a way to solve such a problem with reasonable performance?

Important note: yes, obviously, for relatively large values of input.length there may theoretically be a huge number of possible combinations, but in practice result.length will always be reasonably small (maybe 100–200; I doubt it could even reach 1000). To be safe, though, I would like to set a limit (say, 1000), so that as soon as result.length reaches this limit, the function simply returns the current result and stops.

asked Oct 03 '16 by lyrically wicked


2 Answers

Looking at the problem, it resembles a Cartesian product. In fact, if the data model is modified a bit before operating on it, the expected result is, in almost all cases, a Cartesian product. However, there's one case (the second example you provided) that needs special treatment. Here's what I did:

  1. Tweak the data model a bit (this is done only once) to get something suitable for applying the Cartesian product, as shown below.
  2. Treat the "special case" of more than one parameter requesting the same key.
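
For reference, after step 1 the example arr used below becomes a map from each key to all of its values; this is what processedObj holds in the code:

var processedObj = {
    "a": ["x", "nm"],
    "b": ["0", "765"],
    "c": ["k"],
    "ab": ["i", "4"],
    "bc": ["x"],
    "abc": ["L"],
    "dummy": ["asdf"]
};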

All the important logic is within cartessianProdModified. The important bits in the code are commented. I hope it helps you with your problem, or at least gives you some ideas.

Here's the code:

var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"},
    {"dummy": "asdf"}
];

// Utility function to be used in the Cartesian product
function flatten(arr) {
    return arr.reduce(function (memo, el) {
        return memo.concat(el);
    }, []);
}

// Utility function to be used in the Cartesian product
function unique(arr) {
    return Object.keys(arr.reduce(function (memo, el) {
        return (memo[el] = 1) && memo;
    }, {}));
}

// It'll prepare the output in the expected way
function getObjArr(key, val, processedObj) {
    var set = function (key, val, obj) {
        return (obj[key] = val) && obj;
    };
    // The Cartesian product is over, so we can put the 'special case' back into object form to get the expected output.
    return val !== 'repeated' ? [set(key, val, {})] : processedObj[key].reduce(function (memo, val) {
        return memo.concat(set(key, val, {}));
    }, []);
}

// This is the main function. It performs the Cartesian product.
var cartessianProdModified = (function (arr) {
    // Tweak the data model in order to have a map (key: array of values)
    var processedObj = arr.reduce(function (memo, obj) {
        var firstKey = Object.keys(obj)[0];
        return (memo[firstKey] = (memo[firstKey] || []).concat(obj[firstKey])) && memo;
    }, {});

    // Return a function that will perform the Cartesian product of the args.
    return function (args) {
        // Spot repeated args.
        var countArgs = args.reduce(function (memo, el) {
                return (memo[el] = (memo[el] || 0) + 1) && memo;
            }, {}),
            // Remove repeated args so that the Cartesian product works properly and more efficiently.
            uniqArgs = unique(args);

        return uniqArgs
                .reduce(function (memo, el) {
                    return flatten(memo.map(function (x) {
                        // Special case: the arg is repeated; treat it as a single value so the Cartesian product works properly
                        return (countArgs[el] > 1 ? ['repeated'] : processedObj[el]).map(function (y) {
                            return x.concat(getObjArr(el, y, processedObj));
                        });
                    }));
                }, [[]]);
    };
})(arr);

console.log(cartessianProdModified(['a', 'a', 'ab']));
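// Logs the result from the second example in the question:
// [[{"a":"x"},{"a":"nm"},{"ab":"i"}],
//  [{"a":"x"},{"a":"nm"},{"ab":"4"}]]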
answered by acontell


Sort arr and input alphabetically, which is O(n log n); if you are able to keep them sorted as you build the arrays, so much the better.

Let me explain my idea with an example:

var arr = [
    {"a": "x"},
    {"ab": "i"},
    {"ab": "4"},
    {"abc": "L"},
    {"bc": "x"}
];
var input = ["ab", "bc"];

Search for input[0] in arr (linearly or even with binary search to speed it up). Mark the index.

Search for input[1] in arr, but consider only the subarray of arr from the index marked in the previous step to the end.

If you find all the elements of input, push that combination to the results (you can keep a temporary object for that).

Now we have to search again for input[0], since two or more entries may share that key. You will have stored the index I mentioned before, so you can resume searching from that index; and since arr is sorted, you only have to check the very next element, and so on. A sketch of the whole procedure follows.
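
Here is a minimal sketch of this procedure in JavaScript (the function name findCombinations and the limit parameter are my own; the limit implements the early-exit cap requested in the question):

function findCombinations(arr, input, limit) {
    // Sort a copy of arr by each object's single key, and sort the input keys.
    var byKey = function (a, b) {
        var x = Object.keys(a)[0], y = Object.keys(b)[0];
        return x < y ? -1 : x > y ? 1 : 0;
    };
    var sorted = arr.slice().sort(byKey);    // O(n log n)
    var keys = input.slice().sort();
    var results = [];

    // Try to match keys[i] using elements of `sorted` at index `from` or later.
    function search(i, from, current) {
        if (results.length >= limit) return; // early exit once the limit is reached
        if (i === keys.length) {             // all keys matched: record this combination
            results.push(current.slice());
            return;
        }
        for (var j = from; j < sorted.length; j++) {
            var k = Object.keys(sorted[j])[0];
            if (k > keys[i]) break;          // arr is sorted, so no match can follow
            if (k === keys[i]) {             // match: take it and look for the next key
                current.push(sorted[j]);
                search(i + 1, j + 1, current);
                current.pop();
            }
        }
    }

    search(0, 0, []);
    return results;
}

console.log(findCombinations(arr, ["ab", "bc"], 1000));
// → [[{"ab":"i"},{"bc":"x"}], [{"ab":"4"},{"bc":"x"}]]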


Time complexity:

Sort your arrays (assuming you couldn't have them sorted when you built them): O(n log n), where n is the number of elements arr has.

Binary search in arr for input[0]: O(log n).

The next search step (for input[1]) covers much less than the full length of arr, so a very pessimistic bound would be O(n). In practice it won't be O(n), of course, and if you want you can do a binary search for input[1] too, which costs O(log m), where m is the size of the subarray of arr from the stored index to the end. A sketch of such a search appears below.
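
For illustration, a lower-bound binary search over the key-sorted array could look like this (lowerBound is my own helper name; it assumes the sorted copy from the sketch above):

// Return the first index >= `from` whose key is not less than `key`.
function lowerBound(sorted, key, from) {
    var lo = from, hi = sorted.length;
    while (lo < hi) {
        var mid = (lo + hi) >> 1;           // midpoint of the remaining range
        if (Object.keys(sorted[mid])[0] < key) lo = mid + 1;
        else hi = mid;
    }
    return lo;                              // O(log m) comparisons
}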

At this point, we move on to finding the next occurrence of input[0], if there is one. Because we have stored the index, we know exactly where to resume searching, and we only have to check the next element; that's a constant cost, thus O(1).

And then we do the same for input[1] as above, which is cheap again.

Now, it all depends on the length of input, which is k (and it seems that k < n), and on how many occurrences of each key you have.

But assuming a normal, average situation, the whole procedure has a time complexity of:

O(n log n)

However, notice that you have to pay a bit of extra memory to store the indices, proportional to the number of occurrences a key has. With a brute-force algorithm, which would be slower, you wouldn't need any extra memory.

answered by gsamaras