Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all the combination of substrings that add up to the given string

I'm trying to create a data structure that holds all the possible substring combinations that add up to the original string. For example, if the string is "java" the valid results would be "j", "ava", "ja", "v", "a", an invalid result would be "ja", "a" or "a", "jav"

I had it very easy in finding all the possible substrings

    String string = "java";
    List<String> substrings = new ArrayList<>();
    for( int c = 0 ; c < string.length() ; c++ )
    {
        for( int i = 1 ; i <= string.length() - c ; i++ )
        {
            String sub = string.substring(c, c+i);
            substrings.add(sub);
        }
    }
    System.out.println(substrings);

and now I'm trying to construct a structure that holds only the valid substrings. But its not nearly as easy. I'm in the mist of a very ugly code, fiddling around with the indexes, and no where near of finishing, most likely on a wrong path completely. Any hints?

like image 803
learnAndImprove Avatar asked May 11 '15 13:05

learnAndImprove


4 Answers

Here's one approach:

static List<List<String>> substrings(String input) {

    // Base case: There's only one way to split up a single character
    // string, and that is ["x"] where x is the character.
    if (input.length() == 1)
        return Collections.singletonList(Collections.singletonList(input));

    // To hold the result
    List<List<String>> result = new ArrayList<>();

    // Recurse (since you tagged the question with recursion ;)
    for (List<String> subresult : substrings(input.substring(1))) {

        // Case: Don't split
        List<String> l2 = new ArrayList<>(subresult);
        l2.set(0, input.charAt(0) + l2.get(0));
        result.add(l2);

        // Case: Split
        List<String> l = new ArrayList<>(subresult);
        l.add(0, input.substring(0, 1));
        result.add(l);
    }

    return result;
}

Output:

[java]
[j, ava]
[ja, va]
[j, a, va]
[jav, a]
[j, av, a]
[ja, v, a]
[j, a, v, a]
like image 150
aioobe Avatar answered Oct 19 '22 02:10

aioobe


It seems like this is the problem of finding the compositions of the length of the string, and using those compositions to make substrings. So there are 2^n-1 compositions of a number n, which could make it a bit time-consuming for long strings...

like image 40
gilleain Avatar answered Oct 19 '22 04:10

gilleain


Probably somebody would like another solution which is non-recursive and takes no memory to hold a list:

public static List<List<String>> substrings(final String input) {
    if(input.isEmpty())
        return Collections.emptyList();
    final int size = 1 << (input.length()-1); 
    return new AbstractList<List<String>>() {

        @Override
        public List<String> get(int index) {
            List<String> entry = new ArrayList<>();
            int last = 0;
            while(true) {
                int next = Integer.numberOfTrailingZeros(index >> last)+last+1;
                if(next == last+33)
                    break;
                entry.add(input.substring(last, next));
                last = next;
            }
            entry.add(input.substring(last));
            return entry;
        }

        @Override
        public int size() {
            return size;
        } 
    };
}

public static void main(String[] args) {
    System.out.println(substrings("java"));
}

Output:

[[java], [j, ava], [ja, va], [j, a, va], [jav, a], [j, av, a], [ja, v, a], [j, a, v, a]]

It just calculates the next combination based on its index.

like image 1
Tagir Valeev Avatar answered Oct 19 '22 04:10

Tagir Valeev


Just in case someone will look for the same algorithm in python, here is implementation in Python:

from itertools import combinations

def compositions(s):
    n = len(s)
    for k in range(n):
        for c in combinations(range(1, n), k):
            yield tuple(s[i:j] for i, j in zip((0,) + c, c + (n,)))

Example how it works:

>>> for x in compositions('abcd'):
...     print(x)
('abcd',)
('a', 'bcd')
('ab', 'cd')
('abc', 'd')
('a', 'b', 'cd')
('a', 'bc', 'd')
('ab', 'c', 'd')
('a', 'b', 'c', 'd')

With a small modification you can generate compositions in different order:

def compositions(s):
    n = len(s)
    for k in range(n):
        for c in itertools.combinations(range(n - 1, 0, -1), k):
            yield tuple(s[i:j] for i, j in zip((0,) + c[::-1], c[::-1] + (n,)))

It will give you this:

>>> for x in compositions('abcd'):
...     print(x)
('abcd',)
('abc', 'd')
('ab', 'cd')
('a', 'bcd')
('ab', 'c', 'd')
('a', 'bc', 'd')
('a', 'b', 'cd')
('a', 'b', 'c', 'd')

And with another small addition, you can generate only specified number of splits:

def compositions(s, r=None):
    n = len(s)
    r = range(n) if r is None else [r - 1]
    for k in r:
        for c in itertools.combinations(range(n - 1, 0, -1), k):
            yield tuple(s[i:j] for i, j in zip((0,) + c[::-1], c[::-1] + (n,)))
>>> for x in compositions('abcd', 3):
...     print(x)
('ab', 'c', 'd')
('a', 'bc', 'd')
('a', 'b', 'cd')
like image 1
sirex Avatar answered Oct 19 '22 03:10

sirex