Algorithm to find the union of multiple strings grouped by character index

Question

I have been trying to think of a performance efficient way of finding the union of character occurrences in a set of fixed width strings grouped by index. Something like this;

s1 = "013965"
s2 = "015935"
s3 = "310012"

Resulting in the following where each group of digits exists in all strings at char index n:

out = "[03][1][350][90][631][52]"

I have thought of doing it the very naive way of iterating through every string, at every index while storing the intermediate strings in an array and then iterating through that array to build the output value. However my approach seems to me as a highly inefficient way which is way too far from an asymptotically optimal solution.

Warlord · Accepted Answer

If the set of all possible characters is known in advance, let's say their number is n, with n being not too high (e.g. 10, if you're doing only digits), you can do this by creating m boolean arrays of length n, with m being the number of positions, or digits in input strings and n. The n-th position in m-th array will be true, if the n-th character is present in m-th position in any of input strings. False will denote, that no such character was in m-th position before.

Then you can iterate over each string, and when you encounter character n in position m you mark down true in n-th position of m-th array. At the end, you will have m arrays, each describing the content of m-th group

pos[0] = {true, true, false, false, false, true, true, false, true, false}
pos[1] = {true, false, false, false, false, false, false, false, false, false}
pos[2] = {false, false, true, true, false, false, false, false, false, true}

translates to

[0,1,5,6,8] [0] [2,3,9]

As all the structures are direct access arrays, there is no lookup involved, all access is in constant time and you only need to visit every character once, with no comparison involved. Hope this helps.

Algorithm to find the union of multiple strings grouped by character index

Tags:

java

performance

algorithm

optimization

union

pondigi

1 Answers

Warlord

Recent Activity

Donate For Us

Algorithm to find the union of multiple strings grouped by character index

Tags:

java

performance

algorithm

optimization

union

pondigi

1 Answers

Warlord

Related questions

Recent Activity

Donate For Us