Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

string of integers puzzle

Tags:

algorithm

I apologize for not have the math background to put this question in a more formal way. I'm looking to create a string of 796 letters (or integers) with certain properties.

Basically, the string is a variation on a De Bruijn sequence B(12,4), except order and repetition within each n-length subsequence are disregarded. i.e. ABBB BABA BBBA are each equivalent to {AB}. In other words, the main property of the string involves looking at consecutive groups of 4 letters within the larger string (i.e. the 1st through 4th letters, the 2nd through 5th letters, the 3rd through 6th letters, etc) And then producing the set of letters that comprise each group (repetitions and order disregarded)

For example, in the string of 9 letters:

A B B A C E B C D

the first 4-letter groups is: ABBA, which is comprised of the set {AB} the second group is: BBAC, which is comprised of the set {ABC} the third group is: BACE, which is comprised of the set {ABCE} etc.

The goal is for every combination of 1-4 letters from a set of N letters to be represented by the 1-4-letter resultant sets of the 4-element groups once and only once in the original string.

For example, if there is a set of 5 letters {A, B, C, D, E} being used Then the possible 1-4 letter combinations are: A, B, C, D, E, AB, AC, AD, AE, BC, BD, BE, CD, CE, DE, ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE, ABCD, ABCE, ABDE, ACDE, BCDE

Here is a working example that uses a set of 5 letters {A, B, C, D, E}.

D D D D E C B B B B A E C C C C D A E E E E B D A A A A C B D D B

The 1st through 4th elements form the set: D The 2nd through 5th elements form the set: DE The 3rd through 6th elements form the set: CDE The 4th through 7th elements form the set: BCDE The 5th through 8th elements form the set: BCE The 6th through 9th elements form the set: BC The 7th through 10th elements form the set: B etc.

* I am hoping to find a working example of a string that uses 12 different letters (a total of 793 4-letter groups within a 796-letter string) starting (and if possible ending) with 4 of the same letter. *

Here is a working solution for 7 letters:

AAAABCDBEAAACDECFAAADBFBACEAGAADEFBAGACDFBGCCCCDGEAFAGCBEEECGFFBFEGGGGFDEEEEFCBBBBGDCFFFFDAGBEGDDDDBE

like image 790
Erik Avatar asked Jul 26 '10 18:07

Erik


2 Answers

Beware that in order to attempt exhaustive search (answer in VB is trying a naive version of that) you'll first have to solve the problem of generating all possible expansions while maintaining lexicographical order. Just ABC, expands to all perms of AABC, plus all perms of ABBC, plus all perms of ABCC which is 3*4! instead of just AABC. If you just concatenate AABC and AABD it would cover just 4 out of 4! perms of AABC and even that by accident. Just this expansion will bring you exponential complexity - end of game. Plus you'll need to maintain association between all explansions and the set (the set becomes a label).

Your best bet is to use one of known efficient De Bruijn constuctors and try to see if you can put your set-equivalence in there. Check out

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.674&rep=rep1&type=pdf

and

http://www.dim.uchile.cl/~emoreno/publicaciones/FINALES/copyrighted/IPL05-De_Bruijn_sequences_and_De_Bruijn_graphs_for_a_general_language.pdf

for a start.

If you know graphs, another viable option is to start with De Bruijn graph and formulate your set-equivalence as a graph rewriting. 2nd paper does De Bruijn graph partitioning.

BTW, try VB answer just for A,B,AB (at least expansion is small) - it will make AABBAB and construct ABBA or ABBAB (or throw in a decent language) both of which are wrong. You can even prove that it will always miss with 1st lexical expansions (that's what AAB, AAAB etc. are) just by examining first 2 passes (it will always miss 2nd A for NxA because (N-1)xA+B is in the string (1st expansion of {AB}).

Oh and if we could establish how many of each letters an optimal soluton should have (don't look at B(5,2) it's too easy and regular :-) a random serch would be feasible - you generate candidates with provable traits (like AAAA, BBBB ... are present and not touching and is has n1 A-s, n2 B-s ...) and random arrangement and then test whether they are solutions (checking is much faster than exhaustive search in this case).

like image 85
ZXX Avatar answered Nov 09 '22 08:11

ZXX


Cool problem. Just a draft/psuedo algo:

dim STR-A as string = getall(ABCDEFGHIJKL) 
//custom function to generate concat list of all 793 4-char combos.
//should be listed side-by-side to form 3172 character-long string.
//different ordering may ultimately produce different results.
//brute-forcing all orders of combos is too much work (793! is a big #).
//need to determine how to find optimal ordering, for this particular
//approach below.
dim STR-B as string = "" // to hold the string you're searching for  
dim STR-C as string = "" // to hold the sub-string you are searching in  
dim STR-A-NEW as string = "" //variable to hold your new string
dim MATCH as boolean = false //variable to hold matching status

while len(STR-A) > 0 
//check each character in STR-A, which will be shorted by 1 char on each
//pass.
  MATCH = false
  STR-B = left(STR-A, 4)
  STR-B = reduce(STR-B) 
  //reduce(str) is a custom re-usable function to sort & remove duplicates

  for i as integer = 1 to len((STR-A) - 1)
    STR-C = substr(STR-A, i, 4) 
    //gives you the 4-character sequence beginning at position i
    STR-C = reduce(STR-C)
    IF STR-B = STR-C Then
       MATCH = true
       exit for 
       //as long as there is even one match, you can throw-away the first 
       //letter
    END IF
    i = i+1
  next


  IF match = false then 
  //if you didn't find a match, then the first letter should be saved
     STR-A-NEW += LEFT(STR-B, 1)
  END IF

  MATCH = false //re-init MATCH
  STR-A = RIGHT(STR-A, LEN(STR-A) - 1) //re-init STR_A
wend

Anyway -- there could be problems at this, and you'd need to write another function to parse your result string (STR-A-NEW) to prove that it's a viable answer...

like image 40
dave Avatar answered Nov 09 '22 09:11

dave