This is a question on combinatorics from a non-mathematician, so please try to bear with me!
Given an array of n distinct characters, I want to generate subsets of k characters in a minimal-change order, i.e. an order in which generation i+1 contains exactly one character that was not in generation i. That's not too hard in itself. However, I also want to maximise the number of cases in which the character that is swapped out in generation i+1 is the same character that was swapped in in generation i. To illustrate, for n=7, k=3:
abc abd abe* abf* abg* afg aeg* adg* acg* acd ace* acf* aef adf* ade bde bdf bef bcf* bce bcd* bcg* bdg beg* bfg* cfg ceg* cdg* cde cdf* cef def deg dfg efg
The asterisked strings indicate the case I want to maximise; e.g. the e that is new in generation 3, abe, replaces a d that was new in generation 2, abd. It doesn't seem possible to have this happen in every generation, but I want it to happen as often as possible.
Typical array sizes that I use are 20-30 and subset sizes around 5-8.
I'm using an odd language, Icon (or actually its derivative Unicon), so I don't expect anyone to post code that I can used directly. But I will be grateful for answers or hints in pseudo-code, and will do my best to translate C etc. Also, I have noticed that problems of this kind are often discussed in terms of arrays of integers, and I can certainly apply solutions posted in such terms to my own problem.
Thanks
Kim Bastin
Edit 15 June 2010:
I do seem to have got into deeper water than I thought, and while I'm grateful for all answers, not all of them have been relevant. As an example of a solution which is NOT adequate, let me post my own Unicon procedure for generating k-ary subsets of a character set s in a minimal change order. Things you need to know to understand the code are: a preposed * means the size of a structure, so if s is a string, *s means the size of s (the number of characters it contains). || is a string concatenation operation. A preposed ! produces each element of a structure, e.g. each character of a string, in turn on successive passes. And the 'suspend' control structure returns a result from a procedure, but leaves the procedure 'in suspense', with all local variables in place, so that new results can be produced if the procedure is called in a loop.
procedure revdoor(s, k)
# Produces all k-subsets of a string or character set s in a 'revolving
# door' order. Each column except the first traverses the characters
# available to it in alphabetical and reverse alphabetical order
# alternately. The order of the input string is preserved.
# If called in a loop as revdoor("abcdefg", 3),
# the order of production is: abc, abd, abe, abf, abg, acg, acf, ace, acd,
# ade, adf, adg, aeg, aef, afg, bfg, bef, beg, bdg, bdf, bde, bcd, bce,
# bcf, bcg, cdg, cdf, cde, cef, ceg, cfg, dfg, deg, def, efg
local i
static Ctl
if /Ctl then { # this means 'if Ctl doesn't exist'
if k = 0 then return ""
Ctl := list(k, 1) # a list of k elements, each initialised to 1.
}
if Ctl[k] = 1 then {
if k = 1 then suspend !s else
every i := 1 to *s-k+1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
} else {
if k = 1 then suspend !reverse(s) else
every i := -k to -*s by -1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
}
# the following line multiplies element k of Ctl by -1 if k < size of Ctl
# (this controls the order of generation of characters),
# and destroys Ctl on final exit from the procedure.
if k < *Ctl then Ctl[k] *:= -1 else Ctl := &null
end
Note that the output of the above procedure is not optimal in my sense. One result of my investigations so far is that the maximum 'swapping score' for a list of k-ary subsets of n elements is not less than comb(n-1, k), or in the case of n=7, k=3, the maximum score is at least comb(6, 3) = 20. I define the 'swapping score' of a list as the number of items in the list whose new element replaces an element in the previous item which was itself new. I haven't got the mathematical equipment to prove this, but it is easy to see with k=1 or k=2. For certain (n,k) a slightly higher score is possible, as in the case of n=7, k=3:
abc abd abe abf abg
acg adg aeg afg
efg dfg cfg bfg
beg bdg bcg
bcd bce bcf
bdf bef
def cef aef
adf acf
acd ace
ade
bde cde
cdf cdg
ceg
deg (swapping score = 21)
It may be noted that the above list is in 'strong minimal change order' (like word golf: the new character is always in the same position as the character it replaces), which may indicate the direction my own work is taking. I hope to post something more in a few days.
Kim
It's fairly straightforward. In order to maximise replacement just think of the characters as numbers and increment the string by one till you have reached the upper limit. Then check to see that you don't use the same character twice in the string. I think this would work:
char c[] = {'a', 'b', 'c', 'd', 'e'};
const int n = 5;
const int k = 3;
char s[k];
void print()
{
for( int i = 0; i < k; ++i )
putchar(c[s[i]]);
putchar('\n');
}
bool increment( int m )
{
// reached the limit?
if( ++s[m] == n && m == 0 )
return false;
next:
for( int i = 0; i < m; ++i )
{
if( s[m] == n )
{
// carry
s[m] = 0;
if( !increment( m-1 ))
return false;
goto next;
}
else if( s[i] == s[m] )
{
// the character is already used
++s[m];
goto next;
}
}
return true;
}
int main(int, char**)
{
// initialise
for( int i = 0; i < k; ++i )
s[i] = i;
// enumerate all combinations
do
print();
while(increment(k-1));
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With