
Algorithm to create unique random concatenation of items

I'm looking for an algorithm that creates the X most unique concatenations of Y parts, where each part can be one of several items. For example, with 3 parts:

part #1: 0,1,2
part #2: a,b,c
part #3: x,y,z

One possible (random) result of 5 concatenations:

0ax
1by
2cz
0bz (note that '0by' would be "less unique" than '0bz' because 'by' has already appeared)
2ay (note that 'a' hasn't followed '2' yet, and 'y' hasn't followed 'a' yet)

A simple BAD result for the next concatenation:

1cy ('c' hasn't followed '1' and 'y' hasn't followed 'c', BUT '1'-'y' has already appeared as a first-last pair)

Simple GOOD next results would be:

0cy ('c' hasn't followed '0', 'y' hasn't followed 'c', and '0'-'y' hasn't appeared as a first-last pair)
1az
1cx

I know that this requirement limits the possible results, but when all fully unique possibilities are gone, the algorithm should continue and try to keep as much uniqueness as is still available (repeating as little as possible).
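One way to make "less unique" measurable is to count how many item pairs of a candidate have already appeared together at the same two positions. The sketch below is only an illustration of that idea, not a full solution; the helper names pairKeys and repeatedPairs are made up for this example.

```javascript
// Hypothetical helper: lists every (position, item) pair of a candidate,
// including non-adjacent ones such as first-last.
function pairKeys(candidate) {
  const keys = [];
  for (let i = 0; i < candidate.length; i++) {
    for (let j = i + 1; j < candidate.length; j++) {
      keys.push(`${i}:${candidate[i]}|${j}:${candidate[j]}`);
    }
  }
  return keys;
}

// Hypothetical helper: how many pairs of the candidate were used before.
// Lower means "more unique".
function repeatedPairs(candidate, seen) {
  return pairKeys(candidate).filter((k) => seen.has(k)).length;
}

// Record the five results from the example above.
const seen = new Set();
for (const s of ["0ax", "1by", "2cz", "0bz", "2ay"]) {
  for (const k of pairKeys([...s])) seen.add(k);
}

const bad = repeatedPairs([..."1cy"], seen);  // 1, because '1'-'y' repeats
const good = repeatedPairs([..."0cy"], seen); // 0, nothing repeats
```

Once no candidate scores 0, picking one with the lowest count would be one way to keep "repeating as little as possible".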

Consider a real example:

Boy/Girl/Martin
bought/stole/get
bottle/milk/water

And I want results like:

Boy get milk
Martin stole bottle
Girl bought water
Boy bought bottle (not water, because of 'bought+water' and not milk, because of 'Boy+milk')

Maybe start with a tree of all combinations, but how do I select the most unique trees first?

Edit: From this sample data we can see that creating fully unique results for 4 words * 3 possibilities gives us only 3 results:

Martin stole a bottle
Boy bought an milk
He get hard water

But more results may be requested. So the 4th result should keep as much uniqueness as is still available, like "Martin bought hard milk", not "Martin stole a water".

Edit: A start for a solution? Imagine each part as a barrel which can be rotated: when it rotates down, the last item becomes the first; when it rotates up, the first item becomes the last. Now set the barrels like this:

Martin|stole |a   |bottle
Boy   |bought|an  |milk
He    |get   |hard|water

Now write the sentences as we see them, then rotate the first barrel UP once, the second twice, the third three times, and so on. We get these sentences (note that the third barrel did one full rotation):

Boy   |get   |a   |milk
He    |stole |an  |water
Martin|bought|hard|bottle

And we get the next solutions. We can repeat the process once more to get more solutions:

He    |bought|a   |water
Martin|get   |an  |bottle
Boy   |stole |hard|milk

The problem is that the first barrel stays connected with the last, because they rotate in parallel. I'm wondering if it would be more unique if I rotated the last barrel one more time in the last solution (but then I create other connections like an-water, though these would be repeated only 2 times, not 3 times like now). I don't know whether "barrels" are a good way of thinking here.
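The barrel idea can be sketched in a few lines. This is only an illustration under the rule described above (in each round, barrel number i rotates up i times); rotateUp and sentencesAfterRound are names invented for this sketch.

```javascript
// Rotate a barrel "up" by k steps: each step moves the first item to the end.
function rotateUp(barrel, k) {
  const n = barrel.length;
  const shift = ((k % n) + n) % n;
  return barrel.slice(shift).concat(barrel.slice(0, shift));
}

const barrels = [
  ["Martin", "Boy", "He"],
  ["stole", "bought", "get"],
  ["a", "an", "hard"],
  ["bottle", "milk", "water"],
];

// After round r, barrel i (0-based) has been rotated up r * (i + 1) steps.
// Reading the barrels row by row gives the sentences.
function sentencesAfterRound(barrels, r) {
  const turned = barrels.map((b, i) => rotateUp(b, r * (i + 1)));
  return turned[0].map((_, row) => turned.map((b) => b[row]).join(" "));
}

const first = sentencesAfterRound(barrels, 1);
// ["Boy get a milk", "He stole an water", "Martin bought hard bottle"]
const second = sentencesAfterRound(barrels, 2);
// ["He bought a water", "Martin get an bottle", "Boy stole hard milk"]
```

With 3 items per barrel, the first barrel shifts by 1 and the fourth by 4, which is also 1 modulo 3. That is why "Martin" always ends up with "bottle", and why the third barrel (shift 3, i.e. 0) never moves at all.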

I think that we should first find a definition of uniqueness.

For example, what causes uniqueness to drop? Using a word that was already used? Is repeating 2 words that are next to each other less unique than repeating words separated by a gap of other words? So this problem can be subjective.

But I think that over many sequences, each word should be used a similar number of times (like selecting a word randomly and removing it from a set, and once all words have been taken, refilling the set so they can be drawn again). This is easy to do.
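The "draw from a set and refill it when empty" idea could look roughly like this. It is a minimal sketch; makeBag and shuffled are names made up for the example, and it only balances how often each word is used, not which words appear together.

```javascript
// Fisher-Yates shuffle of a copy of the input array.
function shuffled(items) {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Hypothetical helper: returns a function that draws items without
// replacement and refills the bag once every item has been used.
function makeBag(items) {
  let bag = [];
  return function draw() {
    if (bag.length === 0) bag = shuffled(items);
    return bag.pop();
  };
}

const drawSubject = makeBag(["Boy", "Girl", "Martin"]);
const drawn = [];
for (let i = 0; i < 6; i++) drawn.push(drawSubject());
// After 6 draws, each of the 3 words has been used exactly twice.
```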

But even if each word is used a similar number of times, we should do something to avoid repeating connections between words. I think that repeating words far from each other is more unique than repeating them next to each other.

Piotr Müller asked Nov 13 '22 20:11



1 Answer

Any time you need a new concatenation, just generate a completely random one, calculate its fitness, and then either accept or reject that concatenation (probabilistically, that is).

const C = 1.0; // higher values accept less-fit candidates sooner

// CreateRandomConcatenation() and CalculateFitness() are supplied by the caller.
function CreateGoodConcatenation()
{
  for (let rejectionCount = 0; ; rejectionCount++)
  {
    const candidate = CreateRandomConcatenation();
    const fitness = CalculateFitness(candidate); // must return 0 < fitness <= 1
    const r = Math.random(); // uniform in [0, 1)
    // Bias toward acceptance as rejectionCount increases
    const adjusted_r = Math.pow(r, C * rejectionCount + 1);
    if (adjusted_r < fitness)
    {
      return candidate;
    }
  }
}

CalculateFitness should never return zero. If it does, you might find yourself in an infinite loop.

As you increase C, less ideal concatenations are accepted more readily. As you decrease C, you face more iterations per call to CreateGoodConcatenation (plus less entropy in the result).
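To try the loop end to end, here is a self-contained sketch using the Boy/Girl/Martin data from the question. The fitness function is purely an assumption for illustration (0.25 raised to the number of already-used pairs, so it stays in the range 0 < fitness <= 1); any other definition of uniqueness could be plugged in instead. The names pairKeys and seenPairs are invented for this example.

```javascript
const C = 1.0; // same tuning constant as above
const parts = [
  ["Boy", "Girl", "Martin"],
  ["bought", "stole", "get"],
  ["bottle", "milk", "water"],
];
const seenPairs = new Set(); // pairs used by the results accepted so far

// Hypothetical helper: every (position, item) pair of a candidate.
function pairKeys(c) {
  const keys = [];
  for (let i = 0; i < c.length; i++) {
    for (let j = i + 1; j < c.length; j++) {
      keys.push(`${i}:${c[i]}|${j}:${c[j]}`);
    }
  }
  return keys;
}

function createRandomConcatenation() {
  return parts.map((p) => p[Math.floor(Math.random() * p.length)]);
}

// Assumed fitness: 1 when no pair repeats, 0.25 per repeated pair otherwise.
// It never returns zero, so the loop below always terminates eventually.
function calculateFitness(c) {
  const repeats = pairKeys(c).filter((k) => seenPairs.has(k)).length;
  return Math.pow(0.25, repeats);
}

function createGoodConcatenation() {
  for (let rejectionCount = 0; ; rejectionCount++) {
    const candidate = createRandomConcatenation();
    const fitness = calculateFitness(candidate);
    const adjustedR = Math.pow(Math.random(), C * rejectionCount + 1);
    if (adjustedR < fitness) {
      // Remember the accepted pairs so later candidates are scored against them.
      pairKeys(candidate).forEach((k) => seenPairs.add(k));
      return candidate;
    }
  }
}

const results = [];
for (let i = 0; i < 5; i++) results.push(createGoodConcatenation().join(" "));
console.log(results);
```

The output differs from run to run. Early results tend to share no pairs, and repeats become more common only once the fully unique combinations run out.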

Fantius answered Dec 14 '22 23:12
