I'm constructing lists of words for a psychology experiment. We're trying to create a chain of words such that words adjacent in the list are related, but all other words in the list are not related. So for example:
SCHOOL, CAFETERIA, PIZZA, CRUST, EARTH, OCEAN, WHALE, ...
So here we see the first word is related to the second, and the second is related to the third, but the third isn't related to the first. (And the first isn't related to the fourth, fifth, sixth, ... either)
I have a list of 1600 words such that each number from 0 to 1600 corresponds to a word. I also have a very large matrix (1600 x 1600) that tells me (on a scale of 0 to 1) how related each word is to every other word. (These are from a latent semantic analysis; http://lsa.colorado.edu/)
I can make the lists, but it's not very efficient at all, and my adjacent words aren't super strongly related to each other.
Does anyone have a better approach? The problem with my approach is that I'll sometimes settle for an okay match that meets my criteria when a better match is potentially still out there. Also, it would be nice if I didn't have to dump the whole list but could backtrack a few steps to where the list potentially went wrong.
This might be a good candidate for a genetic algorithm. You can create a large number of completely random possibilities, score each one with an objective function, and then iterate the population by crossing over mates based on fitness (possibly throwing some mutations in as well).
If done properly, this should give you a large-ish population of good solutions. If the population is large enough, the fitness function defined well enough and mutation is sufficient to get you out of any valleys you might otherwise get stuck in, you might even converge overwhelmingly on the optimal answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With