To make my question clearer, I will start by describing the real-life case I am facing.
I am building a physical panel with many words on it that can be selectively lit, in order to compose sentences. This is my situation:
Example:
SENTENCES:
"A dog is on the table"
"A cat is on the table"
SOLUTIONS:
"A dog cat is on the table"
"A cat dog is on the table"
I tried to approach this problem with "positional rules": for each UNIQUE word in the set of ALL the words used in ALL the sentences, I worked out which words should be to its left or to its right. In the example above, the ruleset for the word "on" would be "left(A, dog, cat, is) + right(the, table)".
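The display requirement and the positional-rules idea can both be sketched in a few lines of Python. Reading the example, a panel can show a sentence exactly when the sentence's words appear on it in the same left-to-right order, possibly with other words in between (the sentence is a subsequence of the panel); `lit_positions` checks that. The positional rules form a precedence graph (every word must come after each word seen to its left), and a topological sort of it, here with the standard-library `graphlib` module (Python 3.9+), gives a layout that uses each word once. All function names are hypothetical, chosen for this sketch.

```python
from graphlib import CycleError, TopologicalSorter


def lit_positions(panel, sentence):
    """Indices of panel words to light so `sentence` reads left to right, or None."""
    positions, start = [], 0
    for word in sentence:
        try:
            # The leftmost match after the previously lit word is always a safe choice.
            idx = panel.index(word, start)
        except ValueError:
            return None
        positions.append(idx)
        start = idx + 1
    return positions


def positional_rules(sentences):
    """For each unique word, the sets of words seen to its left and to its right."""
    left, right = {}, {}
    for sentence in sentences:
        words = sentence.split()
        for i, w in enumerate(words):
            left.setdefault(w, set()).update(words[:i])
            right.setdefault(w, set()).update(words[i + 1:])
    return left, right


def order_from_rules(sentences):
    """Place each unique word once so that every left-rule holds, or return None."""
    left, _ = positional_rules(sentences)
    try:
        # TopologicalSorter takes a mapping node -> predecessors, which is what `left` is.
        return list(TopologicalSorter(left).static_order())
    except CycleError:
        # Some word would have to be both left and right of another word (or of itself).
        return None


sentences = ["A dog is on the table", "A cat is on the table"]
left, right = positional_rules(sentences)
print(sorted(left["on"]), sorted(right["on"]))  # ['A', 'cat', 'dog', 'is'] ['table', 'the']
panel = order_from_rules(sentences)
for s in sentences:
    print(s, "->", lit_positions(panel, s.split()))
```

This only works while each word can appear once. A sentence that repeats a word, or two sentences that put the same two words in opposite orders ("a b" vs. "b a"), creates a cycle in the graph, so no single-occurrence layout exists and `graphlib` raises `CycleError`.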
This approach worked for trivial cases, but my real-life situation has two additional difficulties that got me stuck, both of which have to do with the need to repeat words:
MY QUESTION THEREFORE IS:
What is the class of algorithms (or even better: what is the specific algorithm) that studies and solves this kind of problem? Could you post a reference or a code example?
EDIT: Level of complexity
From the first round of answers, it appears that the actual level of complexity (i.e. how different the sentences are from one another) is an important factor. So, here is some info on that:
For this project I am using Python, but any reasonably readable language (e.g. NOT obfuscated Perl!) will be fine.
Thank you in advance for your time!
If I understand you correctly, this is equivalent to the shortest common supersequence problem. This problem is NP-complete, but approximation algorithms exist. Google turns up a few papers, including this one.
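For illustration, here is a greedy heuristic in the spirit of majority merge: at each step, look at the next unplaced word of every sentence and emit the one that advances the most remaining text. The remaining-length weighting and the name `weighted_majority_merge` are choices made for this sketch, so treat it as an illustration of the greedy idea rather than a faithful implementation of any published variant. It always returns a valid common supersequence, but not necessarily the shortest one.

```python
def weighted_majority_merge(sentences):
    """Greedy common supersequence: repeatedly emit the next word backed by the most remaining text."""
    seqs = [s.split() for s in sentences]
    pos = [0] * len(seqs)
    result = []
    while True:
        scores = {}
        for seq, p in zip(seqs, pos):
            if p < len(seq):
                # Each unfinished sentence votes for its next word, weighted by how much is left.
                scores[seq[p]] = scores.get(seq[p], 0) + len(seq) - p
        if not scores:
            return result
        # max() returns the first best candidate, so ties go to the earlier sentence.
        word = max(scores, key=scores.get)
        result.append(word)
        # Advance every sentence whose next word was just emitted.
        pos = [p + 1 if p < len(seq) and seq[p] == word else p
               for seq, p in zip(seqs, pos)]


print(" ".join(weighted_majority_merge(["A dog is on the table", "A cat is on the table"])))
# A dog cat is on the table
```

On the two example sentences it finds one of the optimal panels; on harder inputs a greedy choice like this can end up noticeably longer than the optimum.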
The problem can be solved with a simple DP algorithm in the case of two input sequences, but this doesn't generalize to multiple sequences: each sequence essentially adds a dimension to the DP table, which results in an exponential blow-up.
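A sketch of that DP, written for any number of sentences so the blow-up is visible: the state is one position per sentence, so the memo table can hold up to (len1 + 1) × (len2 + 1) × … entries. It is exact, but only practical for a handful of short sentences. `shortest_common_supersequence` is a hypothetical name; memoization uses `functools.lru_cache`.

```python
from functools import lru_cache


def shortest_common_supersequence(sentences):
    """Exact shortest common supersequence of word sequences, via DP over position tuples."""
    seqs = tuple(tuple(s.split()) for s in sentences)

    @lru_cache(maxsize=None)
    def solve(pos):
        # Only a word that is next in some unfinished sentence is worth emitting.
        candidates = sorted({seq[p] for seq, p in zip(seqs, pos) if p < len(seq)})
        if not candidates:
            return ()
        best = None
        for word in candidates:
            # Emitting `word` advances every sentence whose next word it is.
            nxt = tuple(p + 1 if p < len(seq) and seq[p] == word else p
                        for seq, p in zip(seqs, pos))
            tail = solve(nxt)
            if best is None or len(tail) + 1 < len(best):
                best = (word,) + tail
        return best

    return list(solve((0,) * len(seqs)))


two = ["A dog is on the table", "A cat is on the table"]
three = ["A red cat", "My cat is on the table", "That table is red"]
print(" ".join(shortest_common_supersequence(two)))  # A cat dog is on the table
print(len(shortest_common_supersequence(three)))  # 11
```

Advancing every sentence that matches the emitted word never hurts, which is why the candidates can be restricted to the sentences' next words.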
I'm a bioinformatician, and this sounds like it could be solved by doing a global multiple sequence alignment of all the sentences with infinite mismatch penalties (i.e. disallow mismatches entirely) and modest gap penalties (i.e. allow gaps, but prefer fewer gaps), and then reading off the gapless consensus sequence.
If this formulation is equivalent to your problem, then that means your problem is indeed NP-complete, since multiple sequence alignment is NP-complete, although there are many heuristic algorithms that run in reasonable time. Unfortunately, most MSA algorithms are designed to work on characters of DNA or protein sequences, not words of English.
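To make the formulation concrete without an MSA package, here is a much-simplified progressive alignment on words, assuming mismatches are forbidden and every gap costs the same. Under those two assumptions, aligning one more sentence to the current alignment reduces to a longest-common-subsequence DP against the consensus. Real progressive MSA tools build a guide tree and use richer scoring; this sketch just adds sentences in input order, so its result depends on that order and is not guaranteed optimal. `progressive_alignment` is a hypothetical name.

```python
def progressive_alignment(sentences):
    """Align word sequences one at a time with mismatches forbidden.

    Returns (rows, consensus): rows[k][c] is sentence k's word in column c or None
    for a gap; consensus[c] is the single word used in column c.
    """
    consensus, rows = [], []
    for sentence in sentences:
        words = sentence.split()
        n, m = len(consensus), len(words)
        # lcs[i][j]: longest common subsequence of consensus[i:] and words[j:].
        lcs = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):
            for j in range(m - 1, -1, -1):
                if consensus[i] == words[j]:
                    lcs[i][j] = lcs[i + 1][j + 1] + 1
                else:
                    lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
        merged, old_rows, new_row = [], [[] for _ in rows], []
        i = j = 0
        while i < n or j < m:
            if i < n and j < m and consensus[i] == words[j]:
                col_word, use_old, use_new = consensus[i], True, True   # shared column
            elif j == m or (i < n and lcs[i + 1][j] >= lcs[i][j + 1]):
                col_word, use_old, use_new = consensus[i], True, False  # gap in new sentence
            else:
                col_word, use_old, use_new = words[j], False, True      # gap in earlier ones
            merged.append(col_word)
            for k, row in enumerate(rows):
                old_rows[k].append(row[i] if use_old else None)
            new_row.append(words[j] if use_new else None)
            i += use_old
            j += use_new
        consensus, rows = merged, old_rows + [new_row]
    return rows, consensus


rows, consensus = progressive_alignment(["A red cat", "My cat is on the table", "That table is red"])
for row in rows:
    print(" ".join(w if w is not None else "-" for w in row))
print(" ".join(consensus))
```

On the three sentences used in the alignment below ("A red cat", "My cat is on the table", "That table is red") this yields an 11-word consensus, the same length as that alignment, though with a different column order.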
Here is an example of the kind of alignment that I describe, using the set of three sentences given by the OP. I don't know if the alignment that I give is optimal, but it is one possible solution. Gaps are indicated by a series of dashes.
Sentence 1: ---- -- A red cat -- -- --- ----- -- ---
Sentence 2: ---- My - --- cat is on the table -- ---
Sentence 3: That -- - --- --- -- -- --- table is red
Consensus:  That My A red cat is on the table is red
One advantage of this method is that the alignment not only gives you the full sequence of words, but shows which words belong in which sentences.
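Reading off the consensus, and which panel positions each sentence uses, is mechanical once the alignment exists. A small sketch, assuming the rows are written as in the alignment above (one whitespace-separated token per column, a run of dashes for a gap); `read_alignment` and `is_gap` are hypothetical helpers. The returned positions are the words to light for each sentence on the OP's panel.

```python
def is_gap(token):
    """A run of dashes marks a gap."""
    return set(token) == {"-"}


def read_alignment(lines):
    """Return (consensus, lights) from gapped rows; lights[k] lists the columns sentence k uses."""
    rows = [line.split() for line in lines]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same number of columns")
    consensus, lights = [], [[] for _ in rows]
    for c, column in enumerate(zip(*rows)):
        words = {tok for tok in column if not is_gap(tok)}
        # No mismatches allowed: every column must carry exactly one distinct word.
        if len(words) != 1:
            raise ValueError(f"column {c} is empty or has a mismatch: {column}")
        consensus.append(words.pop())
        for k, tok in enumerate(column):
            if not is_gap(tok):
                lights[k].append(c)
    return consensus, lights


alignment = [
    "---- -- A red cat -- -- --- ----- -- ---",
    "---- My - --- cat is on the table -- ---",
    "That -- - --- --- -- -- --- table is red",
]
consensus, lights = read_alignment(alignment)
print(" ".join(consensus))  # That My A red cat is on the table is red
print(lights)  # [[2, 3, 4], [1, 4, 5, 6, 7, 8], [0, 8, 9, 10]]
```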