I am trying to write a script that will find strings that share an overlapping region of 5 letters at the beginning or end of each string (shown in example below).
facgakfjeakfjekfzpgghi
pgghiaewkfjaekfjkjakjfkj
kjfkjaejfaefkajewf
I am trying to create a new string which concatenates all three, so the output would be:
facgakfjeakfjekfzpgghiaewkfjaekfjkjakjfkjaejfaefkajewf
Edit:
This is the input:
x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')
**the list is not ordered
What I've written so far *but is not correct:
def findOverlap(seq)
i = 0
while i < len(seq):
for x[i]:
#check if x[0:5] == [:5] elsewhere
x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')
findOverlap(x)
One of the simplest and most common methods of concatenating strings in Python is to use the + operator. The way that this works is that using the + operator joins two strings together. In the case of strings, the + operator acts as the concatenation operator.
String Concatenation is the technique of combining two strings. String Concatenation can be done using many ways. It’s very easy to use + operator for string concatenation. This operator can be used to add multiple strings together.
2. StringBuilder First up is the humble StringBuilder. This class provides an array of String-building utilities that makes easy work of String manipulation. Let's build a quick example of String concatenation using the StringBuilder class:
In other words, concatenation in Excel is the process of joining two or more values together. This method is often used to combine a few pieces of text that reside in different cells (technically, these are called text strings or simply strings) or insert a formula-calculated value in the middle of some text.
Create a dictionary mapping the first 5 characters of each string to its tail
strings = {s[:5]: s[5:] for s in x}
and a set of all the suffixes:
suffixes = set(s[-5:] for s in x)
Now find the string whose prefix does not match any suffix:
prefix = next(p for p in strings if p not in suffixes)
Now we can follow the chain of strings:
result = [prefix]
while prefix in strings:
result.append(strings[prefix])
prefix = strings[prefix][-5:]
print "".join(result)
A brute-force approach - do all combinations and return the first that matches linking terms:
def solution(x):
from itertools import permutations
for perm in permutations(x):
linked = [perm[i][:-5] for i in range(len(perm)-1)
if perm[i][-5:]==perm[i+1][:5]]
if len(perm)-1==len(linked):
return "".join(linked)+perm[-1]
return None
x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')
print solution(x)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With