Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concatenate strings if they have an overlapping region

I am trying to write a script that will find strings that share an overlapping region of 5 letters at the beginning or end of each string (shown in example below).

facgakfjeakfjekfzpgghi
                 pgghiaewkfjaekfjkjakjfkj
                                    kjfkjaejfaefkajewf

I am trying to create a new string which concatenates all three, so the output would be:

facgakfjeakfjekfzpgghiaewkfjaekfjkjakjfkjaejfaefkajewf

Edit:

This is the input:

x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')

**the list is not ordered

What I've written so far *but is not correct:

def findOverlap(seq)
    i = 0
    while i < len(seq): 
        for x[i]:
        #check if x[0:5] == [:5] elsewhere

 x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')
findOverlap(x)
like image 639
user2767074 Avatar asked Sep 11 '13 18:09

user2767074


People also ask

How to concatenate strings in Python?

One of the simplest and most common methods of concatenating strings in Python is to use the + operator. The way that this works is that using the + operator joins two strings together. In the case of strings, the + operator acts as the concatenation operator.

What is string concatenation in JavaScript?

String Concatenation is the technique of combining two strings. String Concatenation can be done using many ways. It’s very easy to use + operator for string concatenation. This operator can be used to add multiple strings together.

Which class is used for concatenation of strings?

2. StringBuilder First up is the humble StringBuilder. This class provides an array of String-building utilities that makes easy work of String manipulation. Let's build a quick example of String concatenation using the StringBuilder class:

What is concatenation in Excel and how to use it?

In other words, concatenation in Excel is the process of joining two or more values together. This method is often used to combine a few pieces of text that reside in different cells (technically, these are called text strings or simply strings) or insert a formula-calculated value in the middle of some text.


2 Answers

Create a dictionary mapping the first 5 characters of each string to its tail

strings = {s[:5]: s[5:] for s in x}

and a set of all the suffixes:

suffixes = set(s[-5:] for s in x)

Now find the string whose prefix does not match any suffix:

prefix = next(p for p in strings if p not in suffixes)

Now we can follow the chain of strings:

result = [prefix]
while prefix in strings:
    result.append(strings[prefix])
    prefix = strings[prefix][-5:]
print "".join(result)
like image 104
Sven Marnach Avatar answered Nov 02 '22 19:11

Sven Marnach


A brute-force approach - do all combinations and return the first that matches linking terms:

def solution(x):
    from itertools import permutations
    for perm in permutations(x):
        linked = [perm[i][:-5] for i in range(len(perm)-1) 
                               if perm[i][-5:]==perm[i+1][:5]]
        if len(perm)-1==len(linked):
            return "".join(linked)+perm[-1]
    return None

x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')
print solution(x)
like image 45
Robert Lujo Avatar answered Nov 02 '22 19:11

Robert Lujo