Palindrome detection efficiency

Tags: performance, algorithm, palindrome

Jon Limjap's interview mishap made me curious, so I started looking for efficient ways to do palindrome detection. I checked the palindrome golf answers, and it seems to me that they contain only two algorithms: reversing the string, and checking inward from the head and the tail.

def palindrome_short(s):
    length = len(s)
    # Compare mirrored characters; the middle one of an odd-length string needs no check.
    for i in range(length // 2):
        if s[i] != s[length - 1 - i]:
            return False
    return True

def palindrome_reverse(s):
    return s == s[::-1]

I don't think either of these methods is used to detect exact palindromes in huge DNA sequences. I looked around a bit and didn't find any free article describing an ultra-efficient way to do this.

A good way might be to parallelize the first version with a divide-and-conquer approach, assigning each thread or processor one pair of char arrays: a block from the head and its mirror block from the tail (for example 0..n-1 and length-n..length-1).
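A minimal sketch of that partitioning, assuming Python's standard `concurrent.futures` module. The names `_mirror_ok` and `palindrome_parallel` and the `workers` parameter are made up for illustration. In CPython the global interpreter lock means threads are unlikely to give a real speedup here, so this only shows how the head and tail blocks would be paired up:

```python
from concurrent.futures import ThreadPoolExecutor


def _mirror_ok(s, start, stop):
    # Hypothetical helper: does the head block s[start:stop] mirror
    # the matching block at the tail of the string?
    n = len(s)
    return s[start:stop] == s[n - stop:n - start][::-1]


def palindrome_parallel(s, workers=4):
    half = len(s) // 2
    if half == 0:
        return True  # empty and one-character strings are palindromes
    step = -(-half // workers)  # ceiling division: block size per worker
    ranges = [(a, min(a + step, half)) for a in range(0, half, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task checks one (head block, tail block) pair.
        return all(pool.map(lambda r: _mirror_ok(s, *r), ranges))
```

The total work is still O(N); only the wall-clock time could shrink, and only on a runtime where the blocks really are compared in parallel.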

What would be a better way?

Do you know any?


asked Oct 29 '08 19:10

Vinko Vrsalovic

1 Answer

Given only one string to check, you will have to do it in O(N), yes. You can get more efficiency on multiple processors by splitting the string as you said.

Now say you want to do exact DNA matching. These strings are thousands of characters long, and they are very repetitive. This gives us the opportunity to optimize.

Say you split a 1000-char long string into 5 pairs of 100,100. The code will look like this:

isPal(w[0:100], w[-100:]) and isPal(w[100:200], w[-200:-100]) ...
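The answer does not define `isPal`; one possible reading is that it returns true when the first block equals the second block reversed. Under that assumption, the chunked check could be sketched as follows (the names `is_pal_pair` and `palindrome_chunked` are hypothetical):

```python
def is_pal_pair(head, tail):
    # Assumed meaning of isPal(a, b): the head block mirrors the tail block.
    return head == tail[::-1]


def palindrome_chunked(w, chunk=100):
    n = len(w)
    half = n // 2
    for i in range(0, half, chunk):
        j = min(i + chunk, half)  # do not run past the middle of the string
        if not is_pal_pair(w[i:j], w[n - j:n - i]):
            return False
    return True
```

For a 1000-character string and `chunk=100` this performs the five pairwise checks written out above.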

and so on. The first time you see a given pair, you have to compare it. After that, you can store every result you have computed in a hashtable mapping pairs to booleans:

isPal = {("ATTAGC", "CGATTA"): True, ("ATTGCA", "CAGTAA"): False}

and so on. This will take way too much memory, though. For pairs of 100,100, the hash map will have on the order of 2*4^100 elements (counting every possible ordered pair gives 4^100 * 4^100 = 4^200, which is worse still). Even if you store only two 32-bit hashes of the strings as the key, you will need something like 10^55 megabytes, which is ridiculous.
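The order of magnitude can be checked with a few lines of arithmetic. This sketch assumes 8 bytes per entry (the two 32-bit hashes) and ignores the stored boolean and any hash-table overhead; the variable names are illustrative:

```python
ALPHABET = 4        # A, C, G, T
CHUNK = 100         # characters per block

entries = 2 * ALPHABET ** CHUNK      # the estimate used above
bytes_per_key = 2 * 4                # two 32-bit hashes
total_mb = entries * bytes_per_key // 10 ** 6

print(f"about 10^{len(str(total_mb)) - 1} MB")
```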

Maybe the problem becomes tractable if you use smaller strings. You will still have a huge hashmap, but at least checking a pair of, say, 10,10 will take O(1), so checking whether a 1000-character string is a palindrome will take 50 lookups instead of 500 compares. It's still O(N), though...
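A rough sketch of the lookup idea with small blocks, using `functools.lru_cache` as the hashtable rather than a hand-built dictionary. The names `pair_matches` and `palindrome_cached` are made up for illustration. Each lookup still has to slice and hash the two blocks, so the total work remains proportional to N:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def pair_matches(head, tail):
    # Computed once per distinct (head, tail) pair, then served from the cache.
    return head == tail[::-1]


def palindrome_cached(w, chunk=10):
    n = len(w)
    half = n // 2
    for i in range(0, half, chunk):
        j = min(i + chunk, half)  # stay within the first half
        if not pair_matches(w[i:j], w[n - j:n - i]):
            return False
    return True
```

On very repetitive input most calls hit the cache; `pair_matches.cache_info()` reports the hit and miss counts.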

answered Sep 27 '22 23:09

Claudiu

