Algorithm help! Fast algorithm in searching for a string with its partner

Tags: c#, algorithm, bioinformatics

I am looking for a fast algorithm for searching a huge string (an organism's genome sequence, composed of hundreds of millions to billions of characters).

There are only 4 chars {A,C,G,T} present in this string, and "A" can only pair with "T" while "C" pairs with "G".

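I am searching for pairs of substrings that can pair with one another in antiparallel orientation, i.e. one is the reverse complement of the other. The length of each substring must lie between minLen and maxLen, and the length of the interval between the two substrings must lie between intervalMinLen and intervalMaxLen.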

For example, the string is: ATCAG GACCA TACGC CTGAT

Constraints: minLen = 4, maxLen = 5, intervalMinLen = 9, intervalMaxLen = 10

The result should be

"ATCAG" pairs with "CTGAT"
"TCAG" pairs with "CTGA"

Thanks in advance.

Update: I already have a method to determine whether two strings can pair with one another. My only concern is that an exhaustive search is very time-consuming.
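For reference, the exhaustive search mentioned above can be sketched as follows. This is only a baseline under stated assumptions, not the asker's actual code: the function names are hypothetical, both substrings are assumed to have the same length, pairing is assumed to be exact, and the "interval" is taken to be the number of characters strictly between the two substrings.

```python
# Complement table for the four bases (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse the sequence and swap every base for its pairing partner."""
    return seq[::-1].translate(COMPLEMENT)

def find_pairs_brute_force(genome, min_len, max_len, gap_min, gap_max):
    """Return (start1, start2, length) for every pair of equal-length
    substrings that are reverse complements of each other and are
    separated by gap_min..gap_max characters (assumed interval definition)."""
    hits = []
    n = len(genome)
    for length in range(min_len, max_len + 1):
        for i in range(n - length + 1):
            target = reverse_complement(genome[i:i + length])
            for gap in range(gap_min, gap_max + 1):
                j = i + length + gap
                if j + length > n:
                    break  # larger gaps only move the partner further right
                if genome[j:j + length] == target:
                    hits.append((i, j, length))
    return hits

genome = "ATCAGGACCATACGCCTGAT"  # the example string with spaces removed
for i, j, length in find_pairs_brute_force(genome, 4, 5, 9, 10):
    print(genome[i:i + length], "pairs with", genome[j:j + length])
```

On the example this reports the two pairs listed above. The work grows with the genome length times the number of allowed lengths times the number of allowed intervals, with a substring comparison at every step, which is the cost the answers below try to reduce.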

asked Jan 10 '12 22:01

Mavershang

2 Answers

I know you aren't searching for substrings, but I think it might be worthwhile to create a reversed genome string containing the matches; the task would then be to find common substrings in the two strings.

Example:

Original string

  ATCAG GACCA TACGC CTGAT

Reversed string:

  TAGTC CGCAT ACCAG GACTA

If you then transform the string to its pairing values (replace T<->A and C<->G), you get something useful:

  ATCAG GCGTA TGGTC CTGAT
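The two preprocessing steps above (reverse, then swap each base for its partner) can be sketched in a few lines. This is a minimal illustration; the function name `reverse_complement` is my own and not from the question.

```python
# Translation table that swaps each base for its partner: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse the sequence, then replace every base with its partner."""
    return seq[::-1].translate(COMPLEMENT)

genome = "ATCAGGACCATACGCCTGAT"  # the example string with spaces removed
lookup = reverse_complement(genome)
print(lookup)  # ATCAGGCGTATGGTCCTGAT
```

A substring of `genome` can pair with another substring of `genome` exactly when the first one also occurs in `lookup`, which is why the task turns into finding common substrings.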

I know that this preprocessing will be costly and consume a lot of space, but you will be able to use standard string algorithms afterwards, and with the number of comparisons you need to make, it will certainly be justified.

Once you have the original string and the reversed lookup string, I think your problem is surprisingly similar to the 'longest common substring' problem, which is well described. Your second preprocessing step would be to construct a suffix tree to allow fast lookup of substrings.

You will end up with quadratic running time, but I doubt you can do better.

answered Oct 12 '22 13:10

faester

The easiest solution would be to construct a trie of maximum height maxLen from the data. Each node should also hold a list of the locations where that particular string was found (the list will be in ascending order if the trie is built in order).

Then, while searching, just reverse complement the search string and walk the trie with it. When you find a match, check whether one of the stored locations lies in the proper interval; if so, output the strings.
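A minimal sketch of this idea, under some assumptions of my own: the trie is a plain nested dict, the names `build_trie` and `find_pairs` are hypothetical, both substrings have the same length, and the interval is counted as the number of characters strictly between them. To walk the trie one base at a time, the sketch fixes the end of the first substring and extends it to the left, because each extra base on the left appends one base to its reverse complement.

```python
from bisect import bisect_left

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def build_trie(genome, max_len):
    """Trie of every substring up to max_len characters. Each node keeps
    the start positions of its string, in ascending order."""
    root = {"pos": [], "kids": {}}
    for start in range(len(genome)):
        node = root
        for ch in genome[start:start + max_len]:
            node = node["kids"].setdefault(ch, {"pos": [], "kids": {}})
            node["pos"].append(start)  # starts are visited in ascending order
    return root

def find_pairs(genome, min_len, max_len, gap_min, gap_max):
    """Return (start1, start2, length) for every reverse-complement pair
    whose gap (assumed interval definition) lies in gap_min..gap_max."""
    trie = build_trie(genome, max_len)
    comp = genome.translate(COMPLEMENT)  # complemented, not reversed
    hits = []
    for end in range(min_len, len(genome) + 1):  # exclusive end of first substring
        node = trie
        for length in range(1, min(max_len, end) + 1):
            # Next base of the reverse complement of genome[end-length:end].
            node = node["kids"].get(comp[end - length])
            if node is None:
                break  # no partner exists anywhere for this or longer lengths
            if length >= min_len:
                positions = node["pos"]
                k = bisect_left(positions, end + gap_min)
                while k < len(positions) and positions[k] <= end + gap_max:
                    hits.append((end - length, positions[k], length))
                    k += 1
    return hits

genome = "ATCAGGACCATACGCCTGAT"
for i, j, length in find_pairs(genome, 4, 5, 9, 10):
    print(genome[i:i + length], "pairs with", genome[j:j + length])
```

Because the position lists are sorted, the interval check is a binary search rather than a scan. The dict-based trie stores roughly maxLen positions per genome character, so for a real genome a more compact index would probably be needed; the sketch only illustrates the lookup logic.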

Let me know if you need the detailed algorithm.

answered Oct 12 '22 13:10

ElKamina
