Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm help needed

Tags:

algorithm

I have a set of sequences (e.g. 10000 sequences), and generate a matrix (10000x10000) representing the pairwise similarity between every two sequences.

Now the goal is to retrieve a subset (for example 1000 sequences) from the large set and make sure the pairwise similarity between every two sequences in this subset is among a range (e.g. 50%~85%).

Is there any fast algorithm to do that?

like image 754
Mavershang Avatar asked Nov 04 '22 17:11

Mavershang


1 Answers

You can transform this to the graph theory problem:

  1. Each sequence is a node
  2. If similarity of two nodes is in given range than there is an edge between them
  3. Your goal is to find the larges connected component(if your similarity relation is transitive...) or the larges clique(...if not).
like image 115
Peter Popov Avatar answered Nov 09 '22 05:11

Peter Popov