I have a set of sequences (e.g. 10,000 sequences) and have generated a 10000x10000 matrix holding the pairwise similarity between every two sequences.
Now the goal is to retrieve a subset (for example, 1000 sequences) from the large set such that the pairwise similarity between every two sequences in the subset lies within a given range (e.g. 50% to 85%).
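To pin down the constraint, here is a minimal sketch of a checker for a candidate subset. It assumes the similarity matrix is a symmetric list of lists with values stored as fractions in [0, 1] (so 50% to 85% becomes 0.50 to 0.85). The function name `subset_in_range` is hypothetical, not from any library.

```python
from itertools import combinations

def subset_in_range(sim, subset, lo=0.50, hi=0.85):
    """Hypothetical helper: True if every pair (i, j) in `subset` satisfies
    lo <= sim[i][j] <= hi.

    Assumes `sim` is a symmetric matrix (list of lists) of similarities as
    fractions in [0, 1]; the diagonal (self-similarity) is never inspected.
    """
    return all(lo <= sim[i][j] <= hi for i, j in combinations(subset, 2))

# Small example: sequences 0 and 2 are too similar (0.9 > 0.85).
sim = [
    [1.0, 0.6, 0.9],
    [0.6, 1.0, 0.7],
    [0.9, 0.7, 1.0],
]
print(subset_in_range(sim, [0, 1]))     # True
print(subset_in_range(sim, [0, 1, 2]))  # False
```

Checking a subset of size k costs k(k-1)/2 lookups, which is cheap; the hard part is finding such a subset in the first place.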
Is there any fast algorithm to do that?
You can transform this into a graph theory problem:
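One common way to set up this transformation: make each sequence a vertex and connect two vertices whenever their similarity falls within the range. A subset in which every pair is within range is then exactly a clique, so the task becomes finding a clique of size 1000. Finding a clique of a given size is NP-complete in general, so a guaranteed-fast exact algorithm is unlikely. The sketch below instead uses a simple greedy heuristic, which may fail to find a k-clique even when one exists. The function names `build_graph` and `greedy_clique` are hypothetical. Similarities are again assumed to be fractions in [0, 1].

```python
def build_graph(sim, lo=0.50, hi=0.85):
    """Hypothetical helper: adjacency sets where i and j are connected
    iff lo <= sim[i][j] <= hi (sim assumed symmetric)."""
    n = len(sim)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if lo <= sim[i][j] <= hi:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_clique(adj, k):
    """Greedy heuristic (not exact): try start vertices in order of
    decreasing degree; from each one, repeatedly add the candidate with the
    most neighbours among the remaining candidates. `candidates` always holds
    the common neighbours of the current clique, so the result is a clique.
    Returns a list of k vertices, or None if the heuristic finds none."""
    order = sorted(range(len(adj)), key=lambda v: len(adj[v]), reverse=True)
    for start in order:
        if len(adj[start]) < k - 1:
            break  # a vertex in a k-clique needs degree >= k - 1; later ones are no better
        clique = [start]
        candidates = set(adj[start])
        while len(clique) < k and candidates:
            v = max(candidates, key=lambda u: len(adj[u] & candidates))
            clique.append(v)
            candidates &= adj[v]  # keep only vertices adjacent to every member
        if len(clique) == k:
            return clique
    return None

# Vertices 0, 1, 2 are mutually within range; vertex 3 is too similar to all.
sim = [
    [1.0, 0.6, 0.7, 0.9],
    [0.6, 1.0, 0.8, 0.9],
    [0.7, 0.8, 1.0, 0.9],
    [0.9, 0.9, 0.9, 1.0],
]
adj = build_graph(sim)
print(sorted(greedy_clique(adj, 3)))  # [0, 1, 2]
print(greedy_clique(adj, 4))          # None
```

For a 10000x10000 matrix the pure-Python graph construction is slow (about 50 million comparisons), so vectorizing it with NumPy would likely help. For small graphs an exact search is possible with a Bron–Kerbosch style enumerator such as `networkx.find_cliques`, though its worst-case running time is exponential.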