Algorithm to measure similarity between two sequences of strings

Tags:

sequences

How can I measure the similarity percentage between two sequences of strings?

I have two text files, and each file contains a sequence of strings written like this:

First file:

AAA BBB DDD CCC GGG MMM AAA MMM

Second file:

BBB DDD CCC MMM AAA MMM

How can I measure the similarity between these two files in terms of the order of the strings?

In the example above, the two files are similar because the strings appear in the same order, even though some strings are missing from the second file. Which algorithm is best suited to this problem, so that I can measure how similar the order of the strings is, rather than how often each string occurs in the two files?


asked Jun 01 '12 05:06

Dheeraj Agarwal

1 Answer

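You could use the Levenshtein distance algorithm. It counts how many edits (insertions, deletions, substitutions) are needed to transform one string into another. This article explains it pretty well, and a sample implementation is provided. For your files, compare whole strings (AAA, BBB, ...) as the units instead of single characters.

Copied from CodeProject: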

1.  Set n to be the length of s. ("GUMBO")
    Set m to be the length of t. ("GAMBOL")
    If n = 0, return m and exit.
    If m = 0, return n and exit.
    Construct two vectors, v0 and v1, each with m+1 elements (indices 0..m).
2.  Initialize v0 to 0..m.
3.  Examine each character of s (i from 1 to n). At the start of each pass, set v1[0] to i.
4.  Examine each character of t (j from 1 to m).
5.  If s[i] equals t[j], the cost is 0.
    If s[i] is not equal to t[j], the cost is 1.
6.  Set cell v1[j] equal to the minimum of:
    a. The cell immediately above plus 1: v0[j] + 1.
    b. The cell immediately to the left plus 1: v1[j-1] + 1.
    c. The cell diagonally above and to the left plus the cost: v0[j-1] + cost.
7.  After each pass over t (steps 4, 5, 6), copy v1 into v0 before moving to the next character of s.
    After the iteration steps (3, 4, 5, 6) are complete, the distance is found in the cell v1[m].
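
The steps above could be sketched in Python roughly as follows. This is only an illustration, not the CodeProject implementation: the function names `levenshtein` and `similarity_percent` are made up for this example, and turning the distance into a percentage by dividing by the length of the longer sequence is one assumed normalization among several possible ones. The function works on any indexable sequence, so passing lists of whitespace-separated tokens compares whole strings rather than characters.

```python
def levenshtein(s, t):
    """Edit distance between two sequences, keeping only two rows of the table."""
    n, m = len(s), len(t)
    if n == 0:
        return m
    if m == 0:
        return n

    v0 = list(range(m + 1))  # previous row: distance from an empty prefix of s
    v1 = [0] * (m + 1)       # current row, filled in below

    for i in range(1, n + 1):
        v1[0] = i  # deleting the first i items of s
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            v1[j] = min(
                v0[j] + 1,         # cell above: deletion
                v1[j - 1] + 1,     # cell to the left: insertion
                v0[j - 1] + cost,  # diagonal cell: match or substitution
            )
        v0, v1 = v1, v0  # the row just computed becomes the previous row

    return v0[m]  # after the final swap, v0 holds the last computed row


def similarity_percent(a, b):
    """Assumed normalization: 100 * (1 - distance / length of the longer sequence)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


# Token sequences from the question, split on whitespace
first = "AAA BBB DDD CCC GGG MMM AAA MMM".split()
second = "BBB DDD CCC MMM AAA MMM".split()

print(levenshtein("GUMBO", "GAMBOL"))     # character-level example from the steps
print(levenshtein(first, second))         # token-level distance
print(similarity_percent(first, second))  # token-level similarity
```

With the two files from the question, the second sequence can be obtained from the first by deleting two strings, so the token-level distance is 2 and this particular normalization gives 75%.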


answered Nov 30 '22 17:11

alexn
