2.5 Minimum Edit Distance
Edit distance measures similarity between strings.
DEF given a set of string editing operations, minimum edit distance between two strings is defined as the minimum number of operations needed to transform one string into another.
Common operations include insertion(insert a character), deletion(delete a character), substitution(replace a character with another).
DEF Minimum edit distance can be represented by an alignment. This is an alignment between
INTENTIONandEXECUTION:I N T E * N T I O N | | | | | | | | | | * E X E C U T I O NEvery character is mapped either to a character in the other string, or a null character.
Mappings may not cross.
Operations can be assigned costs. Levenshtein distance ( ref: [1] ) assumes every operation has a cost of 1 except the substitution of a letter for itself(E-E in the case above), which has a zero cost.
ALGO Minimum edit distance problem can be solved through dynamic programming. DP equation:
D[i,j] = min( D[i-1,j]+delete_cost(source[i]), D[i,j-1]+insert_cost(target[j]), D[i-1,j-1]+substitute_cost(source[i],target[j]) )The algorithm gives minimum cost of transferring string
sourceinto stringtarget.D[i,j]represents the optimal cost of transferring ani-character long prefix ofsourceinto aj-character-long prefix oftarget.substitute_costis zero whensource[i] == target[j].
Viterbi algorithm is a probabilistic extension of minimum edit distance.
Reference
[1] Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 10(8), 707–710. Original in Doklady Akademii Nauk SSSR 163(4): 845–848 (1965).
Last updated
Was this helpful?