2.5 Minimum Edit Distance
Edit distance measures similarity between strings.
DEF given a set of string editing operations, minimum edit distance between two strings is defined as the minimum number of operations needed to transform one string into another.
Common operations include insertion(insert a character), deletion(delete a character), substitution(replace a character with another).
DEF Minimum edit distance can be represented by an alignment. This is an alignment between
INTENTION
andEXECUTION
:Every character is mapped either to a character in the other string, or a null character.
Mappings may not cross.
Operations can be assigned costs. Levenshtein distance ( ref: [1] ) assumes every operation has a cost of 1 except the substitution of a letter for itself(E-E in the case above), which has a zero cost.
ALGO Minimum edit distance problem can be solved through dynamic programming. DP equation:
The algorithm gives minimum cost of transferring string
source
into stringtarget
.D[i,j]
represents the optimal cost of transferring ani
-character long prefix ofsource
into aj
-character-long prefix oftarget
.substitute_cost
is zero whensource[i] == target[j]
.
Viterbi algorithm is a probabilistic extension of minimum edit distance.
Reference
[1] Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 10(8), 707–710. Original in Doklady Akademii Nauk SSSR 163(4): 845–848 (1965).
Last updated
Was this helpful?