I thought this might be useful to someone doing text parsing, I can't find anything similar in the forum. Basically It compares two strings and returns the Levenshtein distance between them. This is a measure of the differences between the two strings based on certain rules such as: added, missing, wrong & swapped letters within the string. Each difference adds 1 to the total difference `cost`.
It is not my original work, I translated the algorithm from VB/C++ at http://www.merriampark.com/ld.htm#VB
Public domain, free to use in any way. Please point out any bugs you find!
You need IanM's matrix utils because the routine uses his mid$() and min() commands.
tar$="hello" `The target string to compare against
src$="hallo"
print "Wrong letter: ";src$;" ";tar$;" ";
print Lev_Distance(src$,tar$)
src$="helo"
print "Missing letter: ";src$;" ";tar$;" ";
print Lev_Distance(src$,tar$)
src$="helllo"
print "Added letter: ";src$;" ";tar$;" ";
print Lev_Distance(src$,tar$)
src$="ehllo"
print "Transposed letter: ";src$;" ";tar$;" ";
print Lev_Distance(src$,tar$)
src$="ehlllo"
print "Transposed & added letter: ";src$;" ";tar$;" ";
print Lev_Distance(src$,tar$)
src$="lkdfsd"
print "Junk string: ";src$;" ";tar$;" ";
print Lev_Distance(src$,tar$)
wait key : end
Function Lev_Distance(src$ as string,tar$ as string)
`Compares how similar 2 strings are, returns a Levenshtein distance depending on the similarity of the 2 strings
`Differences are: missing character, wrong character, extra character & transposed (swapped) characters
`each difference adds 1 to the `Levenshtein Distance` result.
`Translated from the VB/C++ code at: http://www.merriampark.com/ld.htm#VB
ln_src=Len(src$) : ln_tar=Len(tar$) `Get length of source & target strings
If ln_src=0 or ln_tar=0 Then exitfunction ln_src+ln_tar `No point comparing if 1 or both strings blank
Dim matrix(ln_src,ln_tar) `Matrix with dimensions the size/length of both strings
For src=0 To ln_src : matrix(src,0)=src : Next src `Fill 1st dimension, 1st row of matrix with source string cells
For tar=0 To ln_tar : matrix(0,tar)=tar : Next tar `Fill 2nd dimension, 1st row of matrix with target string cells
For src=1 To ln_src
src_char$=Mid$(src$,src,1) `Get a character from source string to compare with all target string
For tar=1 To ln_tar
tar_char$=mid$(tar$,tar,1)
If src_char$=tar_char$ Then cost=0 else cost=1 `If 2 characters are matching at this position in target string then no extra cost
cell=min(matrix(src-1,tar)+1,matrix(src,tar-1)+1) `Get minimum between cell above +1 & cell to left +1
cell=min(cell,matrix(src-1,tar-1)+cost) `Get minimum of above result and the cell diagonally above/left +cost
if src>1 and tar>1 `If not on 1st character then check for transposition (swapped chars)
trans=matrix(src-2,tar-2)+1
if mid$(src$,src-1,1)<>tar_char$ then inc trans
if src_char$<>mid$(tar$,tar-1,1) then inc trans
if cell>trans then cell=trans `If transposition is a better result then keep it
endif
matrix(src,tar)=cell `Store new cell result in matrix
Next tar
Next src
lev_dist=matrix(ln_src,ln_tar) `Resulting Lev distance is number in bottom right of matrix
undim matrix() `Delete matrix array
EndFunction lev_dist