Package-level declarations
Types
Damerau-Levenshtein distance with transpositions, also known as the unrestricted Damerau-Levenshtein distance.
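To make the "unrestricted" part concrete, here is a minimal sketch of the usual dynamic-programming formulation, in which a transposed pair may still be edited afterwards. The function name `damerau_levenshtein` is an assumption made for illustration and is not taken from this package.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance (illustrative sketch)."""
    max_dist = len(a) + len(b)
    last_row = {}  # last row of `a` in which each character was seen
    # The table has an extra sentinel row and column filled with max_dist.
    d = [[max_dist] * (len(b) + 2) for _ in range(len(a) + 2)]
    for i in range(len(a) + 1):
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[1][j + 1] = j

    for i in range(1, len(a) + 1):
        last_col = 0  # last column in this row where the characters matched
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                         # substitution or match
                d[i + 1][j] + 1,                        # insertion
                d[i][j + 1] + 1,                        # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1) # transposition
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


print(damerau_levenshtein("CA", "ABC"))  # 2: transpose to "AC", then insert "B"
```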
The Jaro–Winkler distance metric is designed for, and best suited to, short strings such as person names, and for detecting typos. It is (roughly) a variation of Damerau-Levenshtein in which the substitution of two close characters is considered less important than the substitution of two characters that are far from each other.
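The following sketch shows the textbook Jaro similarity with Winkler's common-prefix bonus. It assumes a prefix scale of 0.1 and a prefix capped at four characters, the usual defaults in the literature; this package's own parameters and any boost threshold may differ. A distance can be derived as one minus the similarity.

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1] (illustrative sketch)."""
    if a == b:
        return 1.0
    # Characters match only if they are within this many positions of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


similarity = jaro_winkler("MARTHA", "MARHTA")
print(round(similarity, 4))      # 0.9611
print(round(1 - similarity, 4))  # corresponding distance
```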
The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one string into the other.
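As an illustration of the definition above, here is a minimal sketch of the classic two-row dynamic-programming computation. The function name `levenshtein` is chosen for this example only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```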
The longest common subsequence (LCS) problem consists of finding the longest subsequence common to two (or more) sequences. It differs from the problem of finding common substrings: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences.
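The sketch below computes the length of a longest common subsequence of two strings. The helper `lcs_distance` shows one common convention for turning that length into a distance; whether this package uses exactly that formula is an assumption here.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence (characters need not be adjacent)."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_distance(a: str, b: str) -> int:
    """Assumed convention: characters of both strings left outside the LCS."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(lcs_length("AGCAT", "GAC"))  # 2
```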
This distance is computed as the Levenshtein distance divided by the length of the longer string. The resulting value is always in the interval [0, 1].
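A minimal sketch of this normalization follows. Returning 0.0 when both strings are empty is an assumption made here to avoid dividing by zero.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance, used as the numerator."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty strings are identical
    return levenshtein(a, b) / longest


print(normalized_levenshtein("kitten", "sitting"))  # 3 edits over 7 characters
```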
Used to indicate the cost of character operations (add, replace, delete). The cost should always be in the range [0, 1].
Implementation of the Optimal String Alignment (sometimes called the restricted edit distance) variant of the Damerau-Levenshtein distance.
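The restriction can be seen in a short sketch: each pair of adjacent characters may be transposed once, and a transposed pair is never edited again. The function name `osa_distance` is an assumption for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal String Alignment distance (illustrative sketch)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("CA", "ABC"))  # 3, whereas the unrestricted variant gives 2
```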
Q-gram distance, as defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches". The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurrences of each n-gram).
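A small sketch of the profile-based definition follows. The default `q=2` and the function names are assumptions for illustration.

```python
from collections import Counter


def qgram_profile(s: str, q: int = 2) -> Counter:
    """Count the occurrences of every substring of length q."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(a: str, b: str, q: int = 2) -> int:
    """L1 norm of the difference between the two q-gram profiles."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    # A Counter returns 0 for q-grams it has not seen.
    return sum(abs(pa[g] - pb[g]) for g in pa.keys() | pb.keys())


print(qgram_distance("ABCD", "ABCE"))  # 2: "CD" and "CE" each occur in one string only
```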
The Ratcliff/Obershelp algorithm computes the similarity of two strings as twice the number of matching characters divided by the total number of characters in the two strings. Matching characters are those in the longest common substring plus, recursively, the matching characters in the unmatched regions on either side of that substring.
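The recursion can be sketched as follows, taking the common run at each step to be a contiguous substring, as in the usual description of the algorithm. All function names are assumptions for illustration, and returning 1.0 for two empty strings is a choice made here.

```python
def longest_common_substring(a: str, b: str):
    """Return (start_in_a, start_in_b, length) of a longest common substring."""
    best = (0, 0, 0)
    lengths = [0] * (len(b) + 1)  # common-suffix lengths for the previous row
    for i, ca in enumerate(a):
        new = [0] * (len(b) + 1)
        for j, cb in enumerate(b):
            if ca == cb:
                new[j + 1] = lengths[j] + 1
                if new[j + 1] > best[2]:
                    n = new[j + 1]
                    best = (i - n + 1, j - n + 1, n)
        lengths = new
    return best


def matching_characters(a: str, b: str) -> int:
    """Longest common substring plus matches in the regions on either side."""
    i, j, n = longest_common_substring(a, b)
    if n == 0:
        return 0
    return (n
            + matching_characters(a[:i], b[:j])
            + matching_characters(a[i + n:], b[j + n:]))


def ratcliff_obershelp(a: str, b: str) -> float:
    """Twice the matching characters over the combined length."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty strings are identical
    return 2 * matching_characters(a, b) / total


# "WIKIM" (5) plus "IA" (2) match, so the similarity is 14 / 18.
print(ratcliff_obershelp("WIKIMEDIA", "WIKIMANIA"))
```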
Sørensen-Dice coefficient, also known as the Sørensen index, Dice's coefficient, or Czekanowski's binary (non-quantitative) index.
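For strings, the coefficient is commonly evaluated over sets of character n-grams as 2|A ∩ B| / (|A| + |B|). The sketch below assumes bigrams and treats two strings with no n-grams as identical; both choices are assumptions, not necessarily what this package does.

```python
def ngram_set(s: str, n: int = 2) -> set:
    """Set of distinct substrings of length n."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def sorensen_dice(a: str, b: str, n: int = 2) -> float:
    """Twice the number of shared n-grams over the sum of both set sizes."""
    set_a, set_b = ngram_set(a, n), ngram_set(b, n)
    if not set_a and not set_b:
        return 1.0  # assumption: nothing to compare counts as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


print(sorensen_dice("night", "nacht"))  # 0.25: only "ht" is shared among 4 + 4 bigrams
```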
Implementation of Levenshtein that allows different weights to be defined for different character substitutions.
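A minimal sketch of the idea: the substitution cost comes from a caller-supplied function, while insertions and deletions keep a fixed cost of 1. The `keyboard_cost` function and its values are hypothetical and exist only to show the shape of such a callback.

```python
from typing import Callable


def weighted_levenshtein(a: str, b: str,
                         substitution_cost: Callable[[str, str], float]) -> float:
    """Levenshtein distance with a per-pair substitution cost (illustrative sketch)."""
    previous = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        current = [float(i)]
        for j, cb in enumerate(b, start=1):
            sub = 0.0 if ca == cb else substitution_cost(ca, cb)
            current.append(min(
                previous[j] + 1.0,      # deletion
                current[j - 1] + 1.0,   # insertion
                previous[j - 1] + sub,  # weighted substitution or match
            ))
        previous = current
    return previous[-1]


def keyboard_cost(c1: str, c2: str) -> float:
    """Hypothetical weights: 't' and 'r' are neighbouring keys, so confusing them is cheap."""
    return 0.5 if {c1, c2} == {"t", "r"} else 1.0


print(weighted_levenshtein("String1", "Srring1", keyboard_cost))  # 0.5
```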