QGram

class QGram(k: Int = 3)

Q-gram distance, as defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches". The distance between two strings is the L1 norm of the difference of their profiles, where a profile records the number of occurrences of each k-shingle (substring of length k).

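Q-gram distance can be computed in O(m + n) time, whereas Levenshtein distance requires O(m·n). By Ukkonen's result, the q-gram distance divided by 2k is a lower bound on Levenshtein distance, so it can serve as an inexpensive filter before an exact computation.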
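
The definition above can be sketched in a few lines of Kotlin. This is a minimal illustration under stated assumptions, not the library's actual implementation: `profile` and `qGramDistance` are hypothetical names introduced here, and the real class may treat whitespace, short strings, or other edge cases differently. Each string is scanned once to build its profile, and the two profiles are then compared shingle by shingle.

```kotlin
import kotlin.math.abs

// Hypothetical helper (not part of the documented API):
// counts how often each k-shingle occurs in s.
fun profile(s: String, k: Int): Map<String, Int> {
    require(k > 0) { "k must be positive" }
    val counts = HashMap<String, Int>()
    // windowed(k) yields every substring of length k;
    // a string shorter than k yields no shingles at all.
    for (shingle in s.windowed(k)) {
        counts[shingle] = (counts[shingle] ?: 0) + 1
    }
    return counts
}

// Hypothetical stand-in for the distance:
// L1 norm of the difference of the two profiles.
fun qGramDistance(first: String, second: String, k: Int = 3): Int {
    val p1 = profile(first, k)
    val p2 = profile(second, k)
    // A shingle absent from one profile counts as zero occurrences there.
    return (p1.keys + p2.keys).sumOf { abs((p1[it] ?: 0) - (p2[it] ?: 0)) }
}
```

For instance, `qGramDistance("ABCD", "ABCE", k = 2)` evaluates to 2: the profiles are {AB, BC, CD} and {AB, BC, CE}, which differ only in CD and CE. The sketch materializes each shingle as a substring for readability, so it favors clarity over the tightest constant factors.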

Parameters

k

Length of the k-shingles (substrings) used to build each profile; defaults to 3.

Constructors

constructor(k: Int = 3)

Functions

fun distance(first: String, second: String): Int

The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurrences of each k-shingle).