Jaccard

class Jaccard(k: Int = 3)

Each input string is converted into a set of n-grams; the Jaccard index is then computed as |A ∩ B| / |A ∪ B|.

As with Q-Gram distance, the input strings are first converted into sets of n-grams (sequences of n characters, also called k-shingles), but here the number of times each n-gram occurs is not taken into account.

The Jaccard distance (1 - Jaccard index) is a metric on the sets of n-grams.
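
The description above can be illustrated with a minimal sketch. It is not this class's actual implementation: the helper names shingles and jaccardSimilarity are hypothetical, and the handling of inputs shorter than k is an assumption made only for the example.

```kotlin
// Hypothetical helper: the set of distinct substrings of length k.
fun shingles(s: String, k: Int): Set<String> {
    require(k > 0) { "k must be positive" }
    // windowed(k) yields every length-k window; toSet() drops repeats,
    // so how often an n-gram occurs is ignored.
    return s.windowed(k).toSet()
}

// Hypothetical sketch of the Jaccard index |A ∩ B| / |A ∪ B|.
fun jaccardSimilarity(first: String, second: String, k: Int = 3): Double {
    val a = shingles(first, k)
    val b = shingles(second, k)
    val union = a union b
    // Assumption: when neither input yields any n-gram, treat them as identical.
    if (union.isEmpty()) return 1.0
    return (a intersect b).size.toDouble() / union.size
}
```

For example, with k = 2 the strings "ABCDE" and "ABCDF" give A = {AB, BC, CD, DE} and B = {AB, BC, CD, DF}, so |A ∩ B| = 3, |A ∪ B| = 5, and the sketch returns 3 / 5 = 0.6.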

Parameters

k

Length of the k-shingles (n-grams), in characters.

Constructors

constructor(k: Int = 3)

Functions

fun distance(first: String, second: String): Double

Distance is computed as 1 - similarity.
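
As a rough sketch of this relationship, the function below computes the distance directly from two n-gram sets that have already been built. The name jaccardDistance and the treatment of two empty sets are assumptions for illustration, not part of this class's API.

```kotlin
// Hypothetical sketch: Jaccard distance between two precomputed n-gram sets.
fun jaccardDistance(a: Set<String>, b: Set<String>): Double {
    val unionSize = (a union b).size
    // Assumption: two empty sets are considered identical (distance 0).
    if (unionSize == 0) return 0.0
    val similarity = (a intersect b).size.toDouble() / unionSize
    return 1.0 - similarity
}
```

Identical sets give 0.0 and disjoint sets give 1.0. Because only the set of n-grams matters, "aaaa" and "aaa" both reduce to {aaa} for k = 3 and are therefore at distance 0.0 under this sketch.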

fun similarity(first: String, second: String): Double

Computes the Jaccard index, |A ∩ B| / |A ∪ B|.