SorensenDice

class SorensenDice(val k: Int = 3)

Sorensen-Dice coefficient, also known as the Sørensen index, Dice's coefficient, or Czekanowski's binary (non-quantitative) index.

The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A ∩ B| / (|A| + |B|).
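As a rough illustration of the two steps described above, the following sketch builds the shingle sets and applies the formula. The names `shingles` and `diceSimilarity` are hypothetical and are not part of this class. The handling of strings shorter than k is an assumption made for the sketch, and the documented `similarity` method may behave differently in such edge cases.

```kotlin
// Hypothetical helper, not part of the documented class:
// collects the distinct substrings of length k found in s.
fun shingles(s: String, k: Int): Set<String> {
    require(k > 0) { "k must be positive" }
    // windowed() returns an empty list when s is shorter than k,
    // so such strings yield an empty shingle set.
    return s.windowed(k).toSet()
}

// Hypothetical stand-in for similarity(first, second).
fun diceSimilarity(first: String, second: String, k: Int = 3): Double {
    val a = shingles(first, k)
    val b = shingles(second, k)
    // Assumption for this sketch: two strings with no shingles at all
    // are treated as identical, which also avoids dividing by zero.
    if (a.isEmpty() && b.isEmpty()) return 1.0
    // 2 * |A ∩ B| / (|A| + |B|)
    return 2.0 * (a intersect b).size / (a.size + b.size)
}
```

With k = 3, "ABCDE" gives the shingles ABC, BCD, CDE and "ABCDF" gives ABC, BCD, CDF, so `diceSimilarity("ABCDE", "ABCDF")` evaluates 2 * 2 / (3 + 3), roughly 0.667. Two strings that share no shingle, such as "night" and "nacht", score 0.0.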

Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality.
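A small counterexample may make this concrete. The sketch below works directly on shingle sets rather than strings, and the helpers `diceDistance` and `violatesTriangle` are hypothetical names introduced only for illustration. It assumes both sets passed to `diceDistance` are non-empty.

```kotlin
// Hypothetical helper: 1 - 2 * |A ∩ B| / (|A| + |B|) on shingle sets.
// Assumes a and b are not both empty (otherwise the result is NaN).
fun diceDistance(a: Set<String>, b: Set<String>): Double =
    1.0 - 2.0 * (a intersect b).size / (a.size + b.size)

// True when the direct distance from x to z exceeds the detour through via,
// which is exactly a violation of the triangle inequality.
fun violatesTriangle(x: Set<String>, via: Set<String>, z: Set<String>): Boolean =
    diceDistance(x, z) > diceDistance(x, via) + diceDistance(via, z)
```

Taking x = {a}, z = {b} and via = {a, b}, the direct distance is 1, while each leg of the detour is 1 - 2 * 1 / (1 + 2) = 1/3, so the detour totals 2/3 and `violatesTriangle(setOf("a"), setOf("a", "b"), setOf("b"))` is true. In practice this means the distance should not be used with data structures that rely on metric properties.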

Sørensen–Dice coefficient

Parameters

k

length of k-shingles

Constructors

constructor(k: Int = 3)

Functions

fun distance(first: String, second: String): Double

Returns 1 - similarity(first, second).

fun similarity(first: String, second: String): Double

Similarity is computed as 2 * |A ∩ B| / (|A| + |B|).

Properties

val k: Int = 3