NGram

class NGram(n: Int = 2)

N-Gram Similarity as defined by Kondrak, "N-Gram Similarity and Distance", String Processing and Information Retrieval, Lecture Notes in Computer Science Volume 3772, 2005, pp 115-126.

The algorithm uses affixing with the special character '\n' to increase the weight of the first characters. Normalization is achieved by dividing the total similarity score by the original length of the longer word.
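The affixing and normalization steps described above can be illustrated with a minimal sketch. This is not the implementation behind this class: `affixedNGrams` and `sketchNGramDistance` are hypothetical names introduced for illustration, and the substitution cost used here (the fraction of mismatched positions between two n-grams) is a simplifying assumption.

```kotlin
import kotlin.math.max
import kotlin.math.min

// Hypothetical helper: prefix the string with n - 1 copies of '\n', then slide
// a window of size n, so the first characters appear in more n-grams.
fun affixedNGrams(s: String, n: Int): List<String> =
    ("\n".repeat(n - 1) + s).windowed(n)

// Hypothetical sketch: an edit-distance style recurrence over n-grams,
// normalized by the original length of the longer input.
fun sketchNGramDistance(first: String, second: String, n: Int = 2): Double {
    if (first == second) return 0.0
    if (first.isEmpty() || second.isEmpty()) return 1.0

    val a = affixedNGrams(first, n)   // one n-gram per character of `first`
    val b = affixedNGrams(second, n)  // one n-gram per character of `second`

    var prev = DoubleArray(a.size + 1) { it.toDouble() }
    var curr = DoubleArray(a.size + 1)

    for (j in 1..b.size) {
        curr[0] = j.toDouble()
        for (i in 1..a.size) {
            // Assumed substitution cost: share of positions where the n-grams differ.
            val mismatches = a[i - 1].indices.count { k -> a[i - 1][k] != b[j - 1][k] }
            val substitution = mismatches.toDouble() / n
            curr[i] = min(
                min(curr[i - 1] + 1, prev[i] + 1), // skip an n-gram on either side
                prev[i - 1] + substitution         // align the two n-grams
            )
        }
        val swap = prev
        prev = curr
        curr = swap
    }
    // Normalize by the longer original length so the result stays in [0, 1].
    return prev[a.size] / max(first.length, second.length)
}
```

With n = 2, `affixedNGrams("abc", 2)` yields the n-grams "\na", "ab" and "bc", so the leading character contributes to an extra n-gram. Identical strings give 0.0 in this sketch, and an empty string compared with a non-empty one gives 1.0.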

Parameters

n

n-gram length.

Constructors

constructor(n: Int = 2)

Functions

fun distance(first: String, second: String): Double

Compute n-gram distance, in the range [0, 1].