Cosine

class Cosine(k: Int = 3)

Implements Cosine Similarity between strings.

The strings are first transformed into vectors of k-shingle occurrence counts (a k-shingle is a sequence of k consecutive characters). In this n-dimensional space, the similarity between the two strings is the cosine of the angle between their vectors.

The cosine distance is computed as 1 - cosine similarity.
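
The computation described above can be sketched in a few lines of Kotlin. This is a minimal illustration, not the class's actual implementation: the function names shingleProfile, cosineSimilarity and cosineDistance are hypothetical, and the handling of strings shorter than k (similarity 0.0 here) is an assumption.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper: count how often each k-shingle occurs in the string.
// Strings shorter than k yield an empty profile.
fun shingleProfile(s: String, k: Int = 3): Map<String, Int> =
    s.windowed(k).groupingBy { it }.eachCount()

// Hypothetical helper: cosine of the angle between the two shingle-count vectors.
fun cosineSimilarity(a: String, b: String, k: Int = 3): Double {
    val pa = shingleProfile(a, k)
    val pb = shingleProfile(b, k)

    // Dot product over the shingles of the first string; shingles absent from
    // the second string contribute 0.
    val dot = pa.entries.sumOf { (shingle, count) -> count.toDouble() * (pb[shingle] ?: 0) }
    val normA = sqrt(pa.values.sumOf { it.toDouble() * it })
    val normB = sqrt(pb.values.sumOf { it.toDouble() * it })

    // Assumption: an empty profile gives similarity 0.0 instead of dividing by zero.
    if (normA == 0.0 || normB == 0.0) return 0.0
    return dot / (normA * normB)
}

// The distance is 1 - similarity, as stated above.
fun cosineDistance(a: String, b: String, k: Int = 3): Double =
    1.0 - cosineSimilarity(a, b, k)
```

With k = 3, "abcd" and "abce" share one of their two shingles ("abc"), so the sketch yields a similarity of about 0.5 and a distance of about 0.5, up to floating-point rounding.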

Cosine Similarity

Parameters

k

length of k-shingles

Constructors

constructor(k: Int = 3)

Functions

Compute the cosine distance between two strings. Corresponds to 1.0 - similarity.

Compute the cosine similarity between two strings.