Approximate k nearest neighbor search with flexible distance function.

find_knn(data, k, ..., query = NULL, distance = c("euclidean",
  "cosine", "rankcor", "l2"), method = c("covertree", "hnsw"),
  sym = TRUE, verbose = FALSE)

Arguments

data

Data matrix

k

Number of nearest neighbors

...

Parameters passed to hnsw_knn

query

Query matrix. If omitted (NULL), data is used as the query.

distance

Distance metric to use. Allowed measures: Euclidean distance (default), cosine distance (\(1 - corr(c_1, c_2)\)), or rank correlation distance (\(1 - corr(rank(c_1), rank(c_2))\)).

method

Method to use. 'hnsw' can be tuned through the arguments passed in ... but is generally less exact than 'covertree' (default: 'covertree')

sym

Return a symmetric matrix (as long as query is NULL)?

verbose

Show a progress bar? (default: FALSE)
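
The measures listed under distance can be illustrated in a few lines of base R. This is a minimal sketch, not the package's implementation: it reads corr as the Pearson correlation exactly as the formulas are written, and the vectors c1 and c2 are hypothetical cells invented for illustration.

```r
# Two hypothetical cells, each a vector of feature values (assumed data)
c1 <- c(2.0, 0.5, 3.1, 1.2, 4.4)
c2 <- c(1.8, 0.9, 2.5, 2.0, 3.9)

# Euclidean distance
euclidean_dist <- sqrt(sum((c1 - c2)^2))

# Correlation-based distance, following the documented formula 1 - corr(c1, c2)
cor_dist <- 1 - cor(c1, c2)

# Rank correlation distance: the same formula applied to the ranks of each cell
rankcor_dist <- 1 - cor(rank(c1), rank(c2))
```

Because it depends only on ranks, the rank correlation distance is unchanged by any strictly increasing transformation of the values in a cell, which is one reason to prefer it when features are on very different scales.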

Value

A list with the entries:

index

A \(nrow(data) \times k\) integer matrix containing the indices of the k nearest neighbors for each cell.

dist

A \(nrow(data) \times k\) double matrix containing the distances to the k nearest neighbors for each cell.

dist_mat

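A dsCMatrix (symmetric sparse matrix) if sym == TRUE, else a dgCMatrix (general sparse matrix) of dimension \(nrow(query) \times nrow(data)\). Any zero in the matrix (except on the diagonal) indicates that the cells in the corresponding pair are not close neighbors; the distances between neighbors are stored as non-zero entries.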
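
To make the shape of the three entries concrete, the following sketch builds them with an exact brute-force search in base R plus the Matrix package. It is an illustration under stated assumptions, not the code behind find_knn: the data are random, the names cells, dists and sym_mat are hypothetical, and the symmetrization rule (keep a distance if either cell lists the other) is an assumption about what a symmetric result could look like.

```r
library(Matrix)

set.seed(1)
cells <- matrix(rnorm(20 * 3), nrow = 20)  # assumed data: 20 cells, 3 features
k <- 4
n <- nrow(cells)

# Exact brute-force search: all pairwise Euclidean distances, self excluded
d <- as.matrix(dist(cells))
diag(d) <- Inf

# For each cell, the indices of and distances to its k nearest neighbors
index <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))
dists <- t(apply(d, 1, function(row) sort(row)[seq_len(k)]))

# General sparse matrix: row i stores the distances from cell i to its neighbors
dist_mat <- sparseMatrix(
  i = rep(seq_len(n), times = k),
  j = as.vector(index),
  x = as.vector(dists),
  dims = c(n, n)
)

# Symmetrized variant (assumed rule): keep a distance if either cell lists the other
dense <- as.matrix(dist_mat)
sym_mat <- forceSymmetric(Matrix(pmax(dense, t(dense)), sparse = TRUE))
```

In this sketch dist_mat is a general sparse matrix with exactly k stored distances per row, while sym_mat is stored as a symmetric sparse matrix and may hold more than k entries in a row.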