cosine_similarity

tangles.util.graph.similarity.cosine_similarity(data: ndarray, sim_thresh: float = 1e-10, max_neighbours: int = None, return_sparse: bool = True, sequential: bool = True, chunk_size: int = 1000) → ndarray | csr_matrix

Return the cosine similarity matrix of the rows of the matrix data.

Parameters

data : np.ndarray

The data.

sim_thresh : float

Similarities smaller than sim_thresh are set to 0.

return_sparse : bool

Whether to return a sparse matrix.

sequential : bool

Whether to compute the similarities sequentially, which uses less memory. Setting sequential=True is slightly slower for small matrices.

chunk_size : int

If the similarities are computed sequentially, they are computed in chunks of this size.
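How sim_thresh and chunk_size interact can be illustrated with a plain NumPy sketch. This is not the library's implementation: the helper name `chunked_cosine_similarity`, the assumption that chunks are blocks of rows, and the treatment of all-zero rows (similarity 0) are chosen here for illustration only.

```python
import numpy as np


def chunked_cosine_similarity(data, sim_thresh=1e-10, chunk_size=1000):
    """Hypothetical helper: cosine similarity of the rows of `data`, one chunk of rows at a time."""
    data = np.asarray(data, dtype=float)
    norms = np.linalg.norm(data, axis=1, keepdims=True)
    # Assumption: all-zero rows get similarity 0 instead of causing a division by zero.
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = data / safe_norms

    n = unit.shape[0]
    # Written into a dense array for readability; a memory-saving variant
    # would store each thresholded block in sparse form instead.
    sim = np.empty((n, n))
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Only a (chunk_size x n) block of dot products is formed per iteration.
        block = unit[start:stop] @ unit.T
        # Similarities smaller than sim_thresh are set to 0.
        block[block < sim_thresh] = 0.0
        sim[start:stop] = block
    return sim


data = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0], [-1.0, 0.0]])
sim = chunked_cosine_similarity(data, sim_thresh=0.5, chunk_size=2)
print(np.round(sim, 3))
```

With sim_thresh=0.5, the pairs at a 45 degree angle (about 0.707) are kept, while orthogonal and opposite rows are set to 0.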

Returns

np.ndarray or scipy.sparse.csr_matrix

A matrix of shape (data.shape[0], data.shape[0]) containing the cosine similarities of the rows.
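The shape and basic properties of such a matrix can be checked against a dense NumPy reference. The snippet below is a sketch that does not call the library; the random test data and the variable names are assumptions made for illustration, and no threshold is applied.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(5, 3))  # 5 rows, 3 columns

# Dense reference: scale each row to unit length, then take all pairwise dot products.
unit = data / np.linalg.norm(data, axis=1, keepdims=True)
sim = unit @ unit.T

print(sim.shape)                       # one row and one column per row of `data`
print(np.allclose(sim, sim.T))         # cosine similarity is symmetric
print(np.allclose(np.diag(sim), 1.0))  # each nonzero row has similarity 1 with itself
```

Going by the signature and the description of sim_thresh above, a call such as `cosine_similarity(data, return_sparse=False)` should give a dense array of the same shape, except that entries below the default threshold of 1e-10 (including all negative similarities) would be 0.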