hamming_similarity

tangles.util.graph.similarity.hamming_similarity(data: ndarray, sim_thresh: float = 0, return_sparse: bool = True, sequential: bool = True) → ndarray | csr_matrix

The Hamming distance, named after Richard Hamming, is a metric that counts the number of indices at which two equally sized arrays of data differ.

Given an (n x m) matrix of data, this function calculates an (n x n) matrix in which the element (i, j) contains the Hamming distance between rows i and j.
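To make the definition concrete, here is a minimal NumPy sketch of a pairwise Hamming distance computation. It is an illustration of the metric only, not the tangles implementation; the helper name `pairwise_hamming_distance` and the sample data are assumptions made for this example.

```python
import numpy as np


def pairwise_hamming_distance(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper (not part of tangles): count differing positions per row pair."""
    # Broadcasting compares every row with every other row, giving shape (n, n, m).
    differs = data[:, None, :] != data[None, :, :]
    # Summing over the last axis counts the positions where the two rows differ.
    return differs.sum(axis=2)


# Three rows (n = 3) with four entries each (m = 4).
data = np.array([
    [1, -1, 1, 1],
    [1, 1, 1, -1],
    [-1, -1, 1, 1],
])

distances = pairwise_hamming_distance(data)
print(distances)
# [[0 2 1]
#  [2 0 3]
#  [1 3 0]]
```

The result is symmetric with zeros on the diagonal, since every row has distance 0 to itself.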

Parameters

data: the (n x m) data array on which to calculate the Hamming similarity.
sim_thresh: ignore similarities below this threshold. Defaults to 0.
return_sparse: whether to return a sparse similarity matrix in csr_matrix format or a dense numpy array. Defaults to True, meaning that a sparse matrix is returned.
sequential: whether to calculate the matrix in one step, potentially using a lot of memory, or sequentially, row by row, using less memory. Defaults to True, meaning that it is calculated row by row.
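The memory trade-off behind sequential and the effect of sim_thresh can be sketched in plain NumPy. This is not the library's code: the helper names are hypothetical, and the sketch assumes that the similarity of two rows is the number of positions at which they agree and that entries below the threshold are set to 0. The actual tangles behaviour may differ on both points.

```python
import numpy as np


def agreement_one_step(data: np.ndarray) -> np.ndarray:
    """Hypothetical: all row pairs at once, via an (n, n, m) boolean intermediate."""
    return (data[:, None, :] == data[None, :, :]).sum(axis=2)


def agreement_row_by_row(data: np.ndarray) -> np.ndarray:
    """Hypothetical: one row at a time, so each intermediate is only (n, m)."""
    n = data.shape[0]
    out = np.empty((n, n), dtype=np.int64)
    for i in range(n):
        # Compare row i against every row and count the agreeing positions.
        out[i] = (data == data[i]).sum(axis=1)
    return out


def apply_threshold(sim: np.ndarray, sim_thresh: float) -> np.ndarray:
    """Assumed thresholding rule: entries below sim_thresh become 0."""
    return np.where(sim >= sim_thresh, sim, 0)


data = np.array([
    [1, -1, 1, 1],
    [1, 1, 1, -1],
    [-1, -1, 1, 1],
])

one_step = agreement_one_step(data)
sequential = agreement_row_by_row(data)
thresholded = apply_threshold(sequential, 3)
print(thresholded)
```

Both variants produce the same matrix. The row-by-row version avoids materialising the (n, n, m) intermediate, which is the kind of saving the sequential parameter describes.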

Returns

np.ndarray or sparse.csr_matrix: either a dense or a sparse matrix containing the Hamming similarities, depending on the return_sparse parameter.