hamming_similarity
- tangles.util.graph.similarity.hamming_similarity(data: ndarray, sim_thresh: float = 0, return_sparse: bool = True, sequential: bool = True) -> ndarray | csr_matrix
The Hamming distance, named after Richard Hamming, is a metric that counts the number of indices at which two equally sized arrays of data differ.
Given an (n x m) matrix of data, this function calculates an (n x n) matrix in which element (i, j) contains the Hamming distance between rows i and j.
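As an illustration of the definition above, the pairwise Hamming distance can be sketched in plain NumPy. This is a minimal sketch, not the library's implementation; the helper name `pairwise_hamming_distance` and the sample data are assumptions made for the example.

```python
import numpy as np

def pairwise_hamming_distance(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: count, for every pair of rows, the columns in which they differ."""
    # Broadcasting compares each row with every other row: shape (n, n, m).
    differs = data[:, None, :] != data[None, :, :]
    # Summing over the last axis counts the differing positions per pair.
    return differs.sum(axis=2)

data = np.array([[1, 0, 1, 1],
                 [1, 1, 1, 0],
                 [0, 0, 1, 1]])
distances = pairwise_hamming_distance(data)
print(distances)
# [[0 2 1]
#  [2 0 3]
#  [1 3 0]]
```

The result is symmetric with zeros on the diagonal, since every row differs from itself in no position.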
Parameters
- data
the data on which to calculate the hamming similarity.
- sim_thresh
ignore similarities below this threshold. Defaults to 0.
- return_sparse
whether to return a sparse similarity matrix in csr_matrix format or a dense numpy array. Defaults to True, meaning that a sparse matrix is returned.
- sequential
whether to calculate the matrix sequentially, row by row, using less memory, or in one step, potentially using a lot of memory. Defaults to True, meaning that it is calculated row by row.
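The memory trade-off behind the sequential parameter can be illustrated with a small NumPy sketch. It assumes nothing about how the library computes its result; `hamming_one_step` and `hamming_row_wise` are hypothetical names, and both return plain Hamming distances.

```python
import numpy as np

def hamming_one_step(data: np.ndarray) -> np.ndarray:
    # Builds an (n, n, m) boolean intermediate: a single vectorised step,
    # but peak memory grows with n * n * m.
    return (data[:, None, :] != data[None, :, :]).sum(axis=2)

def hamming_row_wise(data: np.ndarray) -> np.ndarray:
    n = data.shape[0]
    result = np.empty((n, n), dtype=np.int64)
    for i in range(n):
        # Only an (n, m) boolean intermediate exists at any one time.
        result[i] = (data != data[i]).sum(axis=1)
    return result

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(50, 20))
same = np.array_equal(hamming_one_step(data), hamming_row_wise(data))
print(same)
```

Both variants produce the same matrix; they differ only in the size of the temporary arrays they allocate along the way.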
Returns
- np.ndarray or sparse.csr_matrix
Either a dense or a sparse matrix containing the Hamming similarities, depending on the return_sparse parameter.
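The interplay of sim_thresh and return_sparse can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `apply_threshold` is a hypothetical helper, the input is an arbitrary precomputed similarity matrix, and "below the threshold" is read as strictly less than sim_thresh.

```python
import numpy as np
from scipy.sparse import csr_matrix

def apply_threshold(sim: np.ndarray, sim_thresh: float = 0.0, return_sparse: bool = True):
    # Hypothetical helper: entries strictly below the threshold are set to zero.
    thresholded = np.where(sim < sim_thresh, 0, sim)
    # A CSR matrix stores only the non-zero entries that remain.
    return csr_matrix(thresholded) if return_sparse else thresholded

# Made-up similarity values, used only for this example.
sim = np.array([[4, 1, 3],
                [1, 4, 0],
                [3, 0, 4]])
sparse_result = apply_threshold(sim, sim_thresh=2)
dense_result = apply_threshold(sim, sim_thresh=2, return_sparse=False)
print(sparse_result.nnz)  # 5 entries survive the threshold
```

A higher threshold removes more entries, which is what makes the sparse format worthwhile for large n.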