ML Similarity Measurement

Distance

Euclidean Distance

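In two dimensions, for points $(x_1, y_1)$ and $(x_2, y_2)$:

$$ d_{12} = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2} $$

In three dimensions:

$$ d_{12} = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2 + (z_1-z_2)^2} $$

In n dimensions, for points $(x_{11}, \dots, x_{1n})$ and $(x_{21}, \dots, x_{2n})$:

$$ d_{12} = \sqrt{\sum_{k=1}^n(x_{1k} - x_{2k})^2} $$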
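The n-dimensional formula can be sketched in a few lines of plain Python. The function name `euclidean_distance` and the sample points are illustrative assumptions, not part of the original notes.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Square root of the sum of squared component differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance([0, 0], [3, 4]))        # 5.0
print(euclidean_distance([1, 2, 2], [0, 0, 0]))  # 3.0
```

The same function covers the 2D and 3D cases, since they are the n = 2 and n = 3 instances of the general sum.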

Standardized Euclidean distance

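Each component is first standardized using its mean $m$ and standard deviation $s$:

$$ X^* = \frac{X - m}{s} $$

Substituting into the Euclidean distance, the means cancel:

$$ d_{12} = \sqrt{\sum_{k=1}^n\left(\frac{x_{1k} - x_{2k}}{s_k}\right)^2} $$

With $1/s_k^2$ treated as a weight on each component, this is also called weighted Euclidean distance.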
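A minimal sketch of the standardized distance, assuming the standard deviations are estimated per component from a small data set. The function name, the `data` argument, and the use of the sample standard deviation (`statistics.stdev`) are assumptions made for illustration; the population standard deviation would work equally well.

```python
import math
import statistics

def standardized_euclidean(a, b, data):
    """Euclidean distance between a and b after dividing each
    component difference by that component's standard deviation."""
    # One standard deviation per component (column) of the data set.
    # Assumption: no component is constant, so no deviation is zero.
    stds = [statistics.stdev(column) for column in zip(*data)]
    # The mean m cancels in (x - m)/s - (y - m)/s, leaving (x - y)/s.
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, stds)))

# Hypothetical data: the second component has a much larger scale.
data = [[1, 100], [2, 200], [3, 300]]
print(standardized_euclidean(data[0], data[2], data))
```

After scaling, both components contribute equally to the distance, even though their raw differences are 2 and 200.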

Manhattan Distance

Also called city block distance.

$$ d_{12} = |x_1 - x_2| + |y_1 - y_2| $$

$$ d_{12} = \sum_{k=1}^n|x_{1k}-x_{2k}| $$
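As a small illustration, the sum of absolute differences can be written directly in Python. The function name `manhattan_distance` and the example points are assumptions chosen for this sketch.

```python
def manhattan_distance(a, b):
    """Sum of absolute component differences between two points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(x - y) for x, y in zip(a, b))

# |1 - 4| + |2 - 6| = 3 + 4
print(manhattan_distance([1, 2], [4, 6]))  # 7
```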

Chebyshev Distance

$$ d_{12} = \max(|x_1 - x_2| , |y_1 - y_2|) $$

$$ d_{12} = \max_i(|x_{1i} - x_{2i}|) $$

Equivalently, as a limit:

$$ d_{12} = \lim_{k \rightarrow \infty}\bigg( \sum_{i=1}^n|x_{1i} - x_{2i}|^k \bigg)^{\frac{1}{k}} $$
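The following sketch computes the maximum form directly and then evaluates the power-sum expression for growing k, to show numerically that it approaches the same value. The names `chebyshev_distance` and `power_sum_root` are illustrative, and the finite values of k only approximate the limit.

```python
def chebyshev_distance(a, b):
    """Largest absolute component difference between two points."""
    return max(abs(x - y) for x, y in zip(a, b))

def power_sum_root(a, b, k):
    """k-th root of the sum of k-th powers of the absolute differences."""
    return sum(abs(x - y) ** k for x, y in zip(a, b)) ** (1.0 / k)

a, b = [1, 2], [4, 6]
print(chebyshev_distance(a, b))  # 4

# As k grows, the largest difference dominates the sum,
# so the value moves toward the Chebyshev distance.
for k in (1, 2, 10, 100):
    print(k, power_sum_root(a, b, k))
```

Very large k can overflow floating-point arithmetic when the differences are large, which is one reason to use the `max` form in practice.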

Minkowski Distance

Minkowski distance is not a single distance but a family of distance definitions.

$$ d_{12} = \sqrt[p]{\sum_{k=1}^n|x_{1k} - x_{2k}|^p} $$

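Here p is a variable parameter: p = 1 gives the Manhattan distance, p = 2 gives the Euclidean distance, and p → ∞ gives the Chebyshev distance.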
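A minimal sketch of the family as a single function with p as an argument. The name `minkowski_distance` and the restriction to p ≥ 1 (the range in which the formula satisfies the triangle inequality) are choices made for this example.

```python
def minkowski_distance(a, b, p):
    """p-th root of the sum of p-th powers of the absolute differences."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = [1, 2], [4, 6]
print(minkowski_distance(a, b, 1))  # 7.0, same as the sum of absolute differences
print(minkowski_distance(a, b, 2))  # 5.0, same as the Euclidean formula
```

Choosing p at call time makes it straightforward to compare how different members of the family rank the same pairs of points.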

Mahalanobis Distance

Hamming distance

Cosine

Jaccard similarity coefficient

Correlation coefficient and Correlation distance

Information Entropy

Ref