I've seen another StackOverflow thread discussing the various implementations for calculating the Euclidean norm, and I'm having trouble seeing why/how a particular implementation works.
The code is found in an implementation of the MMD metric: https://github.com/josipd/torch-two-sample/blob/master/torch_two_sample/statistics_diff.py
Here is some beginning boilerplate:
import torch
sample_1, sample_2 = torch.ones((10,2)), torch.zeros((10,2))
Then the next part picks up from the code above. I'm unsure why the samples are being concatenated together:
sample_12 = torch.cat((sample_1, sample_2), 0)
distances = pdist(sample_12, sample_12, norm=2)
The concatenated samples are then passed to the pdist function:
def pdist(sample_1, sample_2, norm=2, eps=1e-5):
    r"""Compute the matrix of all squared pairwise distances.

    Arguments
    ---------
    sample_1 : torch.Tensor or Variable
        The first sample, should be of shape ``(n_1, d)``.
    sample_2 : torch.Tensor or Variable
        The second sample, should be of shape ``(n_2, d)``.
    norm : float
        The l_p norm to be used.

    Returns
    -------
    torch.Tensor or Variable
        Matrix of shape (n_1, n_2). The [i, j]-th entry is equal to
        ``|| sample_1[i, :] - sample_2[j, :] ||_p``."""
Here we get to the meat of the calculation:
    n_1, n_2 = sample_1.size(0), sample_2.size(0)
    norm = float(norm)
    if norm == 2.:
        norms_1 = torch.sum(sample_1**2, dim=1, keepdim=True)
        norms_2 = torch.sum(sample_2**2, dim=1, keepdim=True)
        norms = (norms_1.expand(n_1, n_2) +
                 norms_2.transpose(0, 1).expand(n_1, n_2))
        distances_squared = norms - 2 * sample_1.mm(sample_2.t())
        return torch.sqrt(eps + torch.abs(distances_squared))
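As a quick sanity check (my own, not from the linked repo, and assuming the pdist above is defined in the current session), the function does return the same values as a naive double loop over all pairs:

import torch

torch.manual_seed(0)
x, y = torch.randn(5, 3), torch.randn(7, 3)

fast = pdist(x, y, norm=2)                       # vectorized version from above
slow = torch.stack([torch.stack([torch.norm(x[i] - y[j]) for j in range(7)])
                    for i in range(5)])          # naive double loop over all pairs
print(torch.allclose(fast, slow, atol=1e-2))     # True (pdist adds a tiny eps inside the sqrt)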
I am at a loss as to why the Euclidean norm would be calculated this way. Any insight would be greatly appreciated.
The L2 norm is calculated as the square root of the sum of the squared vector components. It measures the distance of a point from the origin of the vector space, which is why it is also known as the Euclidean norm; the result is a non-negative distance value.
For reference, PyTorch also provides torch.cdist(x1, x2, p=2.0, compute_mode='use_mm_for_euclid_dist_if_necessary'), which computes, in batched form, the p-norm distance between each pair of row vectors from the two collections.
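To see why the two samples are concatenated in the first place, it helps to look at the shapes involved. The sketch below is my reading of how an MMD-style statistic can use one joint distance matrix; torch.cdist is used here only as a stand-in for the repo's pdist:

import torch

sample_1, sample_2 = torch.ones((10, 2)), torch.zeros((10, 2))
sample_12 = torch.cat((sample_1, sample_2), 0)        # shape (20, 2)

distances = torch.cdist(sample_12, sample_12, p=2)    # shape (20, 20)

# The joint matrix contains every block such a statistic needs:
n_1 = sample_1.size(0)
d_11 = distances[:n_1, :n_1]    # distances within sample_1
d_22 = distances[n_1:, n_1:]    # distances within sample_2
d_12 = distances[:n_1, n_1:]    # distances between the two samples
print(d_11.shape, d_22.shape, d_12.shape)             # each torch.Size([10, 10])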
Let's walk through this block of code step by step. The definition of the Euclidean distance, i.e., the L2 norm, is ||x - y||_2 = sqrt(sum_k (x_k - y_k)^2).
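The trick the code relies on is the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y, which lets all pairwise squared distances be written with matrix operations. A tiny numeric check of that identity (my own, not from the original answer):

import torch

x = torch.tensor([1.0, 2.0])
y = torch.tensor([4.0, 6.0])

direct   = torch.sum((x - y) ** 2)                                      # ||x - y||^2
expanded = torch.sum(x ** 2) + torch.sum(y ** 2) - 2 * torch.dot(x, y)  # ||x||^2 + ||y||^2 - 2 x.y
print(direct.item(), expanded.item())                                   # 25.0 25.0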
Let's consider the simplest case. We have two samples: sample a has two vectors, [a00, a01] and [a10, a11], and the same holds for sample b, with vectors [b00, b01] and [b10, b11]. Let's first calculate the norms:
n1, n2 = a.size(0), b.size(0)                 # here both n1 and n2 have the value 2
norm1 = torch.sum(a**2, dim=1, keepdim=True)  # keepdim=True, as in the original pdist code
norm2 = torch.sum(b**2, dim=1, keepdim=True)
Now we get norm1 = [[a00^2 + a01^2], [a10^2 + a11^2]] and norm2 = [[b00^2 + b01^2], [b10^2 + b11^2]], each of shape (2, 1).
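With concrete numbers (toy values of my own, not from the original answer), that first step looks like this:

import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[0., 1.], [5., 5.]])

norm1 = torch.sum(a**2, dim=1, keepdim=True)   # shape (2, 1): [[5.], [25.]]
norm2 = torch.sum(b**2, dim=1, keepdim=True)   # shape (2, 1): [[1.], [50.]]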
Next, we have norms_1.expand(n_1, n_2) and norms_2.transpose(0, 1).expand(n_1, n_2). Note that norm2 is transposed before being expanded, so its entries vary along the columns while those of norm1 repeat along the rows. The sum of the two gives norms, whose [i, j]-th entry is ||a_i||^2 + ||b_j||^2.
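On the same toy tensors (redefined here so the snippet runs on its own), that expand-and-sum step gives:

import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[0., 1.], [5., 5.]])
norm1 = torch.sum(a**2, dim=1, keepdim=True)   # [[5.], [25.]]
norm2 = torch.sum(b**2, dim=1, keepdim=True)   # [[1.], [50.]]

n1, n2 = a.size(0), b.size(0)
norms = norm1.expand(n1, n2) + norm2.transpose(0, 1).expand(n1, n2)
print(norms)   # [[ 6., 55.],
               #  [26., 75.]]  -- entry [i, j] is ||a_i||^2 + ||b_j||^2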
sample_1.mm(sample_2.t()) (here a.mm(b.t())) is the matrix product of the two samples; its [i, j]-th entry is the dot product of a_i and b_j.
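And the corresponding cross term for the same toy tensors (again my own illustration):

import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[0., 1.], [5., 5.]])

cross = a.mm(b.t())
print(cross)   # [[ 2., 15.],
               #  [ 4., 35.]]  -- entry [i, j] is the dot product a_i . b_j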
Therefore, after the operation
distances_squared = norms - 2 * sample_1.mm(sample_2.t())
you get the matrix of squared pairwise distances: its [i, j]-th entry is ||a_i||^2 + ||b_j||^2 - 2 a_i · b_j, which is exactly ||a_i - b_j||^2.
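Putting the pieces together on the toy tensors and comparing against a direct pairwise computation (my own check, not part of the answer):

import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[0., 1.], [5., 5.]])

norm1 = torch.sum(a**2, dim=1, keepdim=True)
norm2 = torch.sum(b**2, dim=1, keepdim=True)
norms = norm1.expand(2, 2) + norm2.transpose(0, 1).expand(2, 2)

distances_squared = norms - 2 * a.mm(b.t())
direct = torch.tensor([[torch.sum((a[i] - b[j])**2).item() for j in range(2)]
                       for i in range(2)])
print(distances_squared)                           # [[ 2., 25.],
                                                   #  [18.,  5.]]
print(torch.allclose(distances_squared, direct))   # True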
Finally, the last step takes the square root of every element in the matrix. The torch.abs and the small eps guard against tiny negative entries caused by floating-point round-off, and the eps also keeps the square root differentiable at zero.
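A small illustration of why that guard matters (my own sketch; the exact value depends on the random draw): for a point compared with itself, the expanded formula should give exactly zero, but round-off can push it slightly below zero, and a bare torch.sqrt of a negative number would return NaN.

import torch

x = torch.randn(1, 3)
# Squared distance of a point to itself, computed via the expanded formula.
d2 = torch.sum(x**2) + torch.sum(x**2) - 2 * x.mm(x.t())
print(d2.item())   # mathematically 0, but round-off can make it a tiny negative number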