Skip to content Skip to sidebar Skip to footer

Compute Matrix Of Pairwise Angles Between Two Arrays Of Points

I have two vectors of points, x and y, shaped (n, p) and (m, p) respectively. As an example: x = np.array([[ 0. , -0.16341, 0.98656], [-0.05937, -0.25205, 0.965

Solution 1:

There are multiple ways to do this:

import numpy.linalg as la
from scipy.spatial import distance as dist

# Manuallydefmethod0(x, y):
    dotprod_mat = np.dot(x,  y.T)
    costheta = dotprod_mat / la.norm(x, axis=1)[:, np.newaxis]
    costheta /= la.norm(y, axis=1)
    return np.arccos(costheta)

# Using einsumdefmethod1(x, y):
    dotprod_mat = np.einsum('ij,kj->ik', x, y)
    costheta = dotprod_mat / la.norm(x, axis=1)[:, np.newaxis]
    costheta /= la.norm(y, axis=1)
    return np.arccos(costheta)

# Using scipy.spatial.cdist (one-liner)defmethod2(x, y):
    costheta = 1 - dist.cdist(x, y, 'cosine')
    return np.arccos(costheta)

# Realize that your arrays `x` and `y` are already normalized, meaning you can# optimize method1 even moredefmethod3(x, y):
    costheta = np.einsum('ij,kj->ik', x, y) # Directly gives costheta, since# ||x|| = ||y|| = 1return np.arccos(costheta)

Timing results for (n, m) = (1212, 252):

>>>%timeit theta = method0(x, y)
100 loops, best of 3: 11.1 ms per loop
>>>%timeit theta = method1(x, y)
100 loops, best of 3: 10.8 ms per loop
>>>%timeit theta = method2(x, y)
100 loops, best of 3: 12.3 ms per loop
>>>%timeit theta = method3(x, y)
100 loops, best of 3: 9.42 ms per loop

The difference in timing reduces as the number of elements increases. For (n, m) = (6252, 1212):

>>>%timeit -n10 theta = method0(x, y)
10 loops, best of 3: 365 ms per loop
>>>%timeit -n10 theta = method1(x, y)
10 loops, best of 3: 358 ms per loop
>>>%timeit -n10 theta = method2(x, y)
10 loops, best of 3: 384 ms per loop
>>>%timeit -n10 theta = method3(x, y)
10 loops, best of 3: 314 ms per loop

However, if you leave out the np.arccos step, i.e., suppose you could manage with just costheta, and didn't needtheta itself, then:

>>>%timeit costheta = np.einsum('ij,kj->ik', x, y)
10 loops, best of 3: 61.3 ms per loop
>>>%timeit costheta = 1 - dist.cdist(x, y, 'cosine')
10 loops, best of 3: 124 ms per loop
>>>%timeit costheta = dist.cdist(x, y, 'cosine')
10 loops, best of 3: 112 ms per loop

This is for the case of (6252, 1212). So actually np.arccos is taking up 80% of the time. In this case I find that np.einsum is much faster than dist.cdist. So you definitely want to be using einsum.

Summary: Results for theta are largely similar, but np.einsum is fastest for me, especially when you're not extraneously computing the norms. Try to avoid computing theta and working with just costheta.

Note: An important point I didn't mention is that finiteness of floating-point precision can cause np.arccos to give nan values. method[0:3] worked for values of x and y that hadn't been properly normalized, naturally. But method3 gave a few nans. I fixed this with pre-normalization, which naturally destroys any gain in using method3, unless you need to do this computation many many times for a small set of pre-normalized matrices (for whatever reason).

Post a Comment for "Compute Matrix Of Pairwise Angles Between Two Arrays Of Points"