
Why Is np.linalg.norm(..., axis=1) Slower Than Writing Out the Formula for Vector Norms?

To normalize the rows of a matrix X to unit length, I usually use:

X /= np.linalg.norm(X, axis=1, keepdims=True)

Trying to optimize this operation for an algorithm, I was quite surprised to find that writing out the formula by hand is noticeably faster.

Solution 1:

The source code for row-wise L2-norm boils down to the following lines of code:

import numpy as np

def norm(x, keepdims=False):
    x = np.asarray(x)
    s = x**2  # elementwise squares
    return np.sqrt(s.sum(axis=(1,), keepdims=keepdims))

The simplified code assumes real-valued x and makes use of the fact that np.add.reduce(s, ...) is equivalent to s.sum(...).
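As a quick sanity check, the two reductions do agree; a minimal sketch (the array s is illustrative):

import numpy as np

s = np.random.rand(5, 3)**2
# .sum(axis=1) dispatches to the same ufunc reduction as np.add.reduce
assert np.allclose(np.add.reduce(s, axis=1), s.sum(axis=1))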

The OP's question therefore reduces to asking why np.sum(X, axis=1) is slower than sum(X[:,i] for i in range(X.shape[1])):

%timeit X.sum(axis=1, keepdims=False)
# 131 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit sum(X[:,i] for i in range(X.shape[1]))
# 36.7 µs ± 91.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

This question has been answered already here. In short, the reduction (.sum(axis=1)) comes with overhead costs that generally pay off in terms of floating-point precision and speed (e.g. cache mechanics, parallelism), but don't in the special case of a reduction over just three columns. In this case, the overhead is relatively large compared to the actual computation.
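For concreteness, here is a minimal sketch of what "writing out the formula" means for a three-column matrix (the shape is assumed for illustration):

import numpy as np

X = np.random.randn(10000, 3)  # tall matrix with only three columns (assumed shape)

# generic reduction: pays the .sum(axis=1) overhead
norms_reduce = np.linalg.norm(X, axis=1, keepdims=True)

# written-out formula: three plain vectorized column operations
norms_formula = np.sqrt(X[:, 0]**2 + X[:, 1]**2 + X[:, 2]**2)[:, np.newaxis]

assert np.allclose(norms_reduce, norms_formula)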

The situation changes when X has more columns: the NumPy reduction is now substantially faster than summing the columns in a Python loop:

X = np.random.randn(10000, 100)
%timeit X / np.linalg.norm(X, axis=1, keepdims=True)
# 3.36 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit X / np.sqrt(sum(X[:,i]**2 for i in range(X.shape[1])))[:,np.newaxis]
# 5.92 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
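For reference, the two variants being timed above can be wrapped as self-contained functions (the names are illustrative):

import numpy as np

def normalize_numpy(X):
    # one vectorized reduction over axis 1
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def normalize_loop(X):
    # one Python-level iteration (and one temporary array) per column
    sq = sum(X[:, i]**2 for i in range(X.shape[1]))
    return X / np.sqrt(sq)[:, np.newaxis]

X = np.random.randn(10000, 100)
assert np.allclose(normalize_numpy(X), normalize_loop(X))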

Another related SO thread is found here: numpy ufuncs vs. for loop.

The question remains why numpy does not explicitly handle common special cases of reduction (such as summation over the rows or columns of a matrix with a small axis length). Presumably this is because the benefit of such optimizations depends strongly on the target machine, while they would increase code complexity considerably.
