Why Is Numpy Subtraction Slower On One Large Matrix $m$ Than When Dividing $m$ Into Smaller Matrices And Then Subtracting?
Solution 1:
The dtype of the variable pre_allocated is float64 (the default for np.empty), while the input matrices are int, so every write incurs an implicit conversion. Change the pre-allocation to:
pre_allocated = np.empty_like(large_matrix)
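A minimal sketch of the fix (the names large_matrix, vector, and pre_allocated follow the answer; the array shapes and values here are arbitrary illustrations, not the asker's data):

```python
import numpy as np

# Hypothetical setup mirroring the question.
large_matrix = np.arange(12, dtype=np.int64).reshape(4, 3)
vector = np.arange(3, dtype=np.int64)

# np.empty(shape) would default to float64, forcing an implicit
# int -> float conversion on every write. empty_like matches the
# input dtype, so np.subtract can write its result directly.
pre_allocated = np.empty_like(large_matrix)
np.subtract(large_matrix, vector, out=pre_allocated)
print(pre_allocated.dtype)  # same dtype as the input: int64
```

With matching dtypes, the ufunc writes straight into the output buffer instead of converting element by element.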
Before the change, the execution times on my machine were:
0.6756095182868318
1.2262537249271794
1.250292605883855
After the change:
0.6776479894965846
0.6468182835551346
0.6538956945388001
The performance is now similar in all cases, and the variance of the measurements is large; one may even observe that the first variant is the fastest. It seems there is no gain from pre-allocation as such.
Note that the allocation itself is very fast because it only reserves address space; physical RAM is committed only when the pages are first touched. The buffer is 20 MiB, larger than the CPU's L3 cache, so the execution time is dominated by page faults and refilling of the caches. Moreover, in the first case the memory is re-allocated just after being freed, so the buffer is likely still "hot" for the memory allocator. Therefore you cannot directly compare solution A with the others.
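A rough sketch of the lazy-allocation effect described above (the 20 MiB size matches the answer; actual timings are machine- and allocator-dependent, so none are shown):

```python
import time

import numpy as np

# ~20 MiB of float64, as in the answer.
n = 20 * 1024 * 1024 // 8

t0 = time.perf_counter()
buf = np.empty(n)   # fast: only address space is reserved
t1 = time.perf_counter()
buf.fill(0.0)       # first touch: page faults commit the RAM
t2 = time.perf_counter()

print(f"alloc: {t1 - t0:.6f}s, first touch: {t2 - t1:.6f}s")
```

On a cold buffer the first touch typically dominates, which is why timing the allocation alone says little about the cost of using the memory.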
Modify the "action" line in the first case to keep the actual result:
np.subtract(list_of_matrices[j], vector, out=pre_allocated[m*j:m*(j+1)])
Then the gain from vectorized operations becomes more observable:
0.8738251849091547
0.678185239557866
0.6830777283598941
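The corrected "action" line can be sketched as follows (the names m, list_of_matrices, vector, and pre_allocated follow the answer; the chunk count and shapes are arbitrary choices for illustration):

```python
import numpy as np

# Split a large matrix into chunks of m rows each, then subtract a
# vector from every chunk, writing each result into the matching
# slice of one pre-allocated output buffer.
m, n_chunks, cols = 4, 3, 5
large_matrix = np.arange(m * n_chunks * cols).reshape(m * n_chunks, cols)
vector = np.arange(cols)
list_of_matrices = [large_matrix[m * j:m * (j + 1)] for j in range(n_chunks)]

pre_allocated = np.empty_like(large_matrix)
for j in range(n_chunks):
    np.subtract(list_of_matrices[j], vector,
                out=pre_allocated[m * j:m * (j + 1)])

# The chunked result matches the whole-matrix subtraction.
assert np.array_equal(pre_allocated, large_matrix - vector)
```

Writing through out= into a slice keeps the result instead of discarding it, so the benchmark measures the same work in every variant.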