Fast Differences Of All Row Pairs With Numpy

I am using an algorithm that requires that each example has a matrix, say Xi which is ai x b, and that for each of the O(n^2) pairs of examples, I find the difference between each

Solution 1:

OK this one was fun. I still can't help thinking it can all be done with a single ingenious call to numpy.tensordot but at any rate this seems to have eliminated all Python-level loops:

import numpy

# Reference implementation: loop over every (ai, bj) row pair in Python
def slow( a, b=None ):

    if b is None: b = a
    a = numpy.asmatrix( a )
    b = numpy.asmatrix( b )

    out = 0.0
    for ai in a:
        for bj in b:
            dij = bj - ai
            out += numpy.outer( dij, dij )
    return out

# Loop-free version: the same sum, expanded into a few matrix products
def opsum( a, b=None ):

    if b is None: b = a
    a = numpy.asmatrix( a )
    b = numpy.asmatrix( b )

    RA, CA = a.shape
    RB, CB = b.shape    
    if CA != CB: raise ValueError( "input matrices should have the same number of columns" )

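    # cross terms: minus the outer product of the column sums, plus its transpose
    out = -numpy.outer( a.sum( axis=0 ), b.sum( axis=0 ) )
    out = out + out.T   # not in-place, so out never overlaps its own transposed view
    # squared terms: each a-row is paired with RB b-rows, each b-row with RA a-rows
    out += RB * ( a.T * a )
    out += RA * ( b.T * b )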
    return out

def test( a, b=None ):
    print( "ground truth:" )
    print( slow( a, b ) )
    print( "optimized:" )
    print( opsum( a, b ) )  
    print( "max abs discrepancy:" )
    print( numpy.abs( opsum( a, b ) - slow( a, b ) ).max() )
    print( "" )

# OP example
test( [[1,2], [3,4]] )

# non-symmetric example
a = [ [ 1, 2, 3 ], [-4, 5, 6 ], [7, -8, 9 ], [ 10, 11, -12 ] ]
a = numpy.matrix( a, dtype=float )
b = a[ ::2, ::-1 ] + 15
test( a, b )

# non-integer example
test( numpy.random.randn( *a.shape ), numpy.random.randn( *b.shape ) )
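
For reference, the identity that opsum appears to rely on can be written out explicitly. This is a sketch in my own notation, not something stated in the answer: a_i and b_j are the rows of A (R_A x C) and B (R_B x C) treated as row vectors, and s_a, s_b are the column sums, i.e. a.sum( axis=0 ) and b.sum( axis=0 ).

```latex
\sum_{i=1}^{R_A}\sum_{j=1}^{R_B} (b_j - a_i)^\top (b_j - a_i)
  = R_B\, A^\top A \;+\; R_A\, B^\top B \;-\; s_a^\top s_b \;-\; s_b^\top s_a,
\qquad s_a = \sum_{i=1}^{R_A} a_i,\quad s_b = \sum_{j=1}^{R_B} b_j .
```

Each squared term picks up a factor equal to the number of rows in the other matrix, and the double sum of the cross terms factorizes into an outer product of the column sums. That is what the four updates to out in opsum compute.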

With that (rather arbitrary) example input, timing of opsum (measured using timeit opsum(a,b) in IPython) looks only about a factor of 3–5 better than slow. But of course it scales much better, because slow runs a Python-level loop over every pair of rows while opsum needs only a handful of matrix products: scale up the number of data points by a factor of 100, and the number of features by a factor of 10, and then we're already looking at about a factor-10,000 increase in speed.
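
As a side note, the same computation can be sketched with plain ndarrays and the @ operator instead of numpy.matrix, which the NumPy documentation no longer recommends for new code. The function names pair_outer_sum and pair_outer_sum_broadcast below are my own, not from the answer. The second function is only a cross-check: it materializes all pairwise differences by broadcasting, so it needs O(RA * RB * C) memory and is not a replacement for the closed form.

```python
import numpy as np

def pair_outer_sum(a, b=None):
    """Sum of outer(bj - ai, bj - ai) over all row pairs, without loops."""
    a = np.asarray(a, dtype=float)
    b = a if b is None else np.asarray(b, dtype=float)
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[1]:
        raise ValueError("inputs must be 2-D with the same number of columns")
    ra, rb = a.shape[0], b.shape[0]
    # cross terms come from the outer product of the column sums
    cross = np.outer(a.sum(axis=0), b.sum(axis=0))
    return rb * (a.T @ a) + ra * (b.T @ b) - cross - cross.T

def pair_outer_sum_broadcast(a, b=None):
    """Cross-check: build every difference explicitly, then contract."""
    a = np.asarray(a, dtype=float)
    b = a if b is None else np.asarray(b, dtype=float)
    d = b[None, :, :] - a[:, None, :]        # shape (ra, rb, c)
    return np.einsum("ijk,ijl->kl", d, d)    # sum of outer products over i, j

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 3))
    b = rng.standard_normal((2, 3))
    print(np.abs(pair_outer_sum(a, b) - pair_outer_sum_broadcast(a, b)).max())
```

For the question's small example, [[1,2], [3,4]] paired with itself, the four row differences are (0,0), (2,2), (-2,-2) and (0,0), so both functions should return a 2 x 2 matrix filled with 8.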
