
Python Alternative To Itertools Product With Numpy

I am using a list of lists with varying sizes. For example, alternativesList can include 4 lists in one iteration and 7 lists in another. What I am trying to do is capture every co

Solution 1:

Generally speaking, if we picture optimization as a balance scale, memory and runtime are its two pans. That is, memory optimization and runtime optimization usually (though not always) have an inverse relationship: improving one tends to cost you in the other. Now, regarding your question:

Is there a way to create same object with numpy which works faster than itertools?

There definitely is, but another point to note is that abstraction gives you much more flexibility, and that is what itertools.product gives you and NumPy doesn't. If scalability is not an important factor in this case, you can do this with NumPy without giving up any benefits. Here is one way, using the column_stack, repeat and tile functions (a and b are one-dimensional NumPy arrays):

In [5]: np.column_stack((np.repeat(a, b.size),np.tile(b, a.size)))
Out[5]: 
array([['1', 'a'],
       ['1', 'b'],
       ['1', 'c'],
       ['2', 'a'],
       ['2', 'b'],
       ['2', 'c'],
       ['3', 'a'],
       ['3', 'b'],
       ['3', 'c']], dtype='<U21')
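
The call above handles exactly two arrays, while the question mentions anywhere from 4 to 7 lists. The same repeat-and-tile idea can be extended to any number of inputs. The following is only a minimal sketch: the function name product_repeat_tile is hypothetical, and it assumes non-empty one-dimensional inputs that share a compatible dtype (strings here).

```python
import itertools
import math

import numpy as np


def product_repeat_tile(*arrays):
    """Cartesian product of 1-D arrays as a 2-D array, one column per input.

    Hypothetical helper; assumes every input is non-empty.
    """
    arrays = [np.asarray(a) for a in arrays]
    sizes = [a.size for a in arrays]
    total = math.prod(sizes)          # number of rows in the result

    columns = []
    inner = total
    for a, n in zip(arrays, sizes):
        inner //= n                   # consecutive repeats of each element
        outer = total // (n * inner)  # how often the repeated block is tiled
        columns.append(np.tile(np.repeat(a, inner), outer))
    return np.column_stack(columns)


a = np.array(['1', '2', '3'])
b = np.array(['a', 'b', 'c'])
c = np.array(['x', 'y'])

result = product_repeat_tile(a, b, c)
expected = [list(t) for t in itertools.product(a.tolist(), b.tolist(), c.tolist())]
print(result.shape)
print(result.tolist() == expected)
```

The row order matches itertools.product: the last input varies fastest. Note that, unlike the iterator, this builds the whole result in memory at once.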

There are still ways to make this array occupy less memory, by using narrower string types such as U2 or U1:

In [10]: np.column_stack((np.repeat(a, b.size),np.tile(b, a.size))).astype('U1')
Out[10]: 
array([['1', 'a'],
       ['1', 'b'],
       ['1', 'c'],
       ['2', 'a'],
       ['2', 'b'],
       ['2', 'c'],
       ['3', 'a'],
       ['3', 'b'],
       ['3', 'c']], dtype='<U1') 
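
To put a number on the saving, the sketch below compares the nbytes of a small array stored as U21 and as U1. The arrays wide, narrow and truncated are made up for illustration. It also shows a caveat: casting to a narrower string type silently cuts off longer values, so this is only safe when every value fits.

```python
import numpy as np

# Illustrative data: six one-character strings stored with room for 21.
wide = np.array([['1', 'a'], ['2', 'b'], ['3', 'c']], dtype='<U21')
narrow = wide.astype('U1')

# NumPy stores 4 bytes per character slot of a unicode string.
print(wide.nbytes)    # 6 elements * 21 characters * 4 bytes = 504
print(narrow.nbytes)  # 6 elements *  1 character  * 4 bytes = 24

# Caveat: values longer than one character are truncated without a warning.
truncated = np.array(['10', '11']).astype('U1')
print(truncated)
```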

Solution 2:

You can avoid some of the problems that arise from NumPy trying to find a catch-all dtype by explicitly specifying a compound (structured) dtype.
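
As a small illustration of the catch-all dtype problem, the sketch below builds the same two records twice: once letting NumPy choose a dtype, and once with a structured dtype. The field names f0, f1, f2 and the sample values are chosen only for this example.

```python
import numpy as np

rows = [(1, 'A', 0.5), (2, 'B', 1.5)]

# Without a dtype, NumPy looks for one type that fits every value,
# so the integers and floats all become strings.
mixed = np.array(rows)
print(mixed.dtype.kind)   # 'U' means unicode string
print(mixed.shape)        # a plain 2-D array of strings

# With a structured dtype, each field keeps its own type.
record_dtype = np.dtype([('f0', np.int64), ('f1', 'U1'), ('f2', np.float64)])
records = np.array(rows, dtype=record_dtype)
print(records['f0'].dtype, records['f2'].dtype)
print(records['f0'].sum())  # arithmetic still works on the numeric field
```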

Code and some timings:

import numpy as np
import itertools

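def cartesian_product_mixed_type(*arrays):
    arrays = *map(np.asanyarray, arrays),
    # One field per input array, so each column keeps its own dtype.
    dtype = np.dtype([(f'f{i}', a.dtype) for i, a in enumerate(arrays)])
    # N-dimensional output with one axis per input array.
    out = np.empty((*map(len, arrays),), dtype)
    idx = slice(None), *itertools.repeat(None, len(arrays) - 1)
    for i, a in enumerate(arrays):
        # Reshape a to (len(a), 1, ..., 1) so it broadcasts along axis i.
        out[f'f{i}'] = a[idx[:len(arrays) - i]]
    return out.ravel()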

a = np.arange(4)
b = np.arange(*map(ord, ('A', 'D')), dtype=np.int32).view('U1')
c = np.arange(2.)

np.set_printoptions(threshold=10)

print(f'a={a}')
print(f'b={b}')
print(f'c={c}')

print('itertools')
print(list(itertools.product(a,b,c)))
print('numpy')
print(cartesian_product_mixed_type(a,b,c))

a = np.arange(100)
b = np.arange(*map(ord, ('A', 'z')), dtype=np.int32).view('U1')
c = np.arange(20.)

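import timeit
# timeit returns the total time in seconds for `number` runs; with
# number=1000 that value equals the mean time per call in milliseconds.
kwds = dict(globals=globals(), number=1000)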

print()
print(f'a={a}')
print(f'b={b}')
print(f'c={c}')

print(f"itertools: {timeit.timeit('list(itertools.product(a,b,c))', **kwds):7.4f} ms")
print(f"numpy:     {timeit.timeit('cartesian_product_mixed_type(a,b,c)', **kwds):7.4f} ms")

a = np.arange(1000)
b = np.arange(1000, dtype=np.int32).view('U1')

print()
print(f'a={a}')
print(f'b={b}')

print(f"itertools: {timeit.timeit('list(itertools.product(a,b))', **kwds):7.4f} ms")
print(f"numpy:     {timeit.timeit('cartesian_product_mixed_type(a,b)', **kwds):7.4f} ms")

Sample output:

a=[0 1 2 3]
b=['A' 'B' 'C']
c=[0. 1.]
itertools
[(0, 'A', 0.0), (0, 'A', 1.0), (0, 'B', 0.0), (0, 'B', 1.0), (0, 'C', 0.0), (0, 'C', 1.0), (1, 'A', 0.0), (1, 'A', 1.0), (1, 'B', 0.0), (1, 'B', 1.0), (1, 'C', 0.0), (1, 'C', 1.0), (2, 'A', 0.0), (2, 'A', 1.0), (2, 'B', 0.0), (2, 'B', 1.0), (2, 'C', 0.0), (2, 'C', 1.0), (3, 'A', 0.0), (3, 'A', 1.0), (3, 'B', 0.0), (3, 'B', 1.0), (3, 'C', 0.0), (3, 'C', 1.0)]
numpy
[(0, 'A', 0.) (0, 'A', 1.) (0, 'B', 0.) ... (3, 'B', 1.) (3, 'C', 0.)
 (3, 'C', 1.)]

a=[ 0  1  2 ... 97 98 99]
b=['A' 'B' 'C' ... 'w' 'x' 'y']
c=[ 0.  1.  2. ... 17. 18. 19.]
itertools:  7.4339 ms
numpy:      1.5701 ms

a=[  0   1   2 ... 997 998 999]
b=['' '\x01' '\x02' ... 'ϥ' 'Ϧ' 'ϧ']
itertools: 62.6357 ms
numpy:      8.0249 ms
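
The broadcasting step inside cartesian_product_mixed_type can be hard to read, so here is the two-array case written out by hand. This is only a sketch with made-up inputs a and b; it also shows how to read individual fields back from the structured result.

```python
import itertools

import numpy as np

a = np.arange(3)            # integer column
b = np.array(['x', 'y'])    # string column

dtype = np.dtype([('f0', a.dtype), ('f1', b.dtype)])
out = np.empty((a.size, b.size), dtype)

# a[:, None] has shape (3, 1) and is broadcast across the columns;
# b has shape (2,) and is broadcast across the rows.
out['f0'] = a[:, None]
out['f1'] = b

flat = out.ravel()
print(flat)

# Each field keeps the dtype of the array it came from.
print(flat['f0'].dtype == a.dtype, flat['f1'].dtype == b.dtype)

# Same tuples, in the same order, as itertools.product.
print(flat.tolist() == list(itertools.product(a.tolist(), b.tolist())))
```

Because the result is a structured array rather than a list of tuples, a column such as flat['f0'] can be used directly in vectorized arithmetic without converting back from strings.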
