
NumPy: 3-byte, 6-byte Types (aka Uint24, Uint48)

NumPy seems to lack built-in support for 3-byte and 6-byte types, aka uint24 and uint48. I have a large data set using these types and want to feed it to numpy. What I currently do

Solution 1:

I don't believe there's a way to do exactly what you're asking (it would require unaligned access, which is highly inefficient on some architectures). My solution from "Reading and storing arbitrary byte length integers from a file" might be a more efficient way to transfer the data into an in-process array:

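import numpy as np

a = np.memmap("filename", mode='r', dtype=np.dtype('>u1'))
e = np.zeros(a.size // 6, np.dtype('>u8'))  # integer division: the shape must be an int
# Copy each 6-byte value into the low three 16-bit words of an 8-byte slot.
for i in range(3):
    e.view(dtype='>u2')[i + 1::4] = a.view(dtype='>u2')[i::3]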
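The same widening idea can be sketched at byte granularity, which also covers odd widths such as 3 bytes where a 16-bit view is not possible. This is a minimal sketch, not the answer's own code: `widen_uint` and its parameters are hypothetical names, and it assumes the input is a 1-D `uint8` array (for example the memmap above).

```python
import numpy as np

def widen_uint(raw, nbytes, big_endian=True):
    """Copy packed nbytes-wide unsigned integers into a native uint64 array.

    widen_uint is a hypothetical helper, not a NumPy function.
    `raw` is assumed to be a 1-D uint8 array.
    """
    if not 1 <= nbytes <= 8:
        raise ValueError("nbytes must be between 1 and 8")
    count = raw.size // nbytes                   # ignore a trailing partial record
    packed = raw[:count * nbytes].reshape(count, nbytes)
    out = np.zeros(count, dtype=np.dtype('>u8' if big_endian else '<u8'))
    as_bytes = out.view(np.uint8).reshape(count, 8)  # byte-level view of the 8-byte slots
    if big_endian:
        as_bytes[:, 8 - nbytes:] = packed        # low-order bytes are the last ones
    else:
        as_bytes[:, :nbytes] = packed            # low-order bytes are the first ones
    return out.astype(np.uint64)                 # convert to native byte order

# Two big-endian 6-byte values held in memory instead of a file.
raw = np.frombuffer(bytes([0, 0, 0, 0, 0, 1, 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC]),
                    dtype=np.uint8)
values = widen_uint(raw, 6)
```

Like the loop above, this makes a full copy of the data, so it trades memory for aligned, directly usable integers.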

You can get unaligned access using the strides constructor parameter:

e = np.ndarray((a.size - 2) // 6, np.dtype('<u8'), a, strides=(6,))  # a (the byte array) serves as the buffer

However, with this each 8-byte element overlaps the next one by two bytes, so to actually use it you have to mask out the two high bytes on access.
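A small in-memory sketch of the strided view plus masking may make this concrete. The sample values and the two padding bytes are assumptions made for illustration; with a real file you would instead expose fewer elements, as the `(size - 2) // 6` count does.

```python
import numpy as np

# Three little-endian 6-byte values, followed by two padding bytes so the
# 8-byte read of the last element stays inside the buffer.
payload = b"".join(v.to_bytes(6, "little") for v in (1, 0x123456789ABC, 2**48 - 1))
buf = np.frombuffer(payload + b"\x00\x00", dtype=np.uint8)

count = (buf.size - 2) // 6
# Each element is read as 8 bytes, but consecutive elements start 6 bytes apart.
overlapping = np.ndarray(count, np.dtype('<u8'), buf, strides=(6,))

# The two high bytes belong to the next element, so clear them.
values = overlapping & np.uint64(0xFFFFFFFFFFFF)
```

No data is copied until the mask is applied, so the strided view can sit on top of a memmap and only the slices you actually access are materialized.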


Solution 2:

There's an answer for this at "How do I create a Numpy dtype that includes 24 bit integers?"

It's a bit ugly, but it does exactly what you want: it lets you index your ndarray as if it had a dtype of <u3, so you can memmap() big data from disk.
You still need to apply a bitmask manually to clear the fourth, overlapping byte, but the mask can be applied to the sliced (multidimensional) array after access.

The trick is to abuse the strides of an ndarray so that indexing works. The linked answer uses an additional trick to keep NumPy from complaining about the buffer limits.
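As a rough illustration of the stride approach for 24-bit data, the sketch below builds a 2-D view over packed little-endian values and masks only the slice that is read. It is not the linked answer's code: the sample data are made up, and the single spare byte appended to the buffer is an assumption used here to keep the last 4-byte read in bounds.

```python
import numpy as np

rows, cols = 2, 3
samples = [[1, 2, 3], [0x010203, 0xABCDEF, 0xFFFFFF]]
payload = b"".join(v.to_bytes(3, "little") for row in samples for v in row)

# Assumption: one spare byte after the data, so the 4-byte read of the
# last element does not run past the end of the buffer.
buf = np.frombuffer(payload + b"\x00", dtype=np.uint8)

# Elements are read as 4 bytes but spaced 3 bytes apart; rows are cols * 3 bytes apart.
grid = np.ndarray((rows, cols), np.dtype('<u4'), buf, strides=(cols * 3, 3))

# Slice first, then clear the overlapping fourth byte of each element.
block = grid[1, 1:] & np.uint32(0xFFFFFF)
```

With a memmapped file there is usually no spare byte to append, so one option is to expose one element fewer and decode the final value separately.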


Solution 3:

Using the code below you can read integers of any size coded as big or little endian:

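def readBigEndian(filename, bytesize):
    """Yield unsigned integers of `bytesize` bytes each, most significant byte first."""
    with open(filename, "rb") as f:
        chunk = f.read(bytesize)
        while len(chunk) == bytesize:  # a trailing partial record is ignored
            value = 0
            for byte in chunk:  # iterating over bytes yields ints in Python 3
                value = (value << 8) | byte
            yield value
            chunk = f.read(bytesize)

def readLittleEndian(filename, bytesize):
    """Yield unsigned integers of `bytesize` bytes each, least significant byte first."""
    with open(filename, "rb") as f:
        chunk = f.read(bytesize)
        while len(chunk) == bytesize:  # a trailing partial record is ignored
            value = 0
            shift = 0
            for byte in chunk:
                value |= byte << shift
                shift += 8
            yield value
            chunk = f.read(bytesize)

# Example: decode this script's own bytes as 3-byte little-endian integers.
for i in readLittleEndian("readint.py", 3):
    print(i)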
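In Python 3 the byte-by-byte accumulation can also be delegated to the built-in `int.from_bytes`. The following is a minimal sketch under that assumption; `read_uints` is a hypothetical name, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile

def read_uints(filename, bytesize, byteorder="little"):
    """Yield unsigned integers of `bytesize` bytes each (hypothetical helper)."""
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(bytesize)
            if len(chunk) < bytesize:  # stop at end of file or on a partial record
                break
            yield int.from_bytes(chunk, byteorder)

# Round-trip check on a temporary file holding two 3-byte values.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write((1).to_bytes(3, "little") + (0xABCDEF).to_bytes(3, "little"))
decoded = list(read_uints(tmp.name, 3))
os.remove(tmp.name)
```

This reads one value per call and is therefore much slower than the NumPy approaches for large data sets, but it works for any byte width.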
