How To Pipe Binary Data Into Numpy Arrays Without Tmp Storage?

There are several similar questions, but none of them answers this simple question directly: How can I catch a command's output and stream that content into numpy arrays without crea

Solution 1:

You can use Popen with stdout=subprocess.PIPE. Read in the header, then load the rest into a bytearray to use with np.frombuffer.
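A minimal sketch of that approach, assuming Python 3 and a stream that starts with one text header line of column names followed by little-endian doubles. The inline producer script and the read_table helper are hypothetical stand-ins for the real command, not part of the original answer.

```python
import subprocess
import sys

import numpy as np

# Hypothetical producer: a header line "a b", then six little-endian doubles.
PRODUCER = (
    "import struct, sys\n"
    "out = sys.stdout.buffer\n"
    "out.write(b'a b\\n')\n"
    "out.write(struct.pack('<6d', 0.0, 1.0, 2.0, 3.0, 4.0, 5.0))\n"
)


def read_table(cmd):
    """Run cmd, parse the header line, and wrap the payload in a record array."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        # Read in the header first; this consumes only the first line.
        names = proc.stdout.readline().decode('ascii').split()
        # Load the rest into a bytearray. Note that read() still builds a
        # temporary bytes object before it is copied into the bytearray.
        payload = bytearray(proc.stdout.read())
    finally:
        proc.stdout.close()
        proc.wait()
    dt = np.dtype([(name, '<f8') for name in names])
    # frombuffer shares memory with payload; no further copy is made.
    return np.frombuffer(payload, dtype=dt)


table = read_table([sys.executable, '-c', PRODUCER])
print(table['a'], table['b'])
```

Because the array is backed by a bytearray rather than an immutable bytes object, it stays writable.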

Additional comments based on your edit:

If you're going to call proc.stdout.read(), it's equivalent to using check_output(). Both create a temporary string. If you preallocate data, you could use proc.stdout.readinto(data). Then if the number of bytes read into data is less than len(data), free the excess memory, else extend data by whatever is left to be read.

data = bytearray(2**32) # 4 GiB
n = proc.stdout.readinto(data)
if n < len(data):
    del data[n:]  # free the unused tail
else:
    data += proc.stdout.read()
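The same preallocate, readinto, then trim-or-extend pattern can be tried at a small scale. The sketch below assumes Python 3. The slurp helper is a hypothetical name, and io.BytesIO stands in for proc.stdout so that both branches can be exercised without a subprocess. It loops on readinto because a single call is not guaranteed to fill the buffer on every kind of stream.

```python
import io


def slurp(stream, capacity):
    """Read all of stream into a bytearray preallocated to capacity bytes."""
    data = bytearray(capacity)
    filled = 0
    with memoryview(data) as view:
        while filled < capacity:
            # Read directly into the unfilled part of the preallocated buffer.
            with view[filled:] as window:
                n = stream.readinto(window)
            if not n:  # 0 or None: nothing more to read right now
                break
            filled += n
    # All memoryviews are released here, so the bytearray may be resized.
    if filled < capacity:
        del data[filled:]      # free the excess memory
    else:
        data += stream.read()  # buffer was too small: append the remainder
    return data


short = slurp(io.BytesIO(b'abc'), 8)          # takes the trim branch
long_ = slurp(io.BytesIO(b'abcdefghij'), 4)   # takes the extend branch
print(short, long_)
```

The extend branch still goes through a temporary bytes object for the overflow, so a generous initial capacity keeps the copy small.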

You could also come at this starting with a pre-allocated ndarray ndata and use buf = np.getbuffer(ndata) (available only on Python 2; on Python 3, memoryview(ndata) serves the same purpose). Then readinto(buf) as above.
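A sketch of the pre-allocated ndarray variant, assuming Python 3 (so memoryview replaces np.getbuffer) and assuming the number of elements is known in advance, for example from a header. The inline producer script is a hypothetical stand-in for the real command.

```python
import subprocess
import sys

import numpy as np

# Hypothetical producer: four doubles in native byte order, no header.
PRODUCER = (
    "import struct, sys\n"
    "sys.stdout.buffer.write(struct.pack('=4d', 1.5, 2.5, 3.5, 4.5))\n"
)

ndata = np.empty(4, dtype=np.float64)  # pre-allocated destination
buf = memoryview(ndata).cast('B')      # byte-level view of the same memory

filled = 0
with subprocess.Popen([sys.executable, '-c', PRODUCER],
                      stdout=subprocess.PIPE) as proc:
    # Keep reading until the array is full or the pipe reaches EOF.
    while filled < len(buf):
        n = proc.stdout.readinto(buf[filled:])
        if not n:
            break
        filled += n

print(filled, ndata)
```

The bytes land directly in the array's memory, so no intermediate bytes or bytearray object holds the payload.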

Here's an example to show that the memory is shared between the bytearray and the np.ndarray:

>>> data = bytearray(b'\x01')
>>> ndata = np.frombuffer(data, np.int8)
>>> ndata
array([1], dtype=int8)
>>> ndata[0] = 2
>>> data
bytearray(b'\x02')

Solution 2:

Since your data can easily fit in RAM, I think the easiest way to load the data into a numpy array is to use a ramfs.

On Linux,

sudo mkdir /mnt/ramfs
sudo mount -t ramfs -o size=5G ramfs /mnt/ramfs
sudo chmod 777 /mnt/ramfs

Then, for example, if this is the producer of the binary data:

writer.py:

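import random
import struct
import sys

N = random.randrange(100)
out = sys.stdout.buffer  # binary stdout (Python 3); print() would write the repr of the bytes
out.write(b'a b\n')      # header line with the column names
for i in range(2*N):
    out.write(struct.pack('<d', random.random()))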

Then you could load it into a numpy array like this:

reader.py:

import subprocess
import numpy

def parse_header(f):
    # this function moves the file pointer past the header line and
    # returns a dictionary keyed by the column names
    header = f.readline().decode('ascii')
    d = dict.fromkeys(header.split())
    return d

filename = '/mnt/ramfs/data.out'
with open(filename, 'wb') as f:
    cmd = 'writer.py'
    proc = subprocess.Popen([cmd], stdout=f)
    proc.communicate()
with open(filename, 'rb') as f:
    header = parse_header(f)
    # '<f8' matches the '<d' format used by writer.py
    dt = numpy.dtype([(key, '<f8') for key in header.keys()])
    data = numpy.fromfile(f, dt)
