Skip to content Skip to sidebar Skip to footer

Python: Can I Write To A File Without Loading Its Contents In Ram?

Got a big data-set that I want to shuffle. The entire set won't fit into RAM so it would be good if I could open several files (e.g. hdf5, numpy) simultaneously, loop through my da

Solution 1:

Memmory mapped files will allow for what you want. They create a numpy array which leaves the data on disk, only loading data as needed. The complete manual page is here. However, the easiest way to use them is by passing the argument mmap_mode=r+ or mmap_mode=w+ in the call to np.load leaves the file on disk (see here).

I'd suggest using advanced indexing. If you have data in a one dimensional array arr, you can index it using a list. So arr[ [0,3,5]] will give you the 0th, 3rd, and 5th elements of arr. That will make selecting the shuffled versions much easier. Since this will overwrite the data you'll need to open the files on disk read only, and create copies (using mmap_mode=w+) to put the shuffled data in.

Post a Comment for "Python: Can I Write To A File Without Loading Its Contents In Ram?"