
Python Object Serialization: Having Issue With Pickle Vs Hickle

For a couple of days now, I have been stuck on my machine learning project. I have a Python script that should transform the data for model training by a second script. In the first script

Solution 1:

If you want to dump a huge list of arrays, you might want to look at dask or klepto. dask could break up the list into lists of sub-arrays, while klepto could break up the list into a dict of sub-arrays (with the key indicating the ordering of the sub-arrays).

>>> import klepto as kl
>>> import numpy as np
>>> big = np.random.randn(10, 100)  # could be a huge array
>>> ar = kl.archives.dir_archive('foo', dict(enumerate(big)), cached=False)
>>> list(ar.keys())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>

Then each entry is serialized to disk as its own file (output.pkl), one per key, in a subdirectory of the archive:

$ ls foo/K_0/
input.pkl   output.pkl
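If klepto isn't available, the same one-entry-per-file layout can be sketched with the standard library's pickle module. This is a minimal sketch, not klepto's actual on-disk format: the directory naming and filename-as-key scheme below are assumptions made for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

# could be a huge array, as in the klepto example above
big = np.random.randn(10, 100)

# hypothetical output directory; klepto uses its own layout instead
outdir = tempfile.mkdtemp(prefix="foo_pickles_")

# write: one pickle file per sub-array, filename encodes the ordering key
for i, row in enumerate(big):
    with open(os.path.join(outdir, f"{i}.pkl"), "wb") as f:
        pickle.dump(row, f)

# read back in key order and reassemble the original array
keys = sorted(int(name.split(".")[0]) for name in os.listdir(outdir))
restored = np.array([
    pickle.load(open(os.path.join(outdir, f"{k}.pkl"), "rb"))
    for k in keys
])

assert np.array_equal(big, restored)
```

Because each sub-array lives in its own file, you can load only the pieces you need instead of unpickling one huge list at once, which is the same idea klepto's dir_archive applies.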
