Python Object Serialization: Having an Issue with Pickle vs. Hickle
For a couple of days now, I have been stuck on my machine learning project. I have a Python script that should transform the data for model training by a second script. In the first script
Solution 1:
If you want to dump a huge list of arrays, you might want to look at dask or klepto. dask could break up the list into lists of sub-arrays, while klepto could break up the list into a dict of sub-arrays (with the key indicating the ordering of the sub-arrays).
>>> import klepto as kl
>>> import numpy as np
>>> big = np.random.randn(10, 100)  # could be a huge array
>>> ar = kl.archives.dir_archive('foo', dict(enumerate(big)), cached=False)
>>> list(ar.keys())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
Each entry is then serialized to disk in its own file (as output.pkl):
$ ls foo/K_0/
input.pkl output.pkl
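If you would rather avoid an extra dependency, the same chunk-per-file idea behind klepto's dict-of-sub-arrays can be sketched with the standard library alone: split the list into numbered chunks, pickle each chunk to its own file, and use the number embedded in the filename to restore the ordering on load. The helper names (dump_chunks, load_chunks) and the filename scheme below are illustrative assumptions, not part of any library's API.

```python
import os
import pickle
import tempfile

def dump_chunks(data, directory, chunk_size):
    """Pickle `data` as numbered chunk files so no single file is huge."""
    os.makedirs(directory, exist_ok=True)
    for i in range(0, len(data), chunk_size):
        # The index in the filename records the ordering of the chunks,
        # playing the same role as the dict keys in the klepto example.
        path = os.path.join(directory, f"chunk_{i // chunk_size}.pkl")
        with open(path, "wb") as f:
            pickle.dump(data[i:i + chunk_size], f)

def load_chunks(directory):
    """Reload the chunks in numeric order and concatenate them."""
    names = sorted(os.listdir(directory),
                   key=lambda n: int(n.split("_")[1].split(".")[0]))
    data = []
    for name in names:
        with open(os.path.join(directory, name), "rb") as f:
            data.extend(pickle.load(f))
    return data

with tempfile.TemporaryDirectory() as d:
    original = list(range(25))
    dump_chunks(original, d, chunk_size=10)  # writes chunk_0.pkl .. chunk_2.pkl
    restored = load_chunks(d)
```

Numeric sorting of the filenames matters: a plain lexicographic sort would put chunk_10 before chunk_2 once there are more than ten chunks.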