
Failing To Write In Hdf5 File

I am trying to create an HDF5 file, but the output file is empty. I have written Python code that is supposed to run in a loop and write strings into the created datasets. After the fi

Solution 1:

I'm not sure how to solve this with h5py, but if you are not bound to a specific library, take a look at HDFql, which makes it easy to handle HDF5 files.

Using HDFql in Python, your use-case can be solved with the help of hyperslabs as follows:

import HDFql

HDFql.execute("CREATE AND USE FILE sample.h5")
HDFql.execute("CREATE CHUNKED(1) DATASET objects/D1 AS VARCHAR(10, 2)")
HDFql.execute("CREATE CHUNKED(1) DATASET objects/D2 AS VARCHAR(10, 3)")

for i in range(10):
    HDFql.execute("INSERT INTO objects/D1(%d:::1) VALUES(Sample, %d)" % (i, i))
    HDFql.execute("INSERT INTO objects/D2(%d:::1) VALUES(Hello, World, %d)" % (i, i))

HDFql.execute("CLOSE FILE")

Additional examples on how to use HDFql can be found here.

Solution 2:

Your code works for me (in an ipython session):

In [1]: import h5py                                                                                    
In [2]: h5_file_name = 'sample.h5' 
   ...: hf = h5py.File(h5_file_name, 'w') 
   ...: g1 = hf.create_group('Objects') 
   ...: dt = h5py.special_dtype(vlen=str) 
   ...: d1 = g1.create_dataset('D1', (2, 10), dtype=dt) 
   ...: d2 = g1.create_dataset('D2', (3, 10), dtype=dt) 
   ...: for i in range(10): 
   ...:     d1[0][i] = 'Sample' 
   ...:     d1[1][i] = str(i) 
   ...:     d2[0][i] = 'Hello' 
   ...:     d2[1][i] = 'World' 
   ...:     d2[2][i] = str(i) 
   ...: hf.close()   

This runs and creates a file, so the file is not "empty" in the usual sense. But if by "empty" you mean that the words were not written to it, that is true: every element still holds the initial empty string ''.

In [4]: hf = h5py.File(h5_file_name, 'r')                                                              
In [5]: hf['Objects/D1']                                                                               
Out[5]: <HDF5 dataset "D1": shape (2, 10), type "|O">
In [6]: hf['Objects/D1'][:]                                                                            
Out[6]: 
array([['', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', '']], dtype=object)

===

The problem isn't with the file setup, but rather with how you are trying to set elements:

In [45]: h5_file_name = 'sample.h5' 
    ...: hf = h5py.File(h5_file_name, 'w') 
    ...: g1 = hf.create_group('Objects') 
    ...: dt = h5py.special_dtype(vlen=str) 
    ...: d1 = g1.create_dataset('D1', (2, 10), dtype=dt) 
    ...: d2 = g1.create_dataset('D2', (3, 10), dtype=dt) 
    ...:                                                                                               
In [46]: d1[:]                                                                                         
Out[46]: 
array([['', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', '']], dtype=object)
In [47]: d1[0][0] = 'sample'                                                                           
In [48]: d1[:]                                                                                         
Out[48]: 
array([['', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', '']], dtype=object)

Use the tuple style of indexing:

In [49]: d1[0, 0] = 'sample'                                                                           
In [50]: d1[:]                                                                                         
Out[50]: 
array([['sample', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', '']], dtype=object)

With a NumPy array, d1[0][0] = ... works because d1[0] is a view of d1. h5py does not replicate this: indexing a dataset reads the selected data from the file into a new NumPy array, so d1[0] is a copy, not the dataset itself.
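To make the mechanics concrete, here is a minimal pure-Python sketch. The class CopyOnRead is hypothetical and is not how h5py is implemented; it only mimics the one relevant behaviour, namely that reading a row returns a fresh copy while a tuple-indexed assignment goes straight to the stored data.

```python
class CopyOnRead:
    """Hypothetical stand-in for an on-disk dataset (not h5py's real code)."""

    def __init__(self, rows, cols):
        self._stored = [["" for _ in range(cols)] for _ in range(rows)]

    def __getitem__(self, key):
        if isinstance(key, tuple):
            row, col = key
            return self._stored[row][col]
        # Reading a whole row hands back a new list, like a read from the file.
        return list(self._stored[key])

    def __setitem__(self, key, value):
        # Only tuple keys are handled in this sketch.
        row, col = key
        self._stored[row][col] = value


ds = CopyOnRead(2, 3)

# Chained indexing: ds[0] calls __getitem__ and returns a copy,
# so the assignment modifies that temporary copy only.
ds[0][0] = "lost"
after_chained = ds[0]

# Tuple indexing: a single __setitem__ call on the container itself.
ds[0, 0] = "kept"
after_tuple = ds[0]

print(after_chained)
print(after_tuple)
```

The chained form is two separate operations (a read, then a write to the result of the read), which is why it only reaches the stored data when the read returns a view.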

Variations on this tuple indexing, writing whole rows or columns at once (np is NumPy, imported with import numpy as np):

In [51]: d1[0, :] = 'sample'                                                                           
In [52]: d1[1, :] = np.arange(10)                                                                      
In [53]: d1[:]                                                                                         
Out[53]: 
array([['sample', 'sample', 'sample', 'sample', 'sample', 'sample',
        'sample', 'sample', 'sample', 'sample'],
       ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']], dtype=object)
In [54]: d2[:,0] = ['one','two','three']                                                               
In [55]: d2[:]                                                                                         
Out[55]: 
array([['one', '', '', '', '', '', '', '', '', ''],
       ['two', '', '', '', '', '', '', '', '', ''],
       ['three', '', '', '', '', '', '', '', '', '']], dtype=object)
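The row and column assignments above suggest that the loop from the question can be dropped entirely. The following is a sketch under some assumptions: it uses an in-memory file (h5py's core driver with backing_store=False) so nothing is written to disk, the file name rows_demo.h5 is arbitrary, and to_text is a hypothetical helper added because, depending on the h5py version, strings may be read back as bytes or as str.

```python
import h5py


def to_text(value):
    """Return str whether h5py handed back bytes or str."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


# Assumption: in-memory file, used only to keep the sketch self-contained.
with h5py.File("rows_demo.h5", "w", driver="core", backing_store=False) as hf:
    dt = h5py.special_dtype(vlen=str)
    # Intermediate groups in the path are created automatically.
    d1 = hf.create_dataset("Objects/D1", (2, 10), dtype=dt)
    d2 = hf.create_dataset("Objects/D2", (3, 10), dtype=dt)

    # One assignment per row, each going through tuple indexing.
    d1[0, :] = ["Sample"] * 10
    d1[1, :] = [str(i) for i in range(10)]
    d2[0, :] = ["Hello"] * 10
    d2[1, :] = ["World"] * 10
    d2[2, :] = [str(i) for i in range(10)]

    # Read back while the file is still open.
    second_row = [to_text(v) for v in d1[1, :]]
    first_column = [to_text(v) for v in d2[:, 0]]

print(second_row)
print(first_column)
```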

Verifying the change in type with indexing:

In [64]: type(d1)
Out[64]: h5py._hl.dataset.Dataset
In [65]: type(d1[0])
Out[65]: numpy.ndarray

d1[0][0] = 'foobar' would change that temporary d1[0] array without affecting the d1 dataset.
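Putting this together, the loop from the question could be rewritten with tuple indexing roughly as follows. This is a sketch, not the asker's final code: the temporary directory is an assumption made so the example does not touch the working directory, the with blocks replace the explicit hf.close(), and to_text is a hypothetical helper that copes with h5py returning either bytes or str when reading, depending on its version.

```python
import os
import tempfile

import h5py

# Assumption: a throwaway directory; the question writes 'sample.h5'
# into the current working directory instead.
h5_file_name = os.path.join(tempfile.mkdtemp(), "sample.h5")

with h5py.File(h5_file_name, "w") as hf:  # closed automatically on exit
    g1 = hf.create_group("Objects")
    dt = h5py.special_dtype(vlen=str)
    d1 = g1.create_dataset("D1", (2, 10), dtype=dt)
    d2 = g1.create_dataset("D2", (3, 10), dtype=dt)
    for i in range(10):
        # d1[0, i] writes to the dataset; d1[0][i] would write to a copy.
        d1[0, i] = "Sample"
        d1[1, i] = str(i)
        d2[0, i] = "Hello"
        d2[1, i] = "World"
        d2[2, i] = str(i)


def to_text(value):
    """Return str whether h5py handed back bytes or str."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


# Reopen the file read-only to confirm the strings were stored.
with h5py.File(h5_file_name, "r") as hf:
    d1_rows = [[to_text(v) for v in row] for row in hf["Objects/D1"][:]]
    d2_rows = [[to_text(v) for v in row] for row in hf["Objects/D2"][:]]

print(d1_rows)
print(d2_rows)
```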
