
Converting a List of Unequally Shaped Arrays to a TensorFlow 2 Dataset: ValueError: Can't convert non-rectangular Python sequence to Tensor

I have tokenized data in the form of a list of unequally shaped arrays: array([array([1179, 6, 208, 2, 1625, 92, 9, 3870, 3, 2136, 435, 5, 2453, 2180, 4

Solution 1:

If your data is stored in NumPy arrays or Python lists, you can create the dataset with tf.data.Dataset.from_generator and then pad each batch with padded_batch:

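import tensorflow as tf

# x: list of variable-length token arrays; y: the matching scalar labels.
train_batches = tf.data.Dataset.from_generator(
    lambda: zip(x, y),  # called again each time the dataset is iterated
    output_types=(tf.int64, tf.int64)
).padded_batch(
    batch_size=32,
    # Pad tokens to the longest sequence in each batch; labels are scalars.
    padded_shapes=([None], ())
)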
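On newer TensorFlow releases, the same pipeline can be sketched with output_signature, which the from_generator documentation recommends over output_types from TensorFlow 2.4 onward. This is only a minimal sketch: the toy arrays x and y and the generator name gen are placeholders, not the asker's data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: three token sequences of different lengths, one label each.
x = [
    np.array([1, 2, 3], dtype=np.int64),
    np.array([4, 5], dtype=np.int64),
    np.array([6], dtype=np.int64),
]
y = [0, 1, 0]


def gen():
    # Yield one (tokens, label) pair at a time, so TensorFlow never has to
    # convert the whole ragged list into a single rectangular tensor.
    for tokens, label in zip(x, y):
        yield tokens, label


dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int64),  # variable-length tokens
        tf.TensorSpec(shape=(), dtype=tf.int64),       # scalar label
    ),
)

# Pad the token axis to the longest sequence in each batch of two elements.
batches = dataset.padded_batch(batch_size=2, padded_shapes=([None], []))

for tokens, labels in batches:
    print(tokens.shape, labels.numpy())
```

With this toy input the first batch is padded to length 3 and the second, which holds only the leftover sequence, keeps length 1.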

However, if you load the data with tensorflow_datasets.load, there is no need to call as_numpy_iterator to separate the data from the labels and then reassemble them into a dataset; that is redundant and inefficient. The objects returned by tensorflow_datasets.load are already instances of tf.data.Dataset, so you can call padded_batch on them directly:

train_batches = train_data.padded_batch(batch_size=32, padded_shapes=([None], []))
test_batches = test_data.padded_batch(batch_size=32, padded_shapes=([None], []))

Note that in TensorFlow 2.2 and above, you no longer need to provide the padded_shapes argument if you just want every axis padded to the longest element in the batch (the default behavior).
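To make "padded to the longest of the batch" concrete, here is a rough pure-Python sketch of per-batch padding. It only illustrates the idea and is not how tf.data implements padded_batch; the helper name pad_batches and the toy sequences are made up for illustration, and the pad value 0 mirrors padded_batch's default for numeric tensors.

```python
from typing import Iterator, List, Sequence


def pad_batches(
    sequences: Sequence[Sequence[int]],
    batch_size: int,
    pad_value: int = 0,
) -> Iterator[List[List[int]]]:
    """Group sequences into batches and right-pad each batch to its own longest length."""
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        longest = max(len(seq) for seq in batch)
        yield [list(seq) + [pad_value] * (longest - len(seq)) for seq in batch]


# Hypothetical tokenized sequences of unequal length.
tokens = [[1179, 6, 208], [2, 1625], [92], [9, 3870, 3, 2136]]
padded = list(pad_batches(tokens, batch_size=2))

for batch in padded:
    print(batch)
```

Each batch is rectangular on its own, but different batches may have different widths, which is why no single global length has to be chosen up front.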
