
ResourceExhaustedError: OOM When Allocating Tensor With Shape []

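def RNN(X, weights, biases):
    # Flatten the time dimension: (batch, n_steps, n_inputs) -> (batch * n_steps, n_inputs)
    X = tf.reshape(X, [-1, n_inputs])
    # Project every time step to the hidden size: (batch * n_steps, n_hidden_units)
    X_in = tf.matmul(X, weights['in']) + biases['in']
    # Restore the time dimension: (batch, n_steps, n_hidden_units)
    X_in = tf.reshape(X_in, [-1, n_steps, n_hidden_units])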

Solution 1:

The problem was caused by this line in the training loop:

while s + batch_size < ran:
    # ...
    batch_xs1 = tf.nn.embedding_lookup(embedding_matrix, batch_id)

Calling the tf.nn.embedding_lookup() function adds nodes to the TensorFlow graph, and—because these are never garbage collected—doing so in a loop causes a memory leak.

The actual cause of the memory leak is probably the embedding_matrix NumPy array passed as an argument to tf.nn.embedding_lookup(). TensorFlow tries to be helpful and converts every NumPy array passed to a function into a tf.constant() node in the TensorFlow graph. Inside a loop, this creates another separate copy of embedding_matrix in the graph on every iteration, and those copies end up in scarce GPU memory.
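To get a feel for how quickly this adds up, here is a back-of-the-envelope sketch. It assumes, as a simplification, that each loop iteration embeds one more full copy of the matrix and that nothing is ever freed. The matrix size, the 4 GiB of free GPU memory, and the helper name leak_estimate are all hypothetical.

```python
def leak_estimate(vocab_size, embed_dim, gpu_bytes, bytes_per_value=4):
    """Return (bytes per copy, number of copies that fit) for a dense matrix.

    Simplifying assumption: every loop iteration adds one more full copy of the
    matrix as a constant and nothing is freed, so the copies simply add up.
    """
    matrix_bytes = vocab_size * embed_dim * bytes_per_value  # float32 by default
    return matrix_bytes, gpu_bytes // matrix_bytes


# Hypothetical sizes: a 100,000 x 300 float32 matrix and 4 GiB of free GPU memory.
per_copy, copies = leak_estimate(100_000, 300, 4 * 1024**3)
print(f"{per_copy / 1e6:.0f} MB per copy; OOM after roughly {copies} iterations")
```

Under these assumptions each copy takes 120 MB and only about 35 iterations fit, which is why the error can appear early in training even though a single batch is small.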

The simplest solution is to move the tf.nn.embedding_lookup() call outside the training loop. For example:

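import tensorflow as tf
from scipy.sparse import csc_matrix

def while_loop(s, e, step):
  # Build the lookup once, outside the loop; only the ids change per batch.
  batch_id_placeholder = tf.placeholder(tf.int32)
  batch_xs1 = tf.nn.embedding_lookup(embedding_matrix, batch_id_placeholder)

  while s + batch_size < ran:
    # ... (compute batch_id, batch_row, batch_col for this batch as before)

    batch_label = csc_matrix((data, (batch_row, batch_col)), shape=(batch_size, n_classes))
    batch_label = batch_label.toarray()

    # Feed this batch's ids into the existing lookup instead of creating a new one.
    batch_xs = sess.run(batch_xs1, feed_dict={batch_id_placeholder: batch_id})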
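To see why moving the lookup out of the loop helps, here is a toy, pure-Python model of an append-only TF1-style graph. ToyGraph, toy_embedding_lookup, build_inside_loop and build_once are hypothetical stand-ins, not TensorFlow APIs; the only thing they model is how many nodes (and embedded copies of the matrix) each pattern leaves behind.

```python
class ToyGraph:
    """Append-only node list, mimicking a TF1 graph whose nodes are never freed."""

    def __init__(self):
        self.nodes = []

    def add(self, kind):
        self.nodes.append(kind)
        return len(self.nodes) - 1  # id of the new node


def toy_embedding_lookup(graph, ids_node=None):
    """Hypothetical stand-in for tf.nn.embedding_lookup(embedding_matrix, ids)."""
    graph.add("const:embedding_matrix")  # NumPy array converted to a constant
    if ids_node is None:
        ids_node = graph.add("const:batch_id")  # concrete ids baked in as well
    return graph.add("gather")


def build_inside_loop(num_batches):
    graph = ToyGraph()
    for _ in range(num_batches):
        toy_embedding_lookup(graph)  # adds new nodes on every iteration
    return graph


def build_once(num_batches):
    graph = ToyGraph()
    ids = graph.add("placeholder:batch_id")
    toy_embedding_lookup(graph, ids_node=ids)  # built once, before the loop
    for _ in range(num_batches):
        pass  # only sess.run(..., feed_dict=...) would happen here; no new nodes
    return graph


for n in (10, 100):
    leaky = build_inside_loop(n).nodes
    fixed = build_once(n).nodes
    print(n, len(leaky), leaky.count("const:embedding_matrix"), len(fixed))
```

The leaky pattern grows linearly with the number of batches (three nodes and one full matrix copy per batch), while the placeholder pattern stays at three nodes no matter how long training runs.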

Solution 2:

I recently had this problem with TF + Keras, and previously with Darknet and YOLOv3. My dataset contained images that were too large for the memory of my two GTX 1050s, so I had to resize them. On average, a 1024x1024 image needed about 6 GB per GPU in my setup.
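Resizing helps because the memory of each image-shaped tensor grows with height times width. The sketch below only does that arithmetic for float32 tensors; the 64-channel feature map is a hypothetical early convolution layer, and the numbers are not meant to reproduce the 6 GB figure, which also depends on the network, the batch size, and the framework.

```python
def tensor_bytes(batch, height, width, channels, bytes_per_value=4):
    """Bytes for one dense tensor of shape (batch, height, width, channels); float32 by default."""
    return batch * height * width * channels * bytes_per_value


MIB = 1024 ** 2

for side in (1024, 512, 256):
    image = tensor_bytes(1, side, side, 3)
    # Hypothetical 64-channel feature map at full input resolution (e.g. an early conv layer).
    feature_map = tensor_bytes(1, side, side, 64)
    print(f"{side}x{side}: input {image / MIB:.2f} MiB, 64-channel map {feature_map / MIB:.2f} MiB")
```

Halving each side cuts every such tensor to a quarter of its size. A real network keeps many of these maps, often for a whole batch and for the backward pass, so the per-tensor figures multiply, which is how full-resolution training can reach several GB per GPU.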
