How Is Memory Garbage Collected In App Engine (Python) When Iterating Over Db Results
Solution 1:
It looks like your batch solution is conflicting with db's batching, resulting in a lot of extra batches hanging around.
When you run query.run(batch_size=batch_size), db runs the query until it has returned the entire limit. When you reach the end of a batch, db grabs the next one. However, right after db does this, you exit the loop and start again. This means that batches 1 through n all exist in memory twice: once for the previous query's fetch and once for your next query's fetch.
If you want to loop over all your entities, just let db handle the batching:
foos = models.Foo.all().filter('status =', 6)
for foo in foos.run(batch_size=batch_size):
    results += 1
    bar = some_module.get_bar(foo)
    if bar:
        try:
            dict_of_results[bar.baz] += 1
        except KeyError:
            dict_of_results[bar.baz] = 1
Or, if you want to handle batching yourself, make sure db doesn't do any batching:
cursor = None
while True:
    foo_query = models.Foo.all().filter('status =', 6)
    if cursor:
        foo_query.with_cursor(cursor)
    foos = foo_query.fetch(limit=batch_size)
    if not foos:
        break
    # fetch() returns a plain list, so ask the query for the cursor.
    cursor = foo_query.cursor()
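The control flow of the manual-batching loop above can be tried out without a datastore. What follows is only a minimal sketch: FakeQuery and iterate_in_batches are hypothetical names invented for illustration, and the integer cursor is an assumption (real db cursors are opaque strings). The stand-in mimics only the with_cursor / fetch / cursor call pattern, not db's actual behavior.

```python
class FakeQuery(object):
    """Hypothetical in-memory stand-in for a query; NOT the App Engine API."""

    def __init__(self, items):
        self._items = items
        self._start = 0  # where the next fetch() begins
        self._end = 0    # position just past the last fetched item

    def with_cursor(self, cursor):
        # Assumption: the cursor is a plain list offset.
        self._start = cursor
        return self

    def fetch(self, limit):
        batch = self._items[self._start:self._start + limit]
        self._end = self._start + len(batch)
        return batch

    def cursor(self):
        return self._end


def iterate_in_batches(make_query, batch_size):
    """Yield one batch at a time, resuming each new query from the last cursor."""
    cursor = None
    while True:
        query = make_query()
        if cursor is not None:
            query.with_cursor(cursor)
        batch = query.fetch(limit=batch_size)
        if not batch:
            break
        yield batch
        # Only the current batch is referenced here; the previous one
        # can be reclaimed once the caller drops it.
        cursor = query.cursor()
```

Each pass builds a fresh query, fetches exactly one batch, and keeps nothing but the cursor between passes, which is the property the loop above relies on.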
Solution 2:
You might be looking in the wrong direction.
Take a look at this Q&A for approaches to check on garbage collection and for potential alternate explanations: Google App Engine DB Query Memory Usage
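One generic way to check whether objects are actually being released is Python's standard gc module. This is a rough sketch, not the approach from the linked Q&A: count_live_instances is a hypothetical helper name, and it only sees objects that the collector tracks (instances of ordinary user-defined classes are tracked in CPython).

```python
import gc


def count_live_instances(cls):
    """Count objects whose exact type is cls among those tracked by the GC."""
    gc.collect()  # clear unreachable reference cycles before counting
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)
```

Calling it with your model class before and after a batch loop shows whether entities from earlier batches are still referenced somewhere.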