
How Do I Get A List Of Just The Objectid's Using Pymongo?

I have the following code:

client = MongoClient()
data_base = client.hkpr_restore
agents_collection = data_base.agents
agent_ids = agents_collection.find({},{'_id':1})

This gives

Solution 1:

Use distinct

In [27]: agent_ids = agents_collection.find().distinct('_id')

In [28]: agent_ids
Out[28]: 
[ObjectId('553662940acf450bef638e6d'),
 ObjectId('553662940acf450bef638e6e'),
 ObjectId('553662940acf450bef638e6f')]

In [29]: agent_id2 = [str(id) for id in agents_collection.find().distinct('_id')]

In [30]: agent_id2
Out[30]: 
['553662940acf450bef638e6d',
 '553662940acf450bef638e6e',
 '553662940acf450bef638e6f']

Solution 2:

I solved the problem by following this answer: add a hint to the find call, then simply iterate through the returned cursor.

db.c.find({}, {_id: 1}).hint({_id: 1});

I am guessing that without the hint, the cursor fetches each whole document when iterated, which makes the iteration extremely slow. With the hint, the cursor returns only the ObjectId, and the iteration finishes very quickly.
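Since the question is about PyMongo rather than the mongo shell, here is a minimal sketch of how the same query might look in Python. The helper name iter_object_ids is hypothetical, and the code assumes the collection has the default _id index. In PyMongo, hint takes an index specification as a list of (key, direction) pairs, or an index name.

```python
def iter_object_ids(collection):
    """Yield the _id of every document in a PyMongo collection.

    `collection` is assumed to be a pymongo Collection; the function name
    is illustrative and not part of PyMongo.
    """
    # Project only _id and hint the _id index, mirroring the shell query above.
    cursor = collection.find({}, {"_id": 1}).hint([("_id", 1)])
    for doc in cursor:
        yield doc["_id"]


# Usage, assuming a reachable MongoDB server and the collection from the question:
#   from pymongo import MongoClient
#   agents_collection = MongoClient().hkpr_restore.agents
#   for agent_id in iter_object_ids(agents_collection):
#       ...
```

Because this is a generator, the ids are consumed as the cursor is iterated instead of being collected into one large list first.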

For background, I am working on an ETL job that syncs one Mongo collection to another while modifying the data according to some criteria. The total number of ObjectIds is around 100,000,000.

I tried using distinct but got the following error:

Error in : distinct too big, 16mb cap

I also tried aggregation with $group, as suggested in answers to other similar questions, only to hit a memory consumption error.
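For a job of this size, one option is to consume the ids from the cursor in fixed-size batches rather than materialising them all at once. The sketch below is not the code from the ETL job described here; the helper name chunked and the batch size are assumptions, and it works on any iterable, including a PyMongo cursor.

```python
from itertools import islice


def chunked(ids, size):
    """Yield lists of at most `size` items from any iterable of ids."""
    iterator = iter(ids)
    while True:
        # Pull the next `size` items; an empty list means the input is exhausted.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Plain integers stand in for ObjectIds here.
for batch in chunked(range(7), 3):
    print(batch)
```

Each batch can then be handed to the sync step on its own, so memory use is bounded by the batch size rather than by the total number of ids.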

Solution 3:

Try creating a list comprehension with just the _ids as follows:

>>> from pymongo import MongoClient
>>> client = MongoClient()
>>> data_base = client.hkpr_restore
>>> agents_collection = data_base.agents
>>> result = agents_collection.find({}, {"_id": 1})
>>> agent_ids = [x["_id"] for x in result]
>>>
>>> print(agent_ids)
[ObjectId('553020a8bf2e4e7a438b46d9'), ObjectId('553020a8bf2e4e7a438b46da'), ObjectId('553020a8bf2e4e7a438b46db')]
>>>

Solution 4:

I would like to add something which is more general than querying for all _id.

import bson
[...]
results = agents_collection.find({})
# Collect every field value, in any document, that is an ObjectId
objects = [v for result in results for k, v in result.items()
           if isinstance(v, bson.objectid.ObjectId)]

Context: saving objects in GridFS creates ObjectIds. This snippet helped me retrieve all of them for further querying.
