The MongoDB Aggregation Cheatsheet
Using lookup, group, project and unwind to build powerful queries
What is an aggregation pipeline?
An aggregation pipeline is a series of blocks of computation that you apply one by one to a set of documents.
Each pipeline stage performs some new computation or manipulation on the documents to which it is passed, and then passes them on to the next stage.
The stages can find, filter, join or manipulate the documents and there is a pipeline operator for just about everything that you want to do.
That being said, I would estimate that 90% of the pipeline-ing that I do consists of $match, $project, $lookup, $unwind and $group.
So, let’s get into it.
Side-note for those who are writing lots of queries and are sick of typing the same things over and over: here is a quick-copy board with some of my most used aggregation operators!
$match: finding and filtering documents
Match does what it says. It passes only the documents that match the given query on to the next stage in the pipeline. The query uses the same syntax as the filter that you pass to MongoDB’s find() method.
There is a whole range of query operators that can be used to make your $match more flexible, and aggregation expressions are available inside a $match through $expr.
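As a minimal sketch, assuming a hypothetical orders collection whose documents have status, total and discount fields, a $match can look like this. The first stage uses ordinary find()-style query operators, the second uses $expr to compare two fields of the same document.

```javascript
// Field names (status, total, discount) are hypothetical, for illustration only.
const matchPipeline = [
  // Same filter syntax as db.orders.find({ status: 'shipped', total: { $gte: 100 } })
  { $match: { status: 'shipped', total: { $gte: 100 } } },
  // $expr allows aggregation expressions, here comparing two fields of one document
  { $match: { $expr: { $gt: ['$total', '$discount'] } } },
];

// In mongosh this would be run as: db.orders.aggregate(matchPipeline)
```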
$project: re-shaping documents
Project is a way of re-shaping the documents that you have at a particular stage in the pipeline. Projection doesn’t filter or find any documents, so there will be the same number of documents in the pipeline before and after this stage, but the documents will look different.
You can rename/remove fields, or create new calculated fields. This is great for simplifying the documents and making sure that you have only the data that you need.
You are also able to select the fields that you do want, or the fields that you don’t want, by using 1 or 0 respectively as the projection value. Note, when using 1 to only keep fields, _id will always be kept unless you explicitly exclude it with _id: 0!
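A small sketch of both styles, assuming hypothetical user documents with firstName, lastName, password and age fields:

```javascript
// Assumed document shape: { _id, firstName, lastName, password, age }
const projectPipeline = [
  {
    $project: {
      _id: 0, // _id is kept by default, so exclude it explicitly
      age: 1, // keep an existing field as-is
      name: '$firstName', // rename: expose firstName under a new key
      fullName: { $concat: ['$firstName', ' ', '$lastName'] }, // calculated field
    },
  },
];

// Exclusion style: every field is kept except the ones set to 0.
const hidePasswordPipeline = [{ $project: { password: 0 } }];

// In mongosh this would be run as: db.users.aggregate(projectPipeline)
```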
$lookup: fetching from different collections
The ability to look up documents in other collections is one of the most powerful aspects of aggregation.
Let’s say that you have a collection of orders, and you want to see the information about the products relating to each order. Using regular query functions for this is extremely inefficient, because you would need to run a query for every order so as to get its product information. It is far more efficient to get all the information using one aggregation query:
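A minimal sketch of such a query, assuming each order keeps its product references in a hypothetical productIds array and that the products live in a collection called products (both names are illustrative, not taken from a real schema):

```javascript
// Assumed shapes: orders have a productIds array, products have a unique _id.
const ordersWithProducts = [
  {
    $lookup: {
      from: 'products', // the collection to join with
      localField: 'productIds', // field on the order documents (array elements are matched individually)
      foreignField: '_id', // field on the product documents
      as: 'products', // name of the new array field on each order
    },
  },
];

// In mongosh this would be run as: db.orders.aggregate(ordersWithProducts)
```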
The $lookup adds a new field (products) containing an array of documents where the specified localField and foreignField are matching.
$unwind: breaking out of arrays
The $unwind operator takes an array field and outputs one copy of the document for every element in the array. The copies are identical except that the array field is replaced by a single element.
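For illustration, assuming a hypothetical document with a sizes array, the stage and its effect look roughly like this:

```javascript
// Hypothetical input document:
// { _id: 1, item: 'shirt', sizes: ['S', 'M', 'L'] }
const unwindPipeline = [{ $unwind: '$sizes' }];

// Documents leaving the stage, one per array element:
// { _id: 1, item: 'shirt', sizes: 'S' }
// { _id: 1, item: 'shirt', sizes: 'M' }
// { _id: 1, item: 'shirt', sizes: 'L' }
```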
Using $unwind with $lookup
It is common to see $lookup used in conjunction with the $unwind pipeline operator, which turns a document with an array property into a new document for every element in the array.
Let’s say that you have a collection of products, and you want to find the supplier of each product in the Supplier collection:
// Product documents:
{ _id: 1, supplierId: 1 },
{ _id: 2, supplierId: 4 },
{ _id: 3, supplierId: 2 },
You can use $lookup to find the supplier like this:
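A sketch of that lookup, using the supplierId field from the product documents above. The collection name suppliers is an assumption:

```javascript
const productsWithSupplier = [
  {
    $lookup: {
      from: 'suppliers', // assumed name of the supplier collection
      localField: 'supplierId', // field on the product documents
      foreignField: '_id', // field on the supplier documents
      as: 'supplier', // new array field added to each product
    },
  },
];

// In mongosh this would be run as: db.products.aggregate(productsWithSupplier)
```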
The problem here is that a lookup returns an array of documents:
// Product documents with $lookup'd suppliers:
{
  _id: 1,
  supplierId: 1,
  supplier: [{
    _id: 1, name: 'Alice'
  }]
},
{
  _id: 2,
  supplierId: 4,
  supplier: [{
    _id: 4, name: 'David'
  }]
},
{
  _id: 3,
  supplierId: 2,
  supplier: [{
    _id: 2, name: 'Bob'
  }]
},
Because we know that _id is a unique field, we also know that the array created by the lookup will never have more than one element, so we can unwind the documents.
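Continuing the sketch, with the same assumed suppliers collection name, the $unwind stage goes directly after the $lookup:

```javascript
const productsWithSingleSupplier = [
  {
    $lookup: {
      from: 'suppliers', // assumed collection name
      localField: 'supplierId',
      foreignField: '_id',
      as: 'supplier',
    },
  },
  // Replace the one-element supplier array with the supplier document itself
  { $unwind: '$supplier' },
];

// Variant: by default $unwind drops products whose supplier array is empty.
// This form keeps them, leaving the supplier field off those documents.
const keepUnmatchedProducts = {
  $unwind: { path: '$supplier', preserveNullAndEmptyArrays: true },
};
```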
This rolls out the arrays and leaves us with what we wanted 👍:
// Aggregated product documents with suppliers:
{
  _id: 1,
  supplierId: 1,
  supplier: {
    _id: 1, name: 'Alice'
  }
},
{
  _id: 2,
  supplierId: 4,
  supplier: {
    _id: 4, name: 'David'
  }
},
{
  _id: 3,
  supplierId: 2,
  supplier: {
    _id: 2, name: 'Bob'
  }
},
$group: collecting documents into groups
You can use $group to bunch documents together based on a field value that they share. It can also be used for useful things like counting the documents in each group or summing all of the values of a specific field.
// NOTE: If you want to group all of the documents that are in the pipeline at the point the group is executed into one document, then you can set the group _id to null
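As an illustration, here are two small group stages, assuming hypothetical order documents with customerId and reference fields. The first counts the documents per customer, the second uses _id: null to collapse everything into one document and keep the highest reference.

```javascript
// Assumed document shape: { _id, customerId, reference }

// First: one output document per distinct customerId, with a count of its orders
const countPerCustomer = [
  { $group: { _id: '$customerId', count: { $sum: 1 } } },
];

// Second: _id: null puts every incoming document into a single group
const highestReference = [
  { $group: { _id: null, maxReference: { $max: '$reference' } } },
];

// In mongosh these would be run as: db.orders.aggregate(countPerCustomer)
```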
Group only passes along the values that you specify within the stage. In the previous examples, the only fields on the documents after the group stage would be _id and count in the first, or _id and maxReference in the second.
If you want to keep the other fields on the object, you need to decide how the group should deal with them. There are a number of ways to accumulate the fields together:
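A sketch of some commonly used accumulators, again assuming hypothetical order documents with customerId, status, reference and total fields:

```javascript
// Assumed document shape: { _id, customerId, status, reference, total }
const groupWithAccumulators = [
  {
    $group: {
      _id: '$customerId',
      // $first / $last take the value from the first / last document in the group;
      // they are only meaningful when the documents arrive in a defined order
      firstStatus: { $first: '$status' },
      lastStatus: { $last: '$status' },
      references: { $push: '$reference' }, // array of every value, duplicates included
      statuses: { $addToSet: '$status' }, // array of distinct values
      totalSpent: { $sum: '$total' }, // sum of a numeric field
      averageOrder: { $avg: '$total' }, // average of a numeric field
    },
  },
];

// In mongosh this would be run as: db.orders.aggregate(groupWithAccumulators)
```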
Conclusion
This is just the start. With the various operators, expressions, accumulators and all the other tools that MongoDB provides, you can retrieve your data on your terms.
Happy aggregating!