Aggregating
Last updated
Last updated
Pipeline aggregation allows for querying, aggregating, and transforming data in multiple stages in real-time through a single API request. While query arguments allow for making simple, single-stage queries through the API, pipelines allow for more complex searches, aggregations and transformations.
To leverage pipeline aggregation via the API, simply pass a pipeline
query argument with the request. The pipeline query argument is an array of pipeline stages that execute in order. Each stage transforms the result set and passes it on to the next stage in the pipeline.
In the below example, the pipeline contains two stages. The first is a $match
stage. This filters the result set to all accounts create on or after 2015-01-01T00:00:00.000Z
. Then that set of results is passed onto the $group
stage which counts all of the accounts in the result set and finally returns a count
result. For all available stages, see Pipeline Stages
Result sets can be grouped in order to apply accumulators and aggregate data from accessible documents.
The $group stage takes the following form:
For available accumulators, see Accumulator Operators.
The _id
property is your grouping expression and is required. The _id
can be null, it can be a single property or a combination of properties and group expressions. When you specify an _id
of null
, you will group together all of the documents in the result set. This is helpful for calculating accumulated values for all of the documents, for example when calculating a cont of the total number of documents in your result. When the _id
is null, it is not included in the result set.
In the following example, we aggregate a custom c_patient
object with a two-stage pipeline. The first stage is a $match
that filters the documents down to patients with a gender: 'f'
. Those documents are then passed into the $group
stage where we count all of the documents in the result set and return a count
property. Because we pass null
for the grouping _id
, we can calculate an accumulation for the entire result set.
The following example is a single-stage $group
on the connections
object. The grouping is by the connections' context._id
. Then, a count is calculated for each context grouping and returned in total
properties.
The limit stage limits the number of documents passed onto the next stage in the pipeline.
The $match
stage accepts a JSON document that specifies query conditions on indexed properties to filter the resulting document list. This document list is then passed onto the next stage of the pipeline or returned as the response if it is the last stage of the pipeline.
See Query Operators for more details and examples of available conditions.
In this example, we have just a one stage pipeline where we match accounts by the first name in a custom script. This returns a result set of accounts filtered to only include accounts where the last name starts with 's'.
The projection stage takes a JSON document that can specify the inclusion of properties and/or the addition of new properties.
Except for _id
, only properties that you specify for inclusion or add to the projection as new properties will be projected on to the next stage.
The _id
property is always included in the projection, even if you don't specify inclusion.
Existing properties cannot be overridden in a projection. If you wish to transform an existing property in a projection, it is recommended that you instead add a new property with a different name.
In the below example, a custom c_patient
object is aggregated with a two stage pipeline in a script. The first stage is a $match
that finds all patient records with c_gender: 'm'
. That result set is then passed into the $project
stage. The projection then specifies that c_name
, and c_gender
properties are included and that fullName
and age
are added as new properties.
The new fullName
property is a concatenation of the c_first
and c_last
sub-properties if the c_name
document property using the $concat
string operator and $string
literal operator.
The new age
property is calculated from an existing c_birthdate
property, using $subtract
, $divide
, and $floor
arithmetic operators to calculate the age from the difference between c_birthdate
and today's date.
Since the $project
stage is the last stage in the pipeline, the result of the projection is returned as the response.
The skip stage skips over the specified number of documents and passes the remaining documents into the next stage of the pipeline.
The sort stage take all documents it receives and passes them into the next stage in the specified sort order.
Where sort order can be 1
for ascending or -1
for descending.
To optimize performance, it is best to use $sort
at the beginning of the pipeline or near the beginning, after a $match
and $limit
. And $sort
should always proceed $project
, $unwind
, or $group
if it is required -- otherwise your query may be slow or time out.
Unwinding arrays and document arrays expands them into individual documents so that aggregation can take place of the resulting output. For example, mapping the roles array of an account with 3 roles results in 3 output documents, each containing a roles ObjectId property.
Name
Description
Example
$group
A JSON object containing an _id property (required) and 0 or more property groupings.
{"$group": {"_id": null, "count": {"$count": "_id"}}}
$limit
An integer representing the number of results to return.
{"$limit": 100}
$match
A JSON object containing operators and expressions used to limit the search results.
{"$match": {“$and”: [{“created”: {“$gt”: “2015-01-01T00:00:00.000Z”}}, {“created”: {“$lt”: “2015-06-01T00:00:00.000Z”}}]}}
$project
A JSON object that specifies existing properties or new properties to pass along to the next stage in the pipeline.
{"$project",{"_id": 1, "name": 1}}
$skip
An integer representing the number of results to skip. Useful for paging where mapping or groupings have been applied, or where search the search criteria cannot be paged using unique fields.
{"$skip": 10}
$sort
A JSON object containing one or more properties, the values of which must be 1 for ascending and -1 for descending.
{"$sort": {"_id": -1}}
$unwind
Accepts a property name and deconstructs an array into a single property in multiple documents.
{"$unwind": roles}
Specification
Description
property
: 1
or true
Include this property in the projection
property
: expression
Add a new property to the projection