Aggregating
Pipeline aggregation allows for querying, aggregating, and transforming data in multiple stages in real-time through a single API request. While query arguments allow for making simple, single-stage queries through the API, pipelines allow for more complex searches, aggregations and transformations.
To leverage pipeline aggregation via the API, simply pass a pipeline
query argument with the request. The pipeline query argument is an array of pipeline stages that execute in order. Each stage transforms the result set and passes it on to the next stage in the pipeline.
In the below example, the pipeline contains two stages. The first is a $match
stage. This filters the result set to all accounts create on or after 2015-01-01T00:00:00.000Z
. Then that set of results is passed onto the $group
stage which counts all of the accounts in the result set and finally returns a count
result. For all available stages, see Pipeline Stages
Pipeline Stages
$group
Result sets can be grouped in order to apply accumulators and aggregate data from accessible documents.
The $group stage takes the following form:
For available accumulators, see Accumulator Operators.
The _id
property is your grouping expression and is required. The _id
can be null, it can be a single property or a combination of properties and group expressions. When you specify an _id
of null
, you will group together all of the documents in the result set. This is helpful for calculating accumulated values for all of the documents, for example when calculating a cont of the total number of documents in your result. When the _id
is null, it is not included in the result set.
Examples
In the following example, we aggregate a custom c_patient
object with a two-stage pipeline. The first stage is a $match
that filters the documents down to patients with a gender: 'f'
. Those documents are then passed into the $group
stage where we count all of the documents in the result set and return a count
property. Because we pass null
for the grouping _id
, we can calculate an accumulation for the entire result set.
The following example is a single-stage $group
on the connections
object. The grouping is by the connections' context._id
. Then, a count is calculated for each context grouping and returned in total
properties.
$limit
The limit stage limits the number of documents passed onto the next stage in the pipeline.
Example
$match
The $match
stage accepts a JSON document that specifies query conditions on indexed properties to filter the resulting document list. This document list is then passed onto the next stage of the pipeline or returned as the response if it is the last stage of the pipeline.
See Query Operators for more details and examples of available conditions.
Examples
In this example, we have just a one stage pipeline where we match accounts by the first name in a custom script. This returns a result set of accounts filtered to only include accounts where the last name starts with 's'.
$project
The projection stage takes a JSON document that can specify the inclusion of properties and/or the addition of new properties.
Specifications
Considerations
Except for
_id
, only properties that you specify for inclusion or add to the projection as new properties will be projected on to the next stage.The
_id
property is always included in the projection, even if you don't specify inclusion.Existing properties cannot be overridden in a projection. If you wish to transform an existing property in a projection, it is recommended that you instead add a new property with a different name.
Examples
In the below example, a custom c_patient
object is aggregated with a two stage pipeline in a script. The first stage is a $match
that finds all patient records with c_gender: 'm'
. That result set is then passed into the $project
stage. The projection then specifies that c_name
, and c_gender
properties are included and that fullName
and age
are added as new properties.
The new
fullName
property is a concatenation of thec_first
andc_last
sub-properties if thec_name
document property using the$concat
string operator and$string
literal operator.The new
age
property is calculated from an existingc_birthdate
property, using$subtract
,$divide
, and$floor
arithmetic operators to calculate the age from the difference betweenc_birthdate
and today's date.Since the
$project
stage is the last stage in the pipeline, the result of the projection is returned as the response.
$skip
The skip stage skips over the specified number of documents and passes the remaining documents into the next stage of the pipeline.
Example
$sort
The sort stage take all documents it receives and passes them into the next stage in the specified sort order.
Where sort order can be 1
for ascending or -1
for descending.
Performance Considerations
To optimize performance, it is best to use $sort
at the beginning of the pipeline or near the beginning, after a $match
and $limit
. And $sort
should always proceed $project
, $unwind
, or $group
if it is required -- otherwise your query may be slow or time out.
Example
$unwind
Unwinding arrays and document arrays expands them into individual documents so that aggregation can take place of the resulting output. For example, mapping the roles array of an account with 3 roles results in 3 output documents, each containing a roles ObjectId property.
Example
Last updated