Aggregating
Pipeline aggregation allows for querying, aggregating, and transforming data in multiple stages in real-time through a single API request. While query arguments allow for making simple, single-stage queries through the API, pipelines allow for more complex searches, aggregations and transformations.
To leverage pipeline aggregation via the API, simply pass a pipeline query argument with the request. The pipeline query argument is an array of pipeline stages that execute in order. Each stage transforms the result set and passes it on to the next stage in the pipeline.
?pipeline=[{ stage1 },{ stage2 }, { stage3 }, ...]In the below example, the pipeline contains two stages. The first is a $match stage. This filters the result set to all accounts create on or after 2015-01-01T00:00:00.000Z. Then that set of results is passed onto the $group stage which counts all of the accounts in the result set and finally returns a count result. For all available stages, see Pipeline Stages
GET /accounts?pipeline=[{"$match": {"created": {"$gte": "2015-01-01T00:00:00.000Z"}}}, {"$group": {"_id": null, "count": {"$count": "_id"}}}]{
"data": [
{
"count": 24
}
],
"hasMore": false,
"object": "list"
}Pipeline Stages
Name
Description
Example
$group
A JSON object containing an _id property (required) and 0 or more property groupings.
{"$group": {"_id": null, "count": {"$count": "_id"}}}
$limit
An integer representing the number of results to return.
{"$limit": 100}
$match
A JSON object containing operators and expressions used to limit the search results.
{"$match": {“$and”: [{“created”: {“$gt”: “2015-01-01T00:00:00.000Z”}}, {“created”: {“$lt”: “2015-06-01T00:00:00.000Z”}}]}}
$project
A JSON object that specifies existing properties or new properties to pass along to the next stage in the pipeline.
{"$project",{"_id": 1, "name": 1}}
$skip
An integer representing the number of results to skip. Useful for paging where mapping or groupings have been applied, or where search the search criteria cannot be paged using unique fields.
{"$skip": 10}
$sort
A JSON object containing one or more properties, the values of which must be 1 for ascending and -1 for descending.
{"$sort": {"_id": -1}}
$unwind
Accepts a property name and deconstructs an array into a single property in multiple documents.
{"$unwind": roles}
$group
Result sets can be grouped in order to apply accumulators and aggregate data from accessible documents.
The $group stage takes the following form:
{"$group": {"_id": <expression>, <property>: { <accumulator1> : <expression> }, <property>: { <accumulator2> : <expression> },... }}For available accumulators, see Accumulator Operators.
The _id property is your grouping expression and is required. The _id can be null, it can be a single property or a combination of properties and group expressions. When you specify an _id of null, you will group together all of the documents in the result set. This is helpful for calculating accumulated values for all of the documents, for example when calculating a cont of the total number of documents in your result. When the _id is null, it is not included in the result set.
Examples
In the following example, we aggregate a custom c_patient object with a two-stage pipeline. The first stage is a $match that filters the documents down to patients with a gender: 'f'. Those documents are then passed into the $group stage where we count all of the documents in the result set and return a count property. Because we pass null for the grouping _id, we can calculate an accumulation for the entire result set.
GET /c_patients?pipeline=[{"$match": {"gender": "f"}},{"$group": {"_id": null, "count": {"$count": "_id"}}}]{
"data": [
{
"count": 359
}
],
"hasMore": false,
"object": "list"
}The following example is a single-stage $group on the connections object. The grouping is by the connections' context._id. Then, a count is calculated for each context grouping and returned in total properties.
GET /connections?pipeline=[{"$group": {"_id": "context._id", "total": {"$count": "_id"}}}, {"$sort": {"total": 1}}]{
"data": [
{
"_id": "507f1f77bcf86cd799439011",
"total": 2
},
{
"_id": "55ff9db67a00a89c14aa31a6",
"total": 4
},
{
"_id": "4d656461626c6552756c656b",
"total": 12
},
{
"_id": "507f191e810c19729de860ea",
"total": 52
}
],
"hasMore": false,
"object": "list"
}$limit
The limit stage limits the number of documents passed onto the next stage in the pipeline.
{"$limit": <integer>}Example
GET /c_patients?pipeline=[{"$match": {"c_gender": "f"}},{"$project": {"c_name": 1}},{"$limit":2}]{
"data": [
{
"_id": "59495fc1514ef1010000598d",
"c_name": {
"c_first": "Jane",
"c_last": "Smith"
}
},
{
"_id": "59496035bfd25d0100661faf",
"c_name": {
"c_first": "Holly",
"c_last": "Forrester"
}
}
],
"hasMore": false,
"object": "list"
}$match
The $match stage accepts a JSON document that specifies query conditions on indexed properties to filter the resulting document list. This document list is then passed onto the next stage of the pipeline or returned as the response if it is the last stage of the pipeline.
{"$match": {<query conditions>}}See Query Operators for more details and examples of available conditions.
Examples
In this example, we have just a one stage pipeline where we match accounts by the first name in a custom script. This returns a result set of accounts filtered to only include accounts where the last name starts with 's'.
GET /accounts?pipeline=[{"$match": {"name.last": {"$regex": "/^s/i"}}}]{
"data": [
{
"_id": "5888b91d35f8c91536c6ddea",
"access": 6,
"c_enrollments": [],
"c_study_groups": [],
"created": "2017-01-25T14:41:33.297Z",
"email": "[email protected]",
"favorite": false,
"gender": "m",
"inherited_roles": [
"000000000000000000000007",
"000000000000000000000006"
],
"key": {
"fingerprint": "8e53d430-e30c-11e6-aaeb-99172490c13b",
"secret": "heBcf2Wgh6hjezADTAc88wHJNGcJgwUd"
},
"locale": "en_US",
"locked": false,
"mobile": "+12223334449",
"name": {
"first": "John",
"last": "Silver"
},
"object": "account",
"roles": [
"000000000000000000000004"
],
"shared": false,
"state": "verified",
"updated": "2017-01-25T14:42:54.970Z",
"updater": {
"_id": "5888b91d35f8c91536c6ddea",
"object": "account",
"path": "/accounts/5888b91d35f8c91536c6ddea"
}
}
],
"hasMore": false,
"object": "list"
}$project
The projection stage takes a JSON document that can specify the inclusion of properties and/or the addition of new properties.
{"$project": {<projection spec>}}Specifications
Specification
Description
property: 1 or true
Include this property in the projection
property: expression
Add a new property to the projection
Considerations
Except for
_id, only properties that you specify for inclusion or add to the projection as new properties will be projected on to the next stage.The
_idproperty is always included in the projection, even if you don't specify inclusion.Existing properties cannot be overridden in a projection. If you wish to transform an existing property in a projection, it is recommended that you instead add a new property with a different name.
Examples
In the below example, a custom c_patient object is aggregated with a two stage pipeline in a script. The first stage is a $match that finds all patient records with c_gender: 'm'. That result set is then passed into the $project stage. The projection then specifies that c_name, and c_gender properties are included and that fullName and age are added as new properties.
The new
fullNameproperty is a concatenation of thec_firstandc_lastsub-properties if thec_namedocument property using the$concatstring operator and$stringliteral operator.The new
ageproperty is calculated from an existingc_birthdateproperty, using$subtract,$divide, and$floorarithmetic operators to calculate the age from the difference betweenc_birthdateand today's date.Since the
$projectstage is the last stage in the pipeline, the result of the projection is returned as the response.
return org.objects.c_patients.aggregate()
.match({
c_gender: 'm'
})
.project({
c_name: 1,
fullName: {$concat: ['c_name.c_first', {$string: ' '}, 'c_name.c_last']},
c_gender: 1,
age: { $floor: { $divide: [{ $subtract:[new Date(), 'c_birthdate']}, (365 * 24*60*60*1000)]}}
})
.toList();{
"data": [
{
"_id": "59495fb2514ef1010000598b",
"age": 38,
"c_gender": "m",
"c_name": {
"c_first": "John",
"c_last": "Smith"
},
"fullName": "John Smith"
},
{
"_id": "59495fd7514ef1010000598f",
"age": 48,
"c_gender": "m",
"c_name": {
"c_first": "Charles",
"c_last": "Jones"
},
"fullName": "Charles Jones"
},
{
"_id": "59496012cfc40501006c10d2",
"age": 43,
"c_gender": "m",
"c_name": {
"c_first": "Richard",
"c_last": "Roberts"
},
"fullName": "Richard Roberts"
}
],
"hasMore": false,
"object": "list"
}$skip
The skip stage skips over the specified number of documents and passes the remaining documents into the next stage of the pipeline.
{"$skip": <positive integer>}Example
GET /c_patients?pipeline=[{"$match": {"c_gender": "f"}},{"$project": {"c_name": 1}},{"$skip": 1},{"$limit":1}]{
"data": [
{
"_id": "59496035bfd25d0100661faf",
"c_name": {
"c_first": "Holly",
"c_last": "Forrester"
}
}
],
"hasMore": false,
"object": "list"
}$sort
The sort stage take all documents it receives and passes them into the next stage in the specified sort order.
{"$sort": {<property1>: <sort order>, <property2>: <sort order>, ...}}Where sort order can be 1 for ascending or -1 for descending.
Performance Considerations
To optimize performance, it is best to use $sort at the beginning of the pipeline or near the beginning, after a $match and $limit. And $sort should always proceed $project, $unwind, or $group if it is required -- otherwise your query may be slow or time out.
Example
GET /c_patients?pipeline=[{"$match": {"c_gender": "f"}},{"$limit": 2},{"$sort": {"c_name.c_last": 1},{"$project": {"c_name": 1}}}]{
"data": [
{
"_id": "59496035bfd25d0100661faf",
"c_name": {
"c_first": "Holly",
"c_last": "Forrester"
}
},
{
"_id": "59495fc1514ef1010000598d",
"c_name": {
"c_first": "Jane",
"c_last": "Smith"
}
}
],
"hasMore": false,
"object": "list"
}$unwind
Unwinding arrays and document arrays expands them into individual documents so that aggregation can take place of the resulting output. For example, mapping the roles array of an account with 3 roles results in 3 output documents, each containing a roles ObjectId property.
{"$unwind": <property>}Example
GET /accounts/?pipeline=[{"$project": {"roles": 1}}, {"$unwind": "roles"}]
{
"data": [
{
"_id": "55ff9db67a00a89c14aa31a6",
"object": "account",
"roles": "000000000000000000000004"
},
{
"_id": "55ff9db67a00a89c14aa31a6",
"object": "account",
"roles": "000000000000000000000005"
},
{
"_id": "55ff9db67a00a89c14aa31a6",
"object": "account",
"roles": "000000000000000000000007"
}
],
"hasMore": false,
"object": "list"
}Last updated
Was this helpful?