Aggregating

Pipeline aggregation allows for querying, aggregating, and transforming data in multiple stages in real-time through a single API request. While query arguments allow for making simple, single-stage queries through the API, pipelines allow for more complex searches, aggregations and transformations.

To leverage pipeline aggregation via the API, simply pass a pipeline query argument with the request. The pipeline query argument is an array of pipeline stages that execute in order. Each stage transforms the result set and passes it on to the next stage in the pipeline.

?pipeline=[{ stage1 },{ stage2 }, { stage3 }, ...]

In the below example, the pipeline contains two stages. The first is a $match stage. This filters the result set to all accounts create on or after 2015-01-01T00:00:00.000Z. Then that set of results is passed onto the $group stage which counts all of the accounts in the result set and finally returns a count result. For all available stages, see Pipeline Stages

GET /accounts?pipeline=[{"$match": {"created": {"$gte": "2015-01-01T00:00:00.000Z"}}}, {"$group": {"_id": null, "count": {"$count": "_id"}}}]

Pipeline Stages

$group

Result sets can be grouped in order to apply accumulators and aggregate data from accessible documents.

The $group stage takes the following form:

{"$group": {"_id": <expression>, <property>: { <accumulator1> : <expression> }, <property>: { <accumulator2> : <expression> },... }}

For available accumulators, see Accumulator Operators.

The _id property is your grouping expression and is required. The _id can be null, it can be a single property or a combination of properties and group expressions. When you specify an _id of null, you will group together all of the documents in the result set. This is helpful for calculating accumulated values for all of the documents, for example when calculating a cont of the total number of documents in your result. When the _id is null, it is not included in the result set.

Examples

In the following example, we aggregate a custom c_patient object with a two-stage pipeline. The first stage is a $match that filters the documents down to patients with a gender: 'f'. Those documents are then passed into the $group stage where we count all of the documents in the result set and return a count property. Because we pass null for the grouping _id, we can calculate an accumulation for the entire result set.

GET /c_patients?pipeline=[{"$match": {"gender": "f"}},{"$group": {"_id": null, "count": {"$count": "_id"}}}]

The following example is a single-stage $group on the connections object. The grouping is by the connections' context._id. Then, a count is calculated for each context grouping and returned in total properties.

GET /connections?pipeline=[{"$group": {"_id": "context._id", "total": {"$count": "_id"}}}, {"$sort": {"total": 1}}]

$limit

The limit stage limits the number of documents passed onto the next stage in the pipeline.

{"$limit": <integer>}

Example

GET /c_patients?pipeline=[{"$match": {"c_gender": "f"}},{"$project": {"c_name": 1}},{"$limit":2}]

$match

The $match stage accepts a JSON document that specifies query conditions on indexed properties to filter the resulting document list. This document list is then passed onto the next stage of the pipeline or returned as the response if it is the last stage of the pipeline.

{"$match": {<query conditions>}}

See Query Operators for more details and examples of available conditions.

Examples

In this example, we have just a one stage pipeline where we match accounts by the first name in a custom script. This returns a result set of accounts filtered to only include accounts where the last name starts with 's'.

GET /accounts?pipeline=[{"$match": {"name.last": {"$regex": "/^s/i"}}}]

$project

The projection stage takes a JSON document that can specify the inclusion of properties and/or the addition of new properties.

{"$project": {<projection spec>}}

Specifications

Considerations

  • Except for _id, only properties that you specify for inclusion or add to the projection as new properties will be projected on to the next stage.

  • The _id property is always included in the projection, even if you don't specify inclusion.

  • Existing properties cannot be overridden in a projection. If you wish to transform an existing property in a projection, it is recommended that you instead add a new property with a different name.

Examples

In the below example, a custom c_patient object is aggregated with a two stage pipeline in a script. The first stage is a $match that finds all patient records with c_gender: 'm'. That result set is then passed into the $project stage. The projection then specifies that c_name, and c_gender properties are included and that fullName and age are added as new properties.

  • The new fullName property is a concatenation of the c_first and c_last sub-properties if the c_name document property using the $concat string operator and $string literal operator.

  • The new age property is calculated from an existing c_birthdate property, using $subtract, $divide, and $floor arithmetic operators to calculate the age from the difference between c_birthdate and today's date.

  • Since the $project stage is the last stage in the pipeline, the result of the projection is returned as the response.

return org.objects.c_patients.aggregate()
  .match({
    c_gender: 'm'
    })
    .project({
    c_name: 1,
    fullName: {$concat: ['c_name.c_first', {$string: ' '}, 'c_name.c_last']},
    c_gender: 1,
    age: { $floor: { $divide: [{ $subtract:[new Date(), 'c_birthdate']}, (365 * 24*60*60*1000)]}}
    })
    .toList();

$skip

The skip stage skips over the specified number of documents and passes the remaining documents into the next stage of the pipeline.

{"$skip": <positive integer>}

Example

GET /c_patients?pipeline=[{"$match": {"c_gender": "f"}},{"$project": {"c_name": 1}},{"$skip": 1},{"$limit":1}]

$sort

The sort stage take all documents it receives and passes them into the next stage in the specified sort order.

{"$sort": {<property1>: <sort order>, <property2>: <sort order>, ...}}

Where sort order can be 1 for ascending or -1 for descending.

Performance Considerations

To optimize performance, it is best to use $sort at the beginning of the pipeline or near the beginning, after a $match and $limit. And $sort should always proceed $project, $unwind, or $group if it is required -- otherwise your query may be slow or time out.

Example

GET /c_patients?pipeline=[{"$match": {"c_gender": "f"}},{"$limit": 2},{"$sort": {"c_name.c_last": 1},{"$project": {"c_name": 1}}}]

$unwind

Unwinding arrays and document arrays expands them into individual documents so that aggregation can take place of the resulting output. For example, mapping the roles array of an account with 3 roles results in 3 output documents, each containing a roles ObjectId property.

{"$unwind": <property>}

Example

GET /accounts/?pipeline=[{"$project": {"roles": 1}}, {"$unwind": "roles"}]

{
    "data": [
        {
            "_id": "55ff9db67a00a89c14aa31a6",
            "object": "account",
            "roles": "000000000000000000000004"
        },
        {
            "_id": "55ff9db67a00a89c14aa31a6",
            "object": "account",
            "roles": "000000000000000000000005"
        },
        {
            "_id": "55ff9db67a00a89c14aa31a6",
            "object": "account",
            "roles": "000000000000000000000007"
        }
    ],
    "hasMore": false,
    "object": "list"
}

Last updated