# Aggregating

Pipeline aggregation allows for querying, aggregating, and transforming data in multiple stages in real-time through a single API request. While query arguments allow for making simple, single-stage queries through the API, pipelines allow for more complex searches, aggregations and transformations.

To leverage pipeline aggregation via the API, simply pass a `pipeline` query argument with the request. The pipeline query argument is an array of pipeline stages that execute in order. Each stage transforms the result set and passes it on to the next stage in the pipeline.

```
?pipeline=[{ stage1 },{ stage2 }, { stage3 }, ...]
```

In the below example, the pipeline contains two stages. The first is a `$match` stage. This filters the result set to all accounts create on or after `2015-01-01T00:00:00.000Z`. Then that set of results is passed onto the `$group` stage which counts all of the accounts in the result set and finally returns a `count` result. For all available stages, see Pipeline Stages

{% tabs %}
{% tab title="Pipeline Request" %}

```javascript
GET /accounts?pipeline=[{"$match": {"created": {"$gte": "2015-01-01T00:00:00.000Z"}}}, {"$group": {"_id": null, "count": {"$count": "_id"}}}]
```

{% endtab %}

{% tab title="Pipeline Result" %}

```javascript
{
    "data": [
        {
            "count": 24
        }
    ],
    "hasMore": false,
    "object": "list"
}
```

{% endtab %}
{% endtabs %}

## Pipeline Stages

| Name     | Description                                                                                                                                                                                     | Example                                                                                                                     |
| -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| $group   | A JSON object containing an \_id property (required) and 0 or more property groupings.                                                                                                          | {"$group": {"\_id": null, "count": {"$count": "\_id"}}}                                                                     |
| $limit   | An integer representing the number of results to return.                                                                                                                                        | {"$limit": 100}                                                                                                             |
| $match   | A JSON object containing operators and expressions used to limit the search results.                                                                                                            | {"$match": {“$and”: \[{“created”: {“$gt”: “2015-01-01T00:00:00.000Z”}}, {“created”: {“$lt”: “2015-06-01T00:00:00.000Z”}}]}} |
| $project | A JSON object that specifies existing properties or new properties to pass along to the next stage in the pipeline.                                                                             | {"$project",{"\_id": 1, "name": 1}}                                                                                         |
| $skip    | An integer representing the number of results to skip. Useful for paging where mapping or groupings have been applied, or where search the search criteria cannot be paged using unique fields. | {"$skip": 10}                                                                                                               |
| $sort    | A JSON object containing one or more properties, the values of which must be 1 for ascending and -1 for descending.                                                                             | {"$sort": {"\_id": -1}}                                                                                                     |
| $unwind  | Accepts a property name and deconstructs an array into a single property in multiple documents.                                                                                                 | {"$unwind": roles}                                                                                                          |

## $group

Result sets can be grouped in order to apply accumulators and aggregate data from accessible documents.

The $group stage takes the following form:

```
{"$group": {"_id": <expression>, <property>: { <accumulator1> : <expression> }, <property>: { <accumulator2> : <expression> },... }}
```

For available accumulators, see Accumulator Operators.

The `_id` property is your grouping expression and is required. The `_id` can be null, it can be a single property or a combination of properties and group expressions. When you specify an `_id` of `null`, you will group together all of the documents in the result set. This is helpful for calculating accumulated values for all of the documents, for example when calculating a cont of the total number of documents in your result. When the `_id` is null, it is not included in the result set.

#### Examples

In the following example, we aggregate a custom `c_patient` object with a two-stage pipeline. The first stage is a `$match` that filters the documents down to patients with a `gender: 'f'`. Those documents are then passed into the `$group` stage where we count all of the documents in the result set and return a `count` property. Because we pass `null` for the grouping `_id`, we can calculate an accumulation for the entire result set.

{% tabs %}
{% tab title="Request" %}

```javascript
GET /c_patients?pipeline=[{"$match": {"gender": "f"}},{"$group": {"_id": null, "count": {"$count": "_id"}}}]
```

{% endtab %}

{% tab title="Response" %}

```javascript
{
    "data": [
        {
            "count": 359
        }
    ],
    "hasMore": false,
    "object": "list"
}
```

{% endtab %}
{% endtabs %}

The following example is a single-stage `$group` on the `connections` object. The grouping is by the connections' `context._id`. Then, a count is calculated for each context grouping and returned in `total` properties.

{% tabs %}
{% tab title="Request" %}

```javascript
GET /connections?pipeline=[{"$group": {"_id": "context._id", "total": {"$count": "_id"}}}, {"$sort": {"total": 1}}]
```

{% endtab %}

{% tab title="Response" %}

```javascript
{
    "data": [
        {
            "_id": "507f1f77bcf86cd799439011",
            "total": 2
        },
        {
            "_id": "55ff9db67a00a89c14aa31a6",
            "total": 4
        },
        {
            "_id": "4d656461626c6552756c656b",
            "total": 12
        },
        {
            "_id": "507f191e810c19729de860ea",
            "total": 52
        }
    ],
    "hasMore": false,
    "object": "list"
}
```

{% endtab %}
{% endtabs %}

## $limit

The limit stage limits the number of documents passed onto the next stage in the pipeline.

```
{"$limit": <integer>}
```

#### Example

{% tabs %}
{% tab title="Request" %}

```javascript
GET /c_patients?pipeline=[{"$match": {"c_gender": "f"}},{"$project": {"c_name": 1}},{"$limit":2}]
```

{% endtab %}

{% tab title="Response" %}

```javascript
{
    "data": [
        {
            "_id": "59495fc1514ef1010000598d",
            "c_name": {
                "c_first": "Jane",
                "c_last": "Smith"
            }
        },
        {
            "_id": "59496035bfd25d0100661faf",
            "c_name": {
                "c_first": "Holly",
                "c_last": "Forrester"
            }
        }
    ],
    "hasMore": false,
    "object": "list"
}
```

{% endtab %}
{% endtabs %}

## $match

The `$match` stage accepts a JSON document that specifies query conditions on indexed properties to filter the resulting document list. This document list is then passed onto the next stage of the pipeline or returned as the response if it is the last stage of the pipeline.

```
{"$match": {<query conditions>}}
```

See Query Operators for more details and examples of available conditions.

#### Examples

In this example, we have just a one stage pipeline where we match accounts by the first name in a custom script. This returns a result set of accounts filtered to only include accounts where the last name starts with 's'.

{% tabs %}
{% tab title="" %}

```javascript
GET /accounts?pipeline=[{"$match": {"name.last": {"$regex": "/^s/i"}}}]
```

{% endtab %}

{% tab title="Response" %}

```javascript
{
    "data": [
        {
            "_id": "5888b91d35f8c91536c6ddea",
            "access": 6,
            "c_enrollments": [],
            "c_study_groups": [],
            "created": "2017-01-25T14:41:33.297Z",
            "email": "john@medable.com",
            "favorite": false,
            "gender": "m",
            "inherited_roles": [
                "000000000000000000000007",
                "000000000000000000000006"
            ],
            "key": {
                "fingerprint": "8e53d430-e30c-11e6-aaeb-99172490c13b",
                "secret": "heBcf2Wgh6hjezADTAc88wHJNGcJgwUd"
            },
            "locale": "en_US",
            "locked": false,
            "mobile": "+12223334449",
            "name": {
                "first": "John",
                "last": "Silver"
            },
            "object": "account",
            "roles": [
                "000000000000000000000004"
            ],
            "shared": false,
            "state": "verified",
            "updated": "2017-01-25T14:42:54.970Z",
            "updater": {
                "_id": "5888b91d35f8c91536c6ddea",
                "object": "account",
                "path": "/accounts/5888b91d35f8c91536c6ddea"
            }
        }
    ],
    "hasMore": false,
    "object": "list"
}
```

{% endtab %}
{% endtabs %}

## $project

The projection stage takes a JSON document that can specify the inclusion of properties and/or the addition of new properties.

```
{"$project": {<projection spec>}}
```

### Specifications

| Specification               | Description                             |
| --------------------------- | --------------------------------------- |
| `property`: `1` *or* `true` | Include this property in the projection |
| `property`: `expression`    | Add a new property to the projection    |

### Considerations

* Except for `_id`, only properties that you specify for inclusion or add to the projection as new properties will be projected on to the next stage.
* The `_id` property is always included in the projection, even if you don't specify inclusion.
* Existing properties cannot be overridden in a projection. If you wish to transform an existing property in a projection, it is recommended that you instead add a new property with a different name.

#### Examples

In the below example, a custom `c_patient` object is aggregated with a two stage pipeline in a script. The first stage is a `$match` that finds all patient records with `c_gender: 'm'`. That result set is then passed into the `$project` stage. The projection then specifies that `c_name`, and `c_gender` properties are included and that `fullName` and `age` are added as new properties.

* The new `fullName` property is a concatenation of the `c_first` and `c_last` sub-properties if the `c_name` document property using the `$concat` string operator and `$string` literal operator.
* The new `age` property is calculated from an existing `c_birthdate` property, using `$subtract`, `$divide`, and `$floor` arithmetic operators to calculate the age from the difference between `c_birthdate` and today's date.
* Since the `$project` stage is the last stage in the pipeline, the result of the projection is returned as the response.

{% tabs %}
{% tab title="Project Example" %}

```
return org.objects.c_patients.aggregate()
  .match({
    c_gender: 'm'
    })
    .project({
    c_name: 1,
    fullName: {$concat: ['c_name.c_first', {$string: ' '}, 'c_name.c_last']},
    c_gender: 1,
    age: { $floor: { $divide: [{ $subtract:[new Date(), 'c_birthdate']}, (365 * 24*60*60*1000)]}}
    })
    .toList();
```

{% endtab %}

{% tab title="Project Result" %}

```javascript
{
    "data": [
        {
            "_id": "59495fb2514ef1010000598b",
            "age": 38,
            "c_gender": "m",
            "c_name": {
                "c_first": "John",
                "c_last": "Smith"
            },
            "fullName": "John Smith"
        },
        {
            "_id": "59495fd7514ef1010000598f",
            "age": 48,
            "c_gender": "m",
            "c_name": {
                "c_first": "Charles",
                "c_last": "Jones"
            },
            "fullName": "Charles Jones"
        },
        {
            "_id": "59496012cfc40501006c10d2",
            "age": 43,
            "c_gender": "m",
            "c_name": {
                "c_first": "Richard",
                "c_last": "Roberts"
            },
            "fullName": "Richard Roberts"
        }
    ],
    "hasMore": false,
    "object": "list"
}
```

{% endtab %}
{% endtabs %}

## $skip

The skip stage skips over the specified number of documents and passes the remaining documents into the next stage of the pipeline.

```
{"$skip": <positive integer>}
```

#### Example

{% tabs %}
{% tab title="Example Request" %}

```javascript
GET /c_patients?pipeline=[{"$match": {"c_gender": "f"}},{"$project": {"c_name": 1}},{"$skip": 1},{"$limit":1}]
```

{% endtab %}

{% tab title="Example Response" %}

```javascript
{
    "data": [
        {
            "_id": "59496035bfd25d0100661faf",
            "c_name": {
                "c_first": "Holly",
                "c_last": "Forrester"
            }
        }
    ],
    "hasMore": false,
    "object": "list"
}
```

{% endtab %}
{% endtabs %}

## $sort

The sort stage take all documents it receives and passes them into the next stage in the specified sort order.

```
{"$sort": {<property1>: <sort order>, <property2>: <sort order>, ...}}
```

Where sort order can be `1` for ascending or `-1` for descending.

### Performance Considerations

To optimize performance, it is best to use `$sort` at the beginning of the pipeline or near the beginning, after a `$match` and `$limit`. And `$sort` should always proceed `$project`, `$unwind`, or `$group` if it is required -- otherwise your query may be slow or time out.

#### Example

{% tabs %}
{% tab title="Example Request" %}

```javascript
GET /c_patients?pipeline=[{"$match": {"c_gender": "f"}},{"$limit": 2},{"$sort": {"c_name.c_last": 1},{"$project": {"c_name": 1}}}]
```

{% endtab %}

{% tab title="Example Response" %}

```javascript
{
    "data": [
        {
            "_id": "59496035bfd25d0100661faf",
            "c_name": {
                "c_first": "Holly",
                "c_last": "Forrester"
            }
        },
        {
            "_id": "59495fc1514ef1010000598d",
            "c_name": {
                "c_first": "Jane",
                "c_last": "Smith"
            }
        }
    ],
    "hasMore": false,
    "object": "list"
}
```

{% endtab %}
{% endtabs %}

## $unwind

Unwinding arrays and document arrays expands them into individual documents so that aggregation can take place of the resulting output. For example, mapping the roles array of an account with 3 roles results in 3 output documents, each containing a roles ObjectId property.

```
{"$unwind": <property>}
```

#### Example

```javascript
GET /accounts/?pipeline=[{"$project": {"roles": 1}}, {"$unwind": "roles"}]

{
    "data": [
        {
            "_id": "55ff9db67a00a89c14aa31a6",
            "object": "account",
            "roles": "000000000000000000000004"
        },
        {
            "_id": "55ff9db67a00a89c14aa31a6",
            "object": "account",
            "roles": "000000000000000000000005"
        },
        {
            "_id": "55ff9db67a00a89c14aa31a6",
            "object": "account",
            "roles": "000000000000000000000007"
        }
    ],
    "hasMore": false,
    "object": "list"
}
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.medable.com/cortex-api/aggregating.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
