Aggregation Pipeline Optimization

Aggregation pipeline operations have an optimization phase which attempts to reshape the pipeline for improved performance.

To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.
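As a minimal sketch of passing the explain option, the helper below forwards a pipeline to aggregate() with explain enabled. The helper name explainPipeline, the sample stages, and the orders collection mentioned in the comment are hypothetical; the sketch assumes a mongosh collection handle is supplied.

```javascript
// Hypothetical helper: request the plan for a pipeline instead of its results.
// `collection` is assumed to be a mongosh collection handle such as db.orders.
function explainPipeline(collection, pipeline) {
  return collection.aggregate(pipeline, { explain: true });
}

// Illustrative pipeline: the optimizer may move the $match ahead of the $sort.
const samplePipeline = [
  { $sort: { age: -1 } },
  { $match: { status: "A" } }
];

// In mongosh, assuming a collection named "orders" exists:
// explainPipeline(db.orders, samplePipeline)
```

Comparing the stages in the returned plan with the stages you wrote shows which of the optimizations on this page were applied.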

Optimizations are subject to change between releases.

Projection Optimization

The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.

Pipeline Sequence Optimization

($project or $unset or $addFields or $set) + $match Sequence Optimization

For an aggregation pipeline that contains a projection stage ($project or $unset or $addFields or $set) followed by a $match stage, MongoDB moves any filters in the $match stage that do not require values computed in the projection stage to a new $match stage before the projection.

If an aggregation pipeline contains multiple projection and/or $match stages, MongoDB performs this optimization for each $match stage, moving each $match filter before all projection stages that the filter does not depend on.

Consider a pipeline of the following stages:

{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: {
    name: "Joe Schmoe",
    maxTime: { $lt: 20 },
    minTime: { $gt: 5 },
    avgTime: { $gt: 7 }
} }

The optimizer breaks up the $match stage into four individual filters, one for each key in the $match query document. The optimizer then moves each filter before as many projection stages as possible, creating new $match stages as needed. Given this example, the optimizer produces the following optimized pipeline:

{ $match: { name: "Joe Schmoe" } },
{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: { avgTime: { $gt: 7 } } }

The $match filter { avgTime: { $gt: 7 } } depends on the $project stage to compute the avgTime field. The $project stage is the last projection stage in this pipeline, so the $match filter on avgTime could not be moved.

The maxTime and minTime fields are computed in the $addFields stage but have no dependency on the $project stage. The optimizer created a new $match stage for the filters on these fields and placed it before the $project stage.

The $match filter { name: "Joe Schmoe" } does not use any values computed in either the $project or $addFields stages, so it was moved to a new $match stage before both of the projection stages.
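The placement logic described above can be sketched in plain JavaScript. This is an illustrative simplification, not MongoDB's implementation: the function names computedFields and pushDownMatch are hypothetical, a field counts as computed whenever its value is anything other than 1, 0, true, or false, and the sketch ignores $unset, excluded fields, and dotted paths.

```javascript
// Fields a projection-like stage computes (values other than 1/0/true/false).
function computedFields(stage) {
  const spec = Object.values(stage)[0];
  return Object.keys(spec).filter(
    (field) => ![0, 1, true, false].includes(spec[field])
  );
}

// Split the trailing $match by key and move each filter before every
// projection stage that does not compute the field it tests.
function pushDownMatch(projections, match) {
  // slots[i] collects filters placed directly before projections[i];
  // the last slot holds filters that stay after all projections.
  const slots = projections.map(() => ({}));
  slots.push({});
  for (const [field, condition] of Object.entries(match.$match)) {
    let slot = 0;
    projections.forEach((stage, i) => {
      if (computedFields(stage).includes(field)) slot = i + 1;
    });
    slots[slot][field] = condition;
  }
  // Interleave the non-empty $match stages with the projection stages.
  const result = [];
  slots.forEach((filters, i) => {
    if (Object.keys(filters).length > 0) result.push({ $match: filters });
    if (i < projections.length) result.push(projections[i]);
  });
  return result;
}

// The example pipeline from this section.
const optimized = pushDownMatch(
  [
    { $addFields: { maxTime: { $max: "$times" }, minTime: { $min: "$times" } } },
    { $project: { _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
                  avgTime: { $avg: ["$maxTime", "$minTime"] } } }
  ],
  { $match: { name: "Joe Schmoe", maxTime: { $lt: 20 },
              minTime: { $gt: 5 }, avgTime: { $gt: 7 } } }
);
```

For this input the sketch yields five stages in the same order as the optimized pipeline shown above: a $match on name, the $addFields, a $match on maxTime and minTime, the $project, and a $match on avgTime.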

Note

After optimization, the filter { name: "Joe Schmoe" } is in a $match stage at the beginning of the pipeline. This has the added benefit of allowing the aggregation to use an index on the name field when initially querying the collection. See Pipeline Operators and Indexes for more information.

$sort + $match Sequence Optimization

When you have a sequence with $sort followed by a $match, the $match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $match: { status: 'A' } }

During the optimization phase, the optimizer transforms the sequence to the following:

{ $match: { status: 'A' } },
{ $sort: { age : -1 } }
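The reordering is safe because filtering does not change the relative order of the documents that remain. The in-memory sketch below illustrates this with plain arrays; the sample documents and helper names are hypothetical, and the array methods only mirror the two stages rather than reproduce how MongoDB executes them.

```javascript
// Hypothetical sample documents.
const people = [
  { name: "Ann", age: 31, status: "A" },
  { name: "Bob", age: 45, status: "B" },
  { name: "Cy", age: 27, status: "A" },
  { name: "Dee", age: 52, status: "B" }
];

const byAgeDescending = (a, b) => b.age - a.age; // mirrors { $sort: { age: -1 } }
const hasStatusA = (doc) => doc.status === "A";  // mirrors { $match: { status: 'A' } }

// Original order: sort all four documents, then filter.
const sortThenMatch = [...people].sort(byAgeDescending).filter(hasStatusA);

// Optimized order: filter first, so only two documents are sorted.
const matched = people.filter(hasStatusA);
const matchThenSort = [...matched].sort(byAgeDescending);
```

Both variables hold the same two documents in the same order, but the optimized order sorts half as many documents.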

$redact + $match Sequence Optimization

When the pipeline has a $redact stage immediately followed by a $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage. If the added $match stage is at the start of the pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Pipeline Operators and Indexes for more information.

For example, if the pipeline consists of the following stages:

{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

The optimizer can add a $match stage with the year filter before the $redact stage, while keeping the original $match stage after it:

{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

$project/$unset + $skip Sequence Optimization

New in version 3.2.

When you have a sequence with $project or $unset followed by $skip, the $skip moves before the $project or $unset. For example, if the pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $project: { status: 1, name: 1 } },
{ $skip: 5 }

During the optimization phase, the optimizer transforms the sequence to the following:

{ $sort: { age : -1 } },
{ $skip: 5 },
{ $project: { status: 1, name: 1 } }

Pipeline Coalescence Optimization

When possible, the optimization phase coalesces a pipeline stage into its predecessor. Generally, coalescence occurs after any sequence reordering optimization.

$sort + $limit Coalescence

Changed in version 4.0.

When a $sort precedes a $limit, the optimizer can coalesce the $limit into the $sort if no intervening stage modifies the number of documents. MongoDB will not coalesce the $limit into the $sort if a stage between them, such as $unwind or $group, changes the number of documents.

For example, if the pipeline consists of the following stages:

{ $sort : { age : -1 } },
{ $project : { age : 1, status : 1, name : 1 } },
{ $limit: 5 }

During the optimization phase, the optimizer coalesces the sequence to the following:

{
    "$sort" : {
       "sortKey" : {
          "age" : -1
       },
       "limit" : NumberLong(5)
    }
},
{ "$project" : {
         "age" : 1,
         "status" : 1,
         "name" : 1
  }
}

This allows the sort operation to only maintain the top n results as it progresses, where n is the specified limit, and MongoDB only needs to store n items in memory [1]. See $sort Operator and Memory for more information.
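To illustrate why a limit lets a sort hold only n items, the sketch below keeps a sorted buffer of at most n documents while scanning the input. It is an insertion-based illustration under assumed names (topN, sampleDocs), not a description of how MongoDB implements the coalesced stage.

```javascript
// Keep only the n best documents seen so far, ordered by `compare`.
function topN(docs, n, compare) {
  const best = [];
  for (const doc of docs) {
    // Find the insertion point that keeps `best` sorted.
    let i = best.length;
    while (i > 0 && compare(doc, best[i - 1]) < 0) i--;
    if (i < n) {
      best.splice(i, 0, doc);
      if (best.length > n) best.pop(); // never hold more than n documents
    }
  }
  return best;
}

// Hypothetical input, sorted by age descending with a limit of 2.
const sampleDocs = [{ age: 31 }, { age: 45 }, { age: 27 }, { age: 52 }, { age: 38 }];
const oldestTwo = topN(sampleDocs, 2, (a, b) => b.age - a.age);
```

The result matches sorting the whole input and taking the first two documents, but the buffer never grows beyond the limit.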

Sequence Optimization with $skip

If there is a $skip stage between the $sort and $limit stages, MongoDB will coalesce the $limit into the $sort stage and increase the $limit value by the $skip amount. See $sort + $skip + $limit Sequence for an example.

[1] The optimization will still apply when allowDiskUse is true and the n items exceed the aggregation memory limit.

$limit + $limit Coalescence

When a $limit immediately follows another $limit, the two stages can coalesce into a single $limit where the limit amount is the smaller of the two initial limit amounts. For example, a pipeline contains the following sequence:

{ $limit: 100 },
{ $limit: 10 }

Then the second $limit stage can coalesce into the first $limit stage and result in a single $limit stage where the limit amount 10 is the minimum of the two initial limits 100 and 10.

{ $limit: 10 }
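The rule can be sketched as a small pass over a pipeline array. The function name coalesceLimits is hypothetical and the code only illustrates the rule stated above; it is not MongoDB's optimizer.

```javascript
// Merge runs of adjacent $limit stages into one, keeping the smaller amount.
function coalesceLimits(pipeline) {
  const result = [];
  for (const stage of pipeline) {
    const previous = result[result.length - 1];
    if (previous && "$limit" in previous && "$limit" in stage) {
      result[result.length - 1] = {
        $limit: Math.min(previous.$limit, stage.$limit)
      };
    } else {
      result.push(stage);
    }
  }
  return result;
}

const coalescedLimits = coalesceLimits([{ $limit: 100 }, { $limit: 10 }]);
```

Taking the minimum is correct because the first limit caps the input of the second, so the smaller amount always determines how many documents survive both stages.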

$skip + $skip Coalescence

When a $skip immediately follows another $skip, the two stages can coalesce into a single $skip where the skip amount is the sum of the two initial skip amounts. For example, a pipeline contains the following sequence:

{ $skip: 5 },
{ $skip: 2 }

Then the second $skip stage can coalesce into the first $skip stage and result in a single $skip stage where the skip amount 7 is the sum of the two initial skip amounts 5 and 2.

{ $skip: 7 }
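A quick in-memory check shows why the amounts add. The documents and the skip helper below are hypothetical stand-ins that mirror the $skip stage with Array.prototype.slice.

```javascript
// Ten hypothetical documents with _id 0 through 9.
const tenDocs = Array.from({ length: 10 }, (_, i) => ({ _id: i }));

// Mirrors { $skip: n } on an in-memory array.
const skip = (input, n) => input.slice(n);

const twoSkips = skip(skip(tenDocs, 5), 2); // { $skip: 5 }, { $skip: 2 }
const oneSkip = skip(tenDocs, 5 + 2);       // { $skip: 7 }
```

Both variables contain the documents with _id 7, 8, and 9.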

$match + $match Coalescence

When a $match immediately follows another $match, the two stages can coalesce into a single $match combining the conditions with an $and. For example, a pipeline contains the following sequence:

{ $match: { year: 2014 } },
{ $match: { status: "A" } }

Then the second $match stage can coalesce into the first $match stage and result in a single $match stage:

{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
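As a sketch of the same rule, the function below merges adjacent $match stages by collecting their query documents under $and. The names coalesceMatches and conditions are hypothetical; flattening an existing top-level $and is a choice made for this sketch, not a behavior confirmed by this page.

```javascript
// A query that is already a bare $and contributes its list; others count as one condition.
const conditions = (query) =>
  Object.keys(query).length === 1 && Array.isArray(query.$and)
    ? query.$and
    : [query];

// Merge runs of adjacent $match stages into a single $match using $and.
function coalesceMatches(pipeline) {
  const result = [];
  for (const stage of pipeline) {
    const previous = result[result.length - 1];
    if (previous && "$match" in previous && "$match" in stage) {
      result[result.length - 1] = {
        $match: {
          $and: [...conditions(previous.$match), ...conditions(stage.$match)]
        }
      };
    } else {
      result.push(stage);
    }
  }
  return result;
}

const coalescedMatch = coalesceMatches([
  { $match: { year: 2014 } },
  { $match: { status: "A" } }
]);
```

For the two stages in this example, the sketch produces the single $match stage shown above.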

$lookup + $unwind Coalescence

New in version 3.2.

When a $unwind immediately follows a $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.

For example, a pipeline contains the following sequence:

{
  $lookup: {
    from: "otherCollection",
    as: "resultingArray",
    localField: "x",
    foreignField: "y"
  }
},
{ $unwind: "$resultingArray"}

The optimizer can coalesce the $unwind stage into the $lookup stage. If you run the aggregation with the explain option, the explain output shows the coalesced stage:

{
  $lookup: {
    from: "otherCollection",
    as: "resultingArray",
    localField: "x",
    foreignField: "y",
    unwinding: { preserveNullAndEmptyArrays: false }
  }
}
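To illustrate how a combined lookup and unwind avoids large intermediate documents, the sketch below emits one output document per matching pair instead of first collecting every match into an array. The function lookupUnwind and the sample data are hypothetical, and the sketch uses simple equality on scalar fields, which is narrower than the matching $lookup performs.

```javascript
// Emit one output document per matching foreign document, without first
// collecting all matches into an array on the local document.
function lookupUnwind(localDocs, foreignDocs, { localField, foreignField, as }) {
  const output = [];
  for (const local of localDocs) {
    for (const foreign of foreignDocs) {
      if (foreign[foreignField] === local[localField]) {
        output.push({ ...local, [as]: foreign });
      }
    }
  }
  // Local documents without a match are dropped, which corresponds to
  // preserveNullAndEmptyArrays: false.
  return output;
}

// Hypothetical sample data.
const localDocs = [{ _id: 1, x: "a" }, { _id: 2, x: "b" }, { _id: 3, x: "z" }];
const otherDocs = [{ y: "a", qty: 1 }, { y: "a", qty: 2 }, { y: "b", qty: 3 }];

const joined = lookupUnwind(localDocs, otherDocs, {
  localField: "x",
  foreignField: "y",
  as: "resultingArray"
});
```

Here joined contains three documents: two for _id 1, one for _id 2, and none for _id 3, which has no match.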

Example

$sort + $skip + $limit Sequence

A pipeline contains a sequence of $sort followed by a $skip followed by a $limit:

{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }

The optimizer performs $sort + $limit Coalescence to transform the sequence to the following:

{
   "$sort" : {
      "sortKey" : {
         "age" : -1
      },
      "limit" : NumberLong(15)
   }
},
{
   "$skip" : NumberLong(10)
}

MongoDB increases the $limit amount with the reordering: the coalesced limit of 15 is the original $limit of 5 plus the $skip of 10.
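The equivalence of the two forms can be checked on an in-memory array. The documents below are hypothetical, and slice stands in for $skip and $limit; this is an illustration of the arithmetic, not of MongoDB's execution.

```javascript
// Forty hypothetical documents with distinct ages 0 through 39 in shuffled order.
const ageDocs = Array.from({ length: 40 }, (_, i) => ({ _id: i, age: (i * 7) % 40 }));
const ageDescending = (a, b) => b.age - a.age;

// Original sequence: $sort, then $skip 10, then $limit 5.
const originalOrder = [...ageDocs].sort(ageDescending).slice(10).slice(0, 5);

// Coalesced sequence: $sort limited to 10 + 5 = 15 documents, then $skip 10.
const coalescedOrder = [...ageDocs].sort(ageDescending).slice(0, 10 + 5).slice(10);
```

Both sequences return the documents with ages 29 down to 25, so raising the limit by the skip amount preserves the result while letting the sort keep only 15 documents.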

See also

explain option in db.collection.aggregate()