The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms them into aggregated results. For example, consider a two-stage pipeline that runs a $match stage followed by a $group stage:
First Stage: The $match stage filters the documents by the status field and passes to the next stage those documents whose status equals "A".
Second Stage: The $group stage groups the documents by the cust_id field to calculate the sum of the amount for each unique cust_id.
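The two stages described above can be sketched as a pipeline array. This is a minimal illustration, assuming a hypothetical orders collection and a hypothetical output field named total; neither name comes from the text. The shell call is guarded so that the snippet also loads outside the mongo shell.

```javascript
// Two-stage pipeline: filter by status, then sum amount per cust_id.
const orderTotalsPipeline = [
  // First stage: keep only documents whose status equals "A".
  { $match: { status: "A" } },
  // Second stage: group by cust_id and sum the amount field.
  // "total" is an assumed name for the computed sum.
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
];

// "orders" is an assumed collection name. db exists only inside the shell.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(orderTotalsPipeline).toArray());
}
```

Each output document has an _id equal to one cust_id value and a total holding the summed amount for that customer.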
The MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as they pass through the pipeline. Pipeline stages do not need to produce one output document for every input document; for example, some stages may generate new documents or filter out documents.
Pipeline stages can appear multiple times in the pipeline, with the exception of the $out, $merge, and $geoNear stages. For a list of all available stages, see Aggregation Pipeline Stages.
MongoDB provides the db.collection.aggregate() method in the mongo shell and the aggregate command to run the aggregation pipeline.
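As a rough comparison of the two entry points, the sketch below runs the same pipeline through the shell helper and through the aggregate database command. The orders collection is an assumed name. When the aggregate command is issued directly, it expects a cursor document and returns the first batch of results under cursor.firstBatch.

```javascript
// A single-stage pipeline reused by both invocation styles.
const statusAPipeline = [{ $match: { status: "A" } }];

// Command document for the aggregate database command.
// "orders" is an assumed collection name; cursor: {} requests default batching.
const aggregateCommand = {
  aggregate: "orders",
  pipeline: statusAPipeline,
  cursor: {}
};

// db exists only inside the mongo shell.
if (typeof db !== "undefined") {
  // Shell helper: returns a cursor over the results.
  const viaHelper = db.orders.aggregate(statusAPipeline).toArray();
  // Database command: the first batch of results is nested in the reply.
  const viaCommand = db.runCommand(aggregateCommand).cursor.firstBatch;
  printjson({ helperCount: viaHelper.length, commandCount: viaCommand.length });
}
```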
For example usage of the aggregation pipeline, see Aggregation with User Preference Data and Aggregation with the Zip Code Data Set.
Starting in MongoDB 4.2, you can use the aggregation pipeline for updates in the findAndModify and update commands and in their corresponding mongo shell methods.
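An update that uses an aggregation pipeline might look like the following sketch, which passes a pipeline (an array of stages) as the u field of the update command instead of a classic update document. The students collection and the quiz1, quiz2, and total fields are hypothetical names chosen for illustration.

```javascript
// update command whose update specification is an aggregation pipeline.
// All collection and field names here are assumptions.
const totalUpdateCommand = {
  update: "students",
  updates: [
    {
      q: { _id: 1 },                                   // query selecting the document
      u: [                                             // array => pipeline-style update
        { $set: { total: { $add: ["$quiz1", "$quiz2"] } } }
      ]
    }
  ]
};

// db exists only inside the mongo shell (MongoDB 4.2 or later for this form).
if (typeof db !== "undefined") {
  printjson(db.runCommand(totalUpdateCommand));
}
```

Because the update is a pipeline, the new total value can be computed from the document's existing fields, which a plain update document cannot express.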
Some pipeline stages take a pipeline expression as the operand. Pipeline expressions specify the transformation to apply to the input documents. Expressions have a document structure and can contain other expressions.
Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data from other documents: expression operations provide in-memory transformation of documents.
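To illustrate an expression that contains other expressions, the sketch below computes a value per document from that document's own fields. The price, quantity, and discount fields and the netTotal output name are assumptions made for this example.

```javascript
// A $project stage whose operand is a nested expression.
// Field names are hypothetical.
const netTotalPipeline = [
  {
    $project: {
      item: 1,
      // Outer expression ($subtract) contains an inner expression ($multiply).
      // Both read only from the current document.
      netTotal: {
        $subtract: [{ $multiply: ["$price", "$quantity"] }, "$discount"]
      }
    }
  }
];

// "orders" is an assumed collection name. db exists only inside the shell.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(netTotalPipeline).toArray());
}
```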
Generally, expressions are stateless and are only evaluated when seen by the aggregation process, with one exception: accumulator expressions.
The accumulators, used in the $group stage, maintain their state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline. Some accumulators are available in the $project stage; however, when used in the $project stage, the accumulators do not maintain their state across documents.
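The difference between the two uses of accumulators can be sketched as follows. In the first pipeline, $sum and $max accumulate across all documents in a group. In the second, the same operators are applied inside $project to an array field of a single document. The quizzes array field and the output names are assumptions.

```javascript
// Accumulators in $group: state is kept across the documents of each group.
const groupAccumulators = [
  {
    $group: {
      _id: "$cust_id",
      total: { $sum: "$amount" },     // running total per cust_id
      largest: { $max: "$amount" }    // running maximum per cust_id
    }
  }
];

// Accumulators in $project: evaluated per document, with no shared state.
// "quizzes" is assumed to be an array of numbers within each document.
const projectAccumulators = [
  {
    $project: {
      quizTotal: { $sum: "$quizzes" },
      bestQuiz: { $max: "$quizzes" }
    }
  }
];
```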
Starting in version 4.4, MongoDB provides the $accumulator and $function aggregation operators. These operators let users define custom aggregation expressions in JavaScript.
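A minimal sketch of both operators follows. The threshold of 100 and the field names isLargeOrder and largeOrderCount are hypothetical. Both operators require server-side JavaScript to be enabled, and lang must be "js".

```javascript
// $function: a custom per-document expression written in JavaScript.
const customFunctionStage = {
  $addFields: {
    isLargeOrder: {
      $function: {
        body: function (amount) { return amount > 100; },
        args: ["$amount"],   // values passed to body for each document
        lang: "js"
      }
    }
  }
};

// $accumulator: a custom accumulator that counts orders above 100 per customer.
const customAccumulatorStage = {
  $group: {
    _id: "$cust_id",
    largeOrderCount: {
      $accumulator: {
        init: function () { return 0; },                       // initial state
        accumulate: function (state, amount) {                  // per document
          return amount > 100 ? state + 1 : state;
        },
        accumulateArgs: ["$amount"],
        merge: function (a, b) { return a + b; },               // combine partial states
        lang: "js"
      }
    }
  }
};
```

Built-in operators are generally the better choice where they suffice, since JavaScript execution inside the pipeline tends to be slower.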
For more information on expressions, see Expressions.
In MongoDB, the aggregate command operates on a single collection, logically passing the entire collection into the aggregation pipeline. To optimize the operation, wherever possible, use the following strategies to avoid scanning the entire collection.
MongoDB's query planner analyzes an aggregation pipeline to determine whether indexes can be used to improve pipeline performance. For example, the following pipeline stages can take advantage of indexes:
Note
The following pipeline stages do not represent a complete list of all stages which can use an index.
$match
The $match stage can use an index to filter documents if it occurs at the beginning of a pipeline.
$sort
The $sort stage can use an index as long as it is not preceded by a $project, $unwind, or $group stage.
$group
The $group stage can sometimes use an index to find the first document in each group if all of the following criteria are met:
- The $group stage is preceded by a $sort stage that sorts the field to group by,
- There is an index on the grouped field that matches the sort order, and
- The only accumulator used in the $group stage is $first.
See Optimization to Return the First Document of Each Group for an example.
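A pipeline shaped to meet the $group criteria above might look like the following sketch. The orders collection, the order_date field, and the compound index are assumptions; whether the planner actually applies the optimization depends on the server version and the data, so checking the explain output is advisable.

```javascript
// Assumed index matching the sort order: { cust_id: 1, order_date: 1 }.
const firstOrderIndex = { cust_id: 1, order_date: 1 };

const firstOrderPipeline = [
  // $sort on the grouped field (and a secondary field) precedes $group.
  { $sort: { cust_id: 1, order_date: 1 } },
  // $first is the only accumulator used in the $group stage.
  { $group: { _id: "$cust_id", firstOrderDate: { $first: "$order_date" } } }
];

// db exists only inside the mongo shell.
if (typeof db !== "undefined") {
  db.orders.createIndex(firstOrderIndex);
  // Inspect the plan to see whether the index is used.
  printjson(db.orders.explain().aggregate(firstOrderPipeline));
}
```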
$geoNear
The $geoNear pipeline operator takes advantage of a geospatial index. When using $geoNear, the $geoNear pipeline operation must appear as the first stage in an aggregation pipeline.
Changed in version 3.2: Starting in MongoDB 3.2, indexes can cover an aggregation pipeline. In MongoDB 2.6 and 3.0, indexes could not cover an aggregation pipeline because, even when the pipeline used an index, aggregation still required access to the actual documents.
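The $geoNear placement rule described above can be sketched as follows. The places collection, the location field, the coordinates, and the distance output field are all hypothetical, and the example assumes a 2dsphere index exists on location.

```javascript
// $geoNear must be the first stage; later stages may follow it.
const nearbyPlacesPipeline = [
  {
    $geoNear: {
      near: { type: "Point", coordinates: [-73.99, 40.73] }, // GeoJSON point (lng, lat)
      distanceField: "distance",   // output field that receives the computed distance
      spherical: true
    }
  },
  { $limit: 5 }
];

// db exists only inside the mongo shell.
if (typeof db !== "undefined") {
  db.places.createIndex({ location: "2dsphere" }); // assumed geospatial index
  printjson(db.places.aggregate(nearbyPlacesPipeline).toArray());
}
```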
If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline, $match operations use suitable indexes to scan only the matching documents in a collection.
Placing a $match pipeline stage followed by a $sort stage at the start of the pipeline is logically equivalent to a single query with a sort and can use an index. When possible, place $match operators at the beginning of the pipeline.
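The early-filtering advice above can be sketched as a pipeline that narrows the input before doing any grouping work. The orders collection, the order_date field, and the limit of 100 are assumptions; an index covering status and order_date would be needed for the leading stages to benefit.

```javascript
// Filter and sort first, so later stages see as few documents as possible.
const earlyFilterPipeline = [
  { $match: { status: "A" } },        // leading $match can use an index
  { $sort: { order_date: -1 } },      // $match + $sort behaves like a sorted query
  { $limit: 100 },                    // cap the documents entering later stages
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
];

// db exists only inside the mongo shell.
if (typeof db !== "undefined") {
  // explain() shows whether the leading stages were resolved with an index.
  printjson(db.orders.explain().aggregate(earlyFilterPipeline));
}
```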
The aggregation pipeline supports operations on sharded collections. See Aggregation Pipeline and Sharded Collections.
The aggregation pipeline provides better performance and a more coherent interface than map-reduce.
Various map-reduce operations can be rewritten using aggregation pipeline operators, such as $group, $merge, etc. For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. These operators let users define custom aggregation expressions in JavaScript.
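As a rough illustration of such a rewrite, consider a map-reduce job whose map function emits (cust_id, price) and whose reduce function sums the emitted values. A pipeline alternative could use $group for the reduction and $merge to write the results to a collection. The orders and order_totals collection names and the price field are assumptions.

```javascript
// Pipeline alternative to a "sum price per customer" map-reduce job.
const totalsPerCustomerPipeline = [
  // Plays the role of map (emit cust_id, price) plus reduce (sum the values).
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  // Plays the role of the map-reduce output collection.
  {
    $merge: {
      into: "order_totals",        // assumed output collection
      whenMatched: "replace",      // overwrite an existing result for the same _id
      whenNotMatched: "insert"     // otherwise add a new result document
    }
  }
];

// db exists only inside the mongo shell ($merge requires MongoDB 4.2 or later).
if (typeof db !== "undefined") {
  db.orders.aggregate(totalsPerCustomerPipeline);
  printjson(db.order_totals.find().toArray());
}
```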
See Map-Reduce Examples for details.
The aggregation pipeline has some limitations on value types and result size. See Aggregation Pipeline Limits for details on these limits and restrictions.
The aggregation pipeline has an internal optimization phase that provides improved performance for certain sequences of operators. For details, see Aggregation Pipeline Optimization.