mapReduce
The mapReduce command allows you to run map-reduce aggregation operations over a collection.
Aggregation Pipeline as Alternative
Aggregation pipeline provides better performance and a more coherent interface than map-reduce, and map-reduce expressions can be rewritten using aggregation pipeline operators such as $group, $merge, etc.
For map-reduce expressions that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. These operators provide users with the ability to define custom aggregation expressions in JavaScript.
For examples of aggregation alternatives to map-reduce operations, see Map-Reduce Examples. See also Map-Reduce to Aggregation Pipeline.
Note
Starting in version 4.4, MongoDB ignores the verbose option.
Starting in version 4.2, MongoDB deprecates the use of the sharded option for map-reduce output and the explicit setting of nonAtomic to false. See out Options.
The mapReduce command is issued with db.runCommand() and takes the following fields as arguments:
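| Field | Type | Description |
| --- | --- | --- |
| mapReduce | string | The name of the collection on which to perform map-reduce. Views do not support map-reduce operations. |
| map | JavaScript or String | A JavaScript function that associates or "maps" a value with a key and emits the key and value pair. See map Function. |
| reduce | JavaScript or String | A JavaScript function that "reduces" to a single object all the values associated with a particular key. See reduce Function. |
| out | string or document | Specifies where to output the result of the map-reduce operation. See out Options. |
| query | document | Optional. Selection criteria for the documents that are input to the map function. |
| sort | document | Optional. Sorts the input documents. |
| limit | number | Optional. Maximum number of documents for the input into the map function. |
| finalize | JavaScript or String | Optional. A JavaScript function that modifies the output after the reduce function. See finalize Function. |
| scope | document | Optional. Variables that are accessible in the map, reduce and finalize functions. |
| jsMode | boolean | Optional. Specifies whether to convert intermediate data into BSON format between the execution of the map and reduce functions. |
| verbose | boolean | Optional. Specifies whether to include timing information in the result. Ignored starting in version 4.4. |
| bypassDocumentValidation | boolean | Optional. Enables mapReduce to bypass document validation during the operation. |
| collation | document | Optional. Specifies the collation to use for the operation. See Collation. |
| writeConcern | document | Optional. The write concern to use when outputting to a collection. |
| comment | any | Optional. A user-provided comment to attach to this command. |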
The following is a prototype usage of the mapReduce command:
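As a minimal sketch of such a prototype, the command document can be assembled as a plain JavaScript object. The collection names `orders` and `map_reduce_example` and the `query` filter are hypothetical placeholders, and the final call to `db.runCommand()` is left as a comment so that the snippet also runs outside the shell.

```javascript
// Map and reduce functions. On the server, `this` is the current document
// and emit() is supplied by MongoDB.
const mapFunction = function () {
  emit(this.cust_id, this.price);
};
const reduceFunction = function (key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
};

// Command document. The command name (mapReduce) must be the first field.
// "orders", "map_reduce_example" and the query are hypothetical placeholders.
const command = {
  mapReduce: "orders",
  map: mapFunction,
  reduce: reduceFunction,
  out: { merge: "map_reduce_example" },
  query: { status: "A" },
};

// In the shell this document would be submitted as: db.runCommand(command)
console.log(Object.keys(command));
```

Only mapReduce, map, reduce and out are filled in here besides query; the remaining fields from the table above are optional.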
JavaScript in MongoDB
Although mapReduce uses JavaScript, most interactions with MongoDB do not use JavaScript but use an idiomatic driver in the language of the interacting application.
map Function
The map function is responsible for transforming each input document into zero or more documents. It can access the variables defined in the scope parameter, and has the prototype function() { ... emit(key, value); }.
The map function has the following requirements:
In the map function, reference the current document as this within the function.
The map function should not access the database for any reason.
The map function should be pure, or have no impact outside of the function (i.e. no side effects).
The map function may optionally call emit(key,value) any number of times to create an output document associating key with value.
mapReduce no longer supports the deprecated BSON type JavaScript code with scope (BSON type 15) for its functions. The map function must be either BSON type String (BSON type 2) or BSON type JavaScript (BSON type 13). To pass constant values which will be accessible in the map function, use the scope parameter.
The use of JavaScript code with scope for the map function has been deprecated since version 4.2.1.
A map function can call emit(key,value) either 0 or 1 times, for example depending on the value of the input document's status field, or multiple times, for example once for each element in the input document's items field.
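The two emit patterns described above can be sketched as follows. This is an illustration in plain JavaScript, not server code: the `emit` function defined here is a stand-in for the one MongoDB provides, and the input document is hypothetical.

```javascript
// Stand-in for the emit() that MongoDB provides to map functions.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Emits 0 or 1 times, depending on the document's status field.
const mapByStatus = function () {
  if (this.status === "A") {
    emit(this.cust_id, 1);
  }
};

// Emits once per element of the document's items array.
const mapByItem = function () {
  this.items.forEach(function (item) {
    emit(item.sku, 1);
  });
};

// Hypothetical input document.
const doc = {
  cust_id: "c1",
  status: "B",
  items: [{ sku: "apples" }, { sku: "pears" }],
};

mapByStatus.call(doc); // status is "B", so nothing is emitted
mapByItem.call(doc);   // two items, so emit() is called twice
console.log(emitted);
```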
reduce Function
The reduce function has the prototype function(key, values) { ... return result; }.
The reduce function exhibits the following behaviors:
The reduce function should not access the database, even to perform read operations.
The reduce function should not affect the outside system.
MongoDB will not call the reduce function for a key that has only a single value. The values argument is an array whose elements are the value objects that are "mapped" to the key.
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
The reduce function can access the variables defined in the scope parameter.
The inputs to reduce must not be larger than half of MongoDB's maximum BSON document size. This requirement may be violated when large documents are returned and then joined together in subsequent reduce steps.
mapReduce no longer supports the deprecated BSON type JavaScript code with scope (BSON type 15) for its functions. The reduce function must be either BSON type String (BSON type 2) or BSON type JavaScript (BSON type 13). To pass constant values which will be accessible in the reduce function, use the scope parameter.
The use of JavaScript code with scope for the reduce function has been deprecated since version 4.2.1.
Because it is possible to invoke the reduce function more than once for the same key, the following properties need to be true:
The type of the return object must be identical to the type of the value emitted by the map function.
The reduce function must be associative: reduce(key, [ C, reduce(key, [ A, B ]) ]) must equal reduce(key, [ C, A, B ]).
The reduce function must be idempotent: reduce(key, [ reduce(key, valuesArray) ]) must equal reduce(key, valuesArray).
The reduce function should be commutative: that is, the order of the elements in the valuesArray should not affect the output of the reduce function, so that reduce(key, [ A, B ]) equals reduce(key, [ B, A ]).
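One way to check these properties before deploying a reduce function is to evaluate them directly on sample values. The sketch below does so in plain JavaScript for a summing reducer, and contrasts it with a counting reducer that fails the idempotence check. Both reducers and the sample values are illustrative assumptions.

```javascript
// A reduce function that sums numeric values.
const reduceSum = function (key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
};

const [A, B, C] = [3, 5, 7];
const valuesArray = [A, B, C];

// Associative: reducing a partial result gives the same answer.
const associative =
  reduceSum("k", [C, reduceSum("k", [A, B])]) === reduceSum("k", [C, A, B]);

// Idempotent: re-reducing an already reduced value does not change it.
const idempotent =
  reduceSum("k", [reduceSum("k", valuesArray)]) === reduceSum("k", valuesArray);

// Commutative: the order of the values does not matter.
const commutative = reduceSum("k", [A, B]) === reduceSum("k", [B, A]);

// Counter-example: counting array elements is NOT idempotent, because a
// re-reduce sees one element (the previous count) instead of three.
const reduceCount = function (key, values) {
  return values.length;
};
const countIsIdempotent =
  reduceCount("k", [reduceCount("k", valuesArray)]) === reduceCount("k", valuesArray);

console.log({ associative, idempotent, commutative, countIsIdempotent });
```

The counter-example suggests why counting is usually written as emitting 1 from the map function and summing in the reduce function.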
finalize Function
The finalize function has the prototype function(key, reducedValue) { ... return modifiedObject; }.
The finalize function receives as its arguments a key value and the reducedValue from the reduce function. Be aware that:
The finalize function should not access the database for any reason.
The finalize function should be pure, or have no impact outside of the function (i.e. no side effects).
The finalize function can access the variables defined in the scope parameter.
mapReduce no longer supports the deprecated BSON type JavaScript code with scope (BSON type 15) for its functions. The finalize function must be either BSON type String (BSON type 2) or BSON type JavaScript (BSON type 13). To pass constant values which will be accessible in the finalize function, use the scope parameter.
The use of JavaScript code with scope for the finalize function has been deprecated since version 4.2.1.
out Options
You can specify the following options for the out parameter:
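Output to a Collection
This option (out: <collectionName>) outputs to a new collection, and is not available on secondary members of replica sets.
Output to a Collection with an Action
Note
Starting in version 4.2, MongoDB deprecates the use of the sharded option and the explicit setting of nonAtomic to false.
This option is only available when passing a collection that already exists to out. It is not available on secondary members of replica sets.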
When you output to a collection with an action, the out has the following parameters:
<action>: Specify one of the following actions:
replace
Replace the contents of the <collectionName> if the collection with the <collectionName> exists.
merge
Merge the new result with the existing result if the output collection already exists. If an existing document has the same key as the new result, overwrite that existing document.
reduce
Merge the new result with the existing result if the output collection already exists. If an existing document has the same key as the new result, apply the reduce function to both the new and the existing documents and overwrite the existing document with the result.
db:
Optional. The name of the database to which you want the map-reduce operation to write its output. By default this will be the same database as the input collection.
sharded:
Note
Starting in version 4.2, the use of the sharded option is deprecated.
Optional. If true and you have enabled sharding on the output database, the map-reduce operation will shard the output collection using the _id field as the shard key.
If true and collectionName is an existing unsharded collection, map-reduce fails.
nonAtomic:
Note
Starting in MongoDB 4.2, explicitly setting nonAtomic to false is deprecated.
Optional. Specify the output operation as non-atomic. This applies only to the merge and reduce output modes, which may take minutes to execute.
By default nonAtomic is false, and the map-reduce operation locks the database during post-processing.
If nonAtomic is true, the post-processing step prevents MongoDB from locking the database: during this time, other clients will be able to read intermediate states of the output collection.
Output Inline
Perform the map-reduce operation in memory and return the result (out: { inline: 1 }). This option is the only available option for out on secondary members of replica sets.
The result must fit within the maximum size of a BSON document.
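To make the difference between the replace, merge and reduce actions concrete, the following plain JavaScript sketch emulates them on in-memory maps from key to value. It is an approximation of the semantics described above, not the server's implementation; `applyOutAction`, the collection name `map_reduce_out` and the database name `reporting` are hypothetical.

```javascript
// The three shapes of the out field. Names are hypothetical placeholders.
const outNewCollection = "map_reduce_out";
const outWithAction = { merge: "map_reduce_out", db: "reporting" };
const outInline = { inline: 1 };

// Emulates how an action combines new results with an existing output
// collection. Both arguments are Maps from key (_id) to value.
function applyOutAction(action, existing, results, reduceFn) {
  if (action === "replace") {
    return new Map(results); // existing contents are discarded
  }
  const output = new Map(existing);
  for (const [key, value] of results) {
    if (action === "reduce" && output.has(key)) {
      // Same key on both sides: reduce the old and the new value together.
      output.set(key, reduceFn(key, [output.get(key), value]));
    } else {
      // merge, or a key with no existing document: overwrite or insert.
      output.set(key, value);
    }
  }
  return output;
}

const existing = new Map([["a", 10], ["b", 20]]);
const results = new Map([["b", 5], ["c", 1]]);
const sum = (key, values) => values.reduce((acc, v) => acc + v, 0);

const replaced = applyOutAction("replace", existing, results, sum);
const merged = applyOutAction("merge", existing, results, sum);
const reduced = applyOutAction("reduce", existing, results, sum);
console.log([...replaced], [...merged], [...reduced]);
```

With these inputs, replace keeps only b and c, merge overwrites b with 5, and reduce combines the two values of b into 25.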
If your MongoDB deployment enforces authentication, the user executing the mapReduce command must possess the following privilege actions:
Map-reduce with {out : inline} output option:
Map-reduce with the replace action when outputting to a collection:
Map-reduce with the merge or reduce actions when outputting to a collection:
The readWrite built-in role provides the necessary permissions to perform map-reduce aggregation.
MongoDB drivers automatically set afterClusterTime for operations associated with causally consistent sessions. Starting in MongoDB 4.2, the mapReduce command no longer supports afterClusterTime. As such, mapReduce cannot be associated with causally consistent sessions.
In the mongo shell, the db.collection.mapReduce() method is a wrapper around the mapReduce command. The following examples use the db.collection.mapReduce() method:
Aggregation Pipeline as Alternative
Aggregation pipeline provides better performance and a simpler interface than map-reduce, and map-reduce expressions can be rewritten using aggregation pipeline operators such as $group, $merge, and others.
For map-reduce expressions that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. These operators provide the ability to define custom aggregation expressions in JavaScript.
The examples in this section include aggregation pipeline alternatives without custom aggregation expressions. For alternatives that use custom expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
Create a sample collection orders with these documents:
Perform the map-reduce operation on the orders collection to group by the cust_id, and calculate the sum of the price for each cust_id:
Define the map function to process each input document:
In the function, this refers to the document that the map-reduce operation is processing.
The function maps the price to the cust_id for each document and emits the cust_id and price.
Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
valuesPrices is an array whose elements are the price values emitted by the map function and grouped by keyCustId.
The function reduces the valuesPrices array to the sum of its elements.
Perform map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function:
This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will replace the contents with the results of this map-reduce operation.
Query the map_reduce_example collection to verify the results:
The operation returns these documents:
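As a rough sketch of the steps above, the following plain JavaScript emulates the map and reduce phases in memory. The three order documents are hypothetical and are not the sample data of this page. `mapFunction1` and `reduceFunction1` follow the descriptions above, while `runMapReduce` is an assumed helper standing in for the server.

```javascript
// Hypothetical sample documents (not the sample data of this page).
const orders = [
  { _id: 1, cust_id: "A", price: 25 },
  { _id: 2, cust_id: "A", price: 70 },
  { _id: 3, cust_id: "B", price: 50 },
];

// Assumed helper that emulates the map and reduce phases in memory.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  // MongoDB normally provides emit(); define it globally for the map call.
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc)); // `this` is the current document
  delete globalThis.emit;
  return [...groups].map(([key, values]) => ({
    _id: key,
    // reduce is not called for a key that has only a single value
    value: values.length > 1 ? reduceFn(key, values) : values[0],
  }));
}

// Maps the price to the cust_id for each document.
const mapFunction1 = function () {
  emit(this.cust_id, this.price);
};

// Reduces the valuesPrices array to the sum of its elements.
const reduceFunction1 = function (keyCustId, valuesPrices) {
  return valuesPrices.reduce((sum, price) => sum + price, 0);
};

const output = runMapReduce(orders, mapFunction1, reduceFunction1);
console.log(output);
// [ { _id: 'A', value: 95 }, { _id: 'B', value: 50 } ]
```

In the shell, the same two functions would be passed to db.orders.mapReduce() with out set to "map_reduce_example", as described above.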
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
The $group stage groups by the cust_id and calculates the value field using $sum. The value field contains the total price for each cust_id.
This stage outputs these documents to the next stage:
The $out stage writes the output to the collection agg_alternative_1. Alternatively, you could use $merge instead of $out.
Query the agg_alternative_1 collection to verify the results:
The operation returns these documents:
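A sketch of the pipeline described above, written as a plain JavaScript array so that it can be inspected outside the shell. The `groupSum` helper is an assumption that emulates only the $group stage in memory, on hypothetical documents.

```javascript
// Pipeline matching the two stages described above.
const pipeline = [
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  { $out: "agg_alternative_1" },
];
// In the shell: db.orders.aggregate(pipeline)

// Assumed in-memory emulation of the $group stage.
function groupSum(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.cust_id, (totals.get(doc.cust_id) || 0) + doc.price);
  }
  return [...totals].map(([_id, value]) => ({ _id, value }));
}

// Hypothetical input documents.
const grouped = groupSum([
  { cust_id: "A", price: 25 },
  { cust_id: "A", price: 70 },
  { cust_id: "B", price: 50 },
]);
console.log(grouped);
```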
See also
For an alternative that uses custom aggregation expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
In the following example, you will see a map-reduce operation on the orders collection for all documents that have an ord_date value greater than or equal to 2020-03-01.
The operation in the example:
Groups by the item.sku field, and calculates the number of orders and the total quantity ordered for each sku.
Calculates the average quantity per order for each sku value and merges the results into the output collection.
When merging results, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
Example steps:
Define the map function to process each input document:
In the function, this refers to the document that the map-reduce operation is processing.
For each item, the function associates the sku with a new object value that contains the count of 1 and the item qty for the order, and emits the sku (stored in the key) and the value.
Define the corresponding reduce function with two arguments keySKU and countObjVals:
countObjVals is an array whose elements are the objects mapped to the grouped keySKU values passed by the map function to the reducer function.
The function reduces the countObjVals array to a single object reducedVal that contains the count and the qty fields.
In reducedVal, the count field contains the sum of the count fields from the individual array elements, and the qty field contains the sum of the qty fields from the individual array elements.
Define a finalize function with two arguments key and reducedVal. The function modifies the reducedVal object to add a computed field named avg and returns the modified object:
Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions:
This operation uses the query field to select only those documents with ord_date greater than or equal to new Date("2020-03-01"). Then it outputs the results to a collection map_reduce_example2.
If the map_reduce_example2 collection already exists, the operation will merge the existing contents with the results of this map-reduce operation. That is, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
Query the map_reduce_example2 collection to verify the results:
The operation returns these documents:
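The steps above can be sketched end to end in plain JavaScript. The order documents are hypothetical, `runMapReduce` is an assumed in-memory stand-in for the server, and the query is expressed as a predicate function rather than a MongoDB query document. The three functions follow the descriptions given in the steps.

```javascript
// Hypothetical sample documents; the third is excluded by the date filter.
const orders = [
  { _id: 1, ord_date: new Date("2020-03-01"), items: [{ sku: "apples", qty: 5 }, { sku: "pears", qty: 10 }] },
  { _id: 2, ord_date: new Date("2020-03-08"), items: [{ sku: "apples", qty: 10 }] },
  { _id: 3, ord_date: new Date("2020-02-20"), items: [{ sku: "apples", qty: 99 }] },
];

// Emits { count: 1, qty } for every item, keyed by sku.
const mapFunction2 = function () {
  for (const item of this.items) {
    emit(item.sku, { count: 1, qty: item.qty });
  }
};

// Sums the count and qty fields of the emitted objects.
const reduceFunction2 = function (keySKU, countObjVals) {
  const reducedVal = { count: 0, qty: 0 };
  for (const obj of countObjVals) {
    reducedVal.count += obj.count;
    reducedVal.qty += obj.qty;
  }
  return reducedVal;
};

// Adds the computed avg field.
const finalizeFunction2 = function (key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

// Assumed helper emulating query, map, reduce and finalize in memory.
function runMapReduce(docs, options) {
  const groups = new Map();
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.filter(options.query).forEach((doc) => options.map.call(doc));
  delete globalThis.emit;
  return [...groups].map(([key, values]) => {
    // reduce is skipped for keys with a single value
    const reduced = values.length > 1 ? options.reduce(key, values) : values[0];
    return { _id: key, value: options.finalize(key, reduced) };
  });
}

const output = runMapReduce(orders, {
  query: (doc) => doc.ord_date >= new Date("2020-03-01"),
  map: mapFunction2,
  reduce: reduceFunction2,
  finalize: finalizeFunction2,
});
console.log(JSON.stringify(output));
```

Because the emitted values and the reduce output share the same shape ({ count, qty }), the reduce function can safely be re-applied to its own output.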
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
The $match stage selects only those documents with ord_date greater than or equal to new Date("2020-03-01").
The $unwind stage breaks down the document by the items array field to output a document for each array element.
The $group stage groups by the items.sku, calculating for each sku:
The qty field. The qty field contains the total qty ordered for each items.sku, calculated using $sum.
The orders_ids array. The orders_ids field contains an array of distinct order _id values for the items.sku, built using $addToSet.
The $project stage reshapes the output document to mirror the map-reduce output, with two fields _id and value. The $project sets value.count to the number of distinct orders (the size of the orders_ids array), value.qty to the qty field, and value.avg to the average qty per order.
The $merge stage writes the output to the collection agg_alternative_3. If an existing document has the same key _id as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
Query the agg_alternative_3 collection to verify the results:
The operation returns these documents:
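A sketch of a pipeline with the five stages described above, written as a plain JavaScript array. The exact $project and $merge expressions are assumptions reconstructed from the stage descriptions (count via $size, avg via $divide, overwrite on match and insert otherwise).

```javascript
const pipeline = [
  // Select orders on or after 2020-03-01.
  { $match: { ord_date: { $gte: new Date("2020-03-01") } } },
  // One document per element of the items array.
  { $unwind: "$items" },
  // Total qty and distinct order ids per sku.
  {
    $group: {
      _id: "$items.sku",
      qty: { $sum: "$items.qty" },
      orders_ids: { $addToSet: "$_id" },
    },
  },
  // Reshape to { _id, value: { count, qty, avg } } (assumed expressions).
  {
    $project: {
      value: {
        count: { $size: "$orders_ids" },
        qty: "$qty",
        avg: { $divide: ["$qty", { $size: "$orders_ids" }] },
      },
    },
  },
  // Overwrite documents with a matching _id, insert the others.
  {
    $merge: {
      into: "agg_alternative_3",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
];
// In the shell: db.orders.aggregate(pipeline)

const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```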
See also
For an alternative that uses custom aggregation expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
For more information and examples, see the Map-Reduce page and Perform Incremental Map-Reduce.
If you set the out parameter to write the results to a collection, the mapReduce command returns a document in the following form:
If you set the out parameter to output the results inline, the mapReduce command returns a document in the following form:
mapReduce.result
For output sent to a collection, this value is either a string for the collection name, if out did not specify the database name, or a document with both db and collection fields, if out specified both a database and a collection name.
mapReduce.results
For output written inline, an array of resulting documents. Each resulting document contains two fields:
The _id field contains the key value.
The value field contains the reduced or finalized value for the associated key.
mapReduce.timeMillis
Available for MongoDB 4.2 and earlier only.
The command execution time in milliseconds.
mapReduce.counts
Available for MongoDB 4.2 and earlier only.
Various count statistics from the mapReduce command.
mapReduce.counts.input
Available for MongoDB 4.2 and earlier only.
The number of input documents, which is the number of times the mapReduce command called the map function.
mapReduce.counts.emit
Available for MongoDB 4.2 and earlier only.
The number of times the mapReduce command called the emit function.
mapReduce.counts.reduce
Available for MongoDB 4.2 and earlier only.
The number of times the mapReduce command called the reduce function.
mapReduce.counts.output
Available for MongoDB 4.2 and earlier only.
The number of output values produced.
mapReduce.ok
A value of 1 indicates the mapReduce command ran successfully. A value of 0 indicates an error.
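As an illustration of how a client might branch on these fields, the sketch below inspects a response document in plain JavaScript. `summarizeResponse` and the two sample responses are hypothetical; only the result, results and ok fields described above are assumed to be present.

```javascript
// Hypothetical helper that classifies a mapReduce response document.
function summarizeResponse(response) {
  if (response.ok !== 1) {
    throw new Error("mapReduce command failed");
  }
  if (Array.isArray(response.results)) {
    // Inline output: the documents are returned in the response itself.
    return { kind: "inline", documents: response.results };
  }
  // Collection output: result names the target collection.
  return { kind: "collection", target: response.result };
}

// Hypothetical responses, for illustration only.
const toCollection = summarizeResponse({ result: "map_reduce_example", ok: 1 });
const inline = summarizeResponse({ results: [{ _id: "A", value: 95 }], ok: 1 });
console.log(toCollection, inline);
```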
In addition to the aforementioned command-specific return fields, the db.runCommand() response includes additional information:
For replica sets: $clusterTime and operationTime.
For sharded clusters: operationTime and $clusterTime.
See db.runCommand Response for details on these fields.