mapReduce
The mapReduce command allows you to run map-reduce aggregation operations over a collection.
Aggregation Pipeline as Alternative
Aggregation pipeline provides better performance and a more coherent interface than map-reduce, and map-reduce expressions can be rewritten using aggregation pipeline operators such as $group, $merge, and others.
For map-reduce expressions that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. These operators let users define custom aggregation expressions in JavaScript.
For examples of aggregation alternatives to map-reduce operations, see Map-Reduce Examples. See also Map-Reduce to Aggregation Pipeline.
Note
Starting in version 4.4, MongoDB ignores the verbose option.
Starting in version 4.2, MongoDB deprecates:
The mapReduce command has the following syntax:
The command takes the following fields as arguments:
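Field | Type | Description
mapReduce | string | The collection over which to run the map-reduce operation.
map | JavaScript or String | The function that transforms each input document into zero or more emitted key-value pairs. See the map function requirements.
reduce | JavaScript or String | The function that reduces the values emitted for a key. See the reduce function requirements.
out | string or document | Where to output the results. See out Options.
query | document | Selects the input documents.
sort | document |
limit | number |
finalize | JavaScript or String | The function that receives the key and the reduced value. See the finalize function requirements.
scope | document | Variables accessible in the map, reduce, and finalize functions.
jsMode | boolean |
verbose | boolean | Ignored starting in version 4.4.
bypassDocumentValidation | boolean |
collation | document | Optional.
writeConcern | document |
comment | any |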
The following is a prototype usage of the mapReduce command:
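As a minimal sketch of how the fields in the table above fit together, the command document could be assembled along these lines. The collection name `orders`, the output name `map_reduce_out`, the `status` filter, and both function bodies are placeholders chosen for illustration and are assumptions, not part of the reference. The final call is guarded so the snippet also loads outside the shell.

```javascript
// Hypothetical map function: emits one (cust_id, price) pair per document.
// `emit` is supplied by the server when the function runs inside mapReduce.
const mapFn = function () {
  emit(this.cust_id, this.price);
};

// Hypothetical reduce function: sums the values emitted for one key.
const reduceFn = function (key, values) {
  return values.reduce((total, v) => total + v, 0);
};

// Command document built from field names listed in the table above.
const command = {
  mapReduce: "orders",                 // assumed collection name
  map: mapFn,
  reduce: reduceFn,
  out: { replace: "map_reduce_out" },  // assumed output collection name
  query: { status: "A" },              // assumed filter
};

// Only send the command when a shell-provided `db` object exists.
if (typeof db !== "undefined") {
  printjson(db.runCommand(command));
}
```

Keeping the command as a plain object makes it easy to inspect or log before sending it.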
JavaScript in MongoDB
Although mapReduce uses JavaScript, most interactions with MongoDB do not use JavaScript but use an idiomatic driver in the language of the interacting application.
Requirements for the map Function
The map function is responsible for transforming each input document into zero or more documents. It can access the variables defined in the scope parameter, and has the following prototype:
The map function has the following requirements:
- In the map function, reference the current document as this within the function.
- The map function should not access the database for any reason.
- The map function should be pure, that is, it should have no impact outside of the function (no side effects).
- The map function may optionally call emit(key,value) any number of times to create an output document associating key with value.
- mapReduce no longer supports the deprecated BSON type JavaScript code with scope (BSON type 15) for its functions. The map function must be either BSON type String (BSON type 2) or BSON type JavaScript (BSON type 13). To pass constant values which will be accessible in the map function, use the scope parameter.
- The use of JavaScript code with scope for the map function has been deprecated since version 4.2.1.
The following map function will call emit(key,value) either 0 or 1 times depending on the value of the input document's status field:
The following map function may call emit(key,value) multiple times depending on the number of elements in the input document's items field:
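The two emit patterns described above can be sketched in plain JavaScript as follows. This is an illustration under stated assumptions: the `emit` stand-in, the function names `mapByStatus` and `mapByItems`, and the sample document are hypothetical. Inside a real map-reduce operation, MongoDB supplies `emit` and binds each input document to `this`.

```javascript
// Stand-in for the server-side emit(): collects pairs in an array so the
// functions can run outside MongoDB (an assumption for illustration).
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Emits 0 or 1 times, depending on the document's status field.
function mapByStatus() {
  if (this.status === "A") {
    emit(this.cust_id, 1);
  }
}

// Emits once per element of the document's items array.
function mapByItems() {
  this.items.forEach(function (item) {
    emit(item.sku, 1);
  });
}

// Hypothetical input document, bound to `this` the way the server would.
const doc = {
  cust_id: "c1",
  status: "A",
  items: [{ sku: "apples" }, { sku: "pears" }],
};
mapByStatus.call(doc);
mapByItems.call(doc);
console.log(emitted);
```

With this sample document, the first function emits one pair and the second emits two, one per array element.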
Requirements for the reduce Function
The reduce function has the following prototype:
The reduce function exhibits the following behaviors:
- The reduce function should not access the database, even to perform read operations.
- The reduce function should not affect the outside system.
- MongoDB will not call the reduce function for a key that has only a single value. The values argument is an array whose elements are the value objects that are "mapped" to the key.
- MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
- The reduce function can access the variables defined in the scope parameter.
- The inputs to reduce must not be larger than half of MongoDB's maximum BSON document size. This requirement may be violated when large documents are returned and then joined together in subsequent reduce steps.
- mapReduce no longer supports the deprecated BSON type JavaScript code with scope (BSON type 15) for its functions. The reduce function must be either BSON type String (BSON type 2) or BSON type JavaScript (BSON type 13). To pass constant values which will be accessible in the reduce function, use the scope parameter.
- The use of JavaScript code with scope for the reduce function has been deprecated since version 4.2.1.
Because it is possible to invoke the reduce function more than once for the same key, the following properties need to be true:
- The type of the return object must be identical to the type of the value emitted by the map function.
- The reduce function must be associative. The following statement must be true: reduce(key, [ C, reduce(key, [ A, B ]) ]) == reduce(key, [ C, A, B ])
- The reduce function must be idempotent. Ensure that the following statement is true: reduce(key, [ reduce(key, valuesArray) ]) == reduce(key, valuesArray)
- The reduce function should be commutative: that is, the order of the elements in the valuesArray should not affect the output of the reduce function, so that the following statement is true: reduce(key, [ A, B ]) == reduce(key, [ B, A ])
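The associative, idempotent, and commutative properties listed above can be checked directly for a candidate reduce function. The sketch below uses a hypothetical summing reducer `reduceSum` and arbitrary sample values. It also includes a deliberately broken reducer, `reduceCountBroken`, to show how a function that returns the number of values fails when its own output is fed back in.

```javascript
// Example reduce function (hypothetical): sums numeric values for a key.
function reduceSum(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

const [A, B, C] = [3, 5, 7]; // arbitrary sample values
const valuesArray = [A, B, C];

// Associative: reducing a partial result with C matches reducing all at once.
const associative =
  reduceSum("k", [C, reduceSum("k", [A, B])]) === reduceSum("k", [C, A, B]);

// Idempotent: re-reducing an already reduced value changes nothing.
const idempotent =
  reduceSum("k", [reduceSum("k", valuesArray)]) === reduceSum("k", valuesArray);

// Commutative: the order of the values does not matter.
const commutative = reduceSum("k", [A, B]) === reduceSum("k", [B, A]);

// Counter-example: returns how many values it saw, so re-reducing its own
// output gives a different answer than reducing the original values.
function reduceCountBroken(key, values) {
  return values.length;
}
const brokenIdempotent =
  reduceCountBroken("k", [reduceCountBroken("k", valuesArray)]) ===
  reduceCountBroken("k", valuesArray);

console.log({ associative, idempotent, commutative, brokenIdempotent });
```

A counting reducer is usually written by emitting 1 from the map function and summing in reduce, which keeps the output the same type as the emitted values.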
Requirements for the finalize Function
The finalize function has the following prototype:
The finalize function receives as its arguments a key value and the reducedValue from the reduce function. Be aware that:
- The finalize function should not access the database for any reason.
- The finalize function should be pure, that is, it should have no impact outside of the function (no side effects).
- The finalize function can access the variables defined in the scope parameter.
- mapReduce no longer supports the deprecated BSON type JavaScript code with scope (BSON type 15) for its functions. The finalize function must be either BSON type String (BSON type 2) or BSON type JavaScript (BSON type 13). To pass constant values which will be accessible in the finalize function, use the scope parameter.
- The use of JavaScript code with scope for the finalize function has been deprecated since version 4.2.1.
out Options
You can specify the following options for the out parameter:
Output to a Collection
This option outputs to a new collection, and is not available on secondary members of replica sets.
Output to a Collection with an Action
Note
Starting in version 4.2, MongoDB deprecates:
This option is only available when passing a collection that already exists to out. It is not available on secondary members of replica sets.
When you output to a collection with an action, the out document has the following parameters:
<action>: Specify one of the following actions:
- replace: Replace the contents of <collectionName> if a collection with that name exists.
- merge: Merge the new result with the existing result if the output collection already exists. If an existing document has the same key as the new result, overwrite that existing document.
- reduce: Merge the new result with the existing result if the output collection already exists. If an existing document has the same key as the new result, apply the reduce function to both the new and the existing documents and overwrite the existing document with the result.
db: Optional. The name of the database to which the map-reduce operation writes its output. By default this is the same database as the input collection.
sharded: Optional. If true and you have enabled sharding on the output database, the map-reduce operation will shard the output collection using the _id field as the shard key. If true and collectionName is an existing unsharded collection, map-reduce fails.
Note
Starting in version 4.2, the use of the sharded option is deprecated.
nonAtomic: Optional. Specify the output operation as non-atomic. This applies only to the merge and reduce output modes, which may take minutes to execute. By default nonAtomic is false, and the map-reduce operation locks the database during post-processing. If nonAtomic is true, the post-processing step does not lock the database: during this time, other clients will be able to read intermediate states of the output collection.
Note
Starting in MongoDB 4.2, explicitly setting nonAtomic to false is deprecated.
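To make the difference between the replace, merge, and reduce actions concrete, the sketch below models an output collection as a JavaScript Map from key to value. The helper `applyOutAction` and the sample data are hypothetical and only imitate the documented behavior in memory. They are not how the server implements these actions.

```javascript
// Applies new map-reduce results to an existing output "collection",
// modeled here as a Map keyed by _id. Illustration only.
function applyOutAction(action, existing, results, reduceFn) {
  if (!["replace", "merge", "reduce"].includes(action)) {
    throw new Error("unknown action: " + action);
  }
  if (action === "replace") {
    return new Map(results); // discard the existing contents
  }
  const output = new Map(existing);
  for (const [key, value] of results) {
    if (action === "reduce" && output.has(key)) {
      // reduce: combine the existing and the new value with the reduce function
      output.set(key, reduceFn(key, [output.get(key), value]));
    } else {
      // merge (or a new key): overwrite or insert
      output.set(key, value);
    }
  }
  return output;
}

const sum = (key, values) => values.reduce((t, v) => t + v, 0);
const existing = new Map([["a", 10], ["b", 20]]);
const results = new Map([["b", 5], ["c", 1]]);

const replaced = applyOutAction("replace", existing, results, sum);
const merged = applyOutAction("merge", existing, results, sum);
const reduced = applyOutAction("reduce", existing, results, sum);
console.log([...replaced], [...merged], [...reduced]);
```

Only the key `b` exists on both sides, so it is the one entry whose value differs between the merge and reduce outcomes.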
Output Inline
Perform the map-reduce operation in memory and return the result. This option is the only available option for out on secondary members of replica sets.
The result must fit within the maximum size of a BSON document.
If your MongoDB deployment enforces authentication, the user executing the mapReduce command must possess the following privilege actions:
Map-reduce with {out : inline} output option:
Map-reduce with the replace action when outputting to a collection:
Map-reduce with the merge or reduce actions when outputting to a collection:
The readWrite built-in role provides the necessary permissions to perform map-reduce aggregation.
MongoDB drivers automatically set afterClusterTime for operations associated with causally consistent sessions. Starting in MongoDB 4.2, the mapReduce command no longer supports afterClusterTime. As such, mapReduce cannot be associated with causally consistent sessions.
In the mongo shell, the db.collection.mapReduce() method is a wrapper around the mapReduce command. The following examples use the db.collection.mapReduce() method:
Aggregation Pipeline as Alternative
Aggregation pipeline provides better performance and a simpler interface than map-reduce, and map-reduce expressions can be rewritten using aggregation pipeline operators such as $group, $merge, and others.
For map-reduce expressions that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. These operators provide the ability to define custom aggregation expressions in JavaScript.
The examples in this section include aggregation pipeline alternatives without custom aggregation expressions. For alternatives that use custom expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
Create a sample collection orders with these documents:
Perform the map-reduce operation on the orders collection to group by the cust_id, and calculate the sum of the price for each cust_id:
1. Define the map function to process each input document:
   - In the function, this refers to the document that the map-reduce operation is processing.
   - The function maps the price to the cust_id for each document and emits the cust_id and price.
2. Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
   - valuesPrices is an array whose elements are the price values emitted by the map function and grouped by keyCustId.
   - The function reduces the valuesPrices array to the sum of its elements.
3. Perform map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function:
   This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will replace the contents with the results of this map-reduce operation.
4. Query the map_reduce_example collection to verify the results:
   The operation returns these documents:
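The steps above can be approximated in plain JavaScript to see how the map and reduce functions interact. This is a sketch under stated assumptions: the three sample orders are made-up values, and `runMapReduce` is a hypothetical in-memory stand-in for the server, not MongoDB's implementation. The function names mapFunction1 and reduceFunction1 and their argument names follow the steps above.

```javascript
// Assigned by the harness below; on the server, emit is provided by MongoDB.
let emit;

// Step 1: map each document's price to its cust_id.
const mapFunction1 = function () {
  emit(this.cust_id, this.price);
};

// Step 2: reduce the prices for one customer to their sum.
const reduceFunction1 = function (keyCustId, valuesPrices) {
  return valuesPrices.reduce((total, price) => total + price, 0);
};

// Hypothetical in-memory stand-in for the map-reduce operation.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc)); // bind each document to `this`
  return [...groups].map(([key, values]) => ({
    _id: key,
    // reduce is skipped for keys that have only a single value
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

// Made-up sample orders (illustrative values only).
const orders = [
  { _id: 1, cust_id: "A", price: 25 },
  { _id: 2, cust_id: "A", price: 70 },
  { _id: 3, cust_id: "B", price: 50 },
];

const results1 = runMapReduce(orders, mapFunction1, reduceFunction1);
console.log(results1);
```

Each result has the same two fields as a map-reduce output document: `_id` holds the key and `value` holds the reduced value.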
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
1. The $group stage groups by the cust_id and calculates the value field using $sum. The value field contains the total price for each cust_id.
   This stage outputs these documents to the next stage:
2. The $out stage writes the output to the collection agg_alternative_1. Alternatively, you could use $merge instead of $out.
3. Query the agg_alternative_1 collection to verify the results:
   The operation returns these documents:
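A pipeline matching the stages described above might look like the following sketch. The stage contents follow the description ($group on cust_id with $sum of price, then $out to agg_alternative_1). The variable name `pipeline1` is an assumption, and the shell calls are guarded so the snippet also loads where no `db` object is defined.

```javascript
// Pipeline reconstructed from the stage descriptions above.
const pipeline1 = [
  // Group by cust_id and total the price into a field named value.
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  // Write the grouped documents to the agg_alternative_1 collection.
  { $out: "agg_alternative_1" },
];

// Run only inside a shell session that provides `db`.
if (typeof db !== "undefined") {
  db.orders.aggregate(pipeline1);
  printjson(db.agg_alternative_1.find().sort({ _id: 1 }).toArray());
}
```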
See also
For an alternative that uses custom aggregation expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
In the following example, you will see a map-reduce operation on the orders collection for all documents that have an ord_date value greater than or equal to 2020-03-01.
The operation in the example:
- Groups by the item.sku field, and calculates the number of orders and the total quantity ordered for each sku.
- Calculates the average quantity per order for each sku value and merges the results into the output collection. When merging results, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
Example steps:
1. Define the map function to process each input document:
   - In the function, this refers to the document that the map-reduce operation is processing.
   - For each item, the function associates the sku with a new object value that contains the count of 1 and the item qty for the order, and emits the sku (stored in the key) and the value.
2. Define the corresponding reduce function with two arguments keySKU and countObjVals:
   - countObjVals is an array whose elements are the objects mapped to the grouped keySKU values passed by the map function to the reducer function.
   - The function reduces the countObjVals array to a single object reducedVal that contains the count and the qty fields.
   - In reducedVal, the count field contains the sum of the count fields from the individual array elements, and the qty field contains the sum of the qty fields from the individual array elements.
3. Define a finalize function with two arguments key and reducedVal. The function modifies the reducedVal object to add a computed field named avg and returns the modified object:
4. Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions:
   This operation uses the query field to select only those documents with ord_date greater than or equal to new Date("2020-03-01"). Then it outputs the results to a collection map_reduce_example2.
   If the map_reduce_example2 collection already exists, the operation will merge the existing contents with the results of this map-reduce operation. That is, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
5. Query the map_reduce_example2 collection to verify the results:
   The operation returns these documents:
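The map, reduce, and finalize steps above can be traced in plain JavaScript. As before, this is only a sketch: the sample orders are made-up values, the `query` argument is a predicate function standing in for the query document, and `runMapReduceWithFinalize` is a hypothetical in-memory helper rather than the server's behavior.

```javascript
// Assigned by the harness below; on the server, emit is provided by MongoDB.
let emit;

// Emit one { count, qty } object per item, keyed by the item's sku.
const mapFunction2 = function () {
  for (const item of this.items) {
    emit(item.sku, { count: 1, qty: item.qty });
  }
};

// Sum the count and qty fields of all objects emitted for one sku.
const reduceFunction2 = function (keySKU, countObjVals) {
  const reducedVal = { count: 0, qty: 0 };
  for (const val of countObjVals) {
    reducedVal.count += val.count;
    reducedVal.qty += val.qty;
  }
  return reducedVal;
};

// Add the computed average quantity per order.
const finalizeFunction2 = function (key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

// Hypothetical in-memory stand-in for map-reduce with query and finalize.
function runMapReduceWithFinalize(docs, query, mapFn, reduceFn, finalizeFn) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.filter(query).forEach((doc) => mapFn.call(doc));
  return [...groups].map(([key, values]) => {
    const reduced = values.length === 1 ? values[0] : reduceFn(key, values);
    return { _id: key, value: finalizeFn(key, reduced) };
  });
}

// Made-up sample orders (illustrative values only).
const sampleOrders = [
  { _id: 1, ord_date: new Date("2020-03-01"), items: [{ sku: "apples", qty: 5 }, { sku: "pears", qty: 2 }] },
  { _id: 2, ord_date: new Date("2020-03-08"), items: [{ sku: "apples", qty: 10 }] },
  { _id: 3, ord_date: new Date("2020-02-15"), items: [{ sku: "apples", qty: 100 }] },
];

const results2 = runMapReduceWithFinalize(
  sampleOrders,
  (doc) => doc.ord_date >= new Date("2020-03-01"), // stands in for the query field
  mapFunction2,
  reduceFunction2,
  finalizeFunction2
);
console.log(results2);
```

The February order is filtered out before the map step, so it contributes to neither the count nor the quantity for its sku.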
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
1. The $match stage selects only those documents with ord_date greater than or equal to new Date("2020-03-01").
2. The $unwind stage breaks down the document by the items array field to output a document for each array element.
3. The $group stage groups by the items.sku, calculating for each sku:
   - The qty field. The qty field contains the total qty ordered for each items.sku using $sum.
   - The orders_ids array. The orders_ids field contains an array of distinct order _id values for the items.sku using $addToSet.
4. The $project stage reshapes the output document to mirror the map-reduce output, with two fields _id and value. The $project sets:
5. The $merge stage writes the output to the collection agg_alternative_3. If an existing document has the same key _id as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
6. Query the agg_alternative_3 collection to verify the results:
   The operation returns these documents:
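A pipeline following the stage descriptions above could be sketched as shown below. The $match, $unwind, $group, and $merge stages follow the text directly. The field expressions inside $project are an assumption, chosen so that value carries the same count, qty, and avg fields as the map-reduce output. The variable name `pipeline2` is also an assumption, and the shell calls are guarded so the snippet loads where `db` is not defined.

```javascript
const pipeline2 = [
  // Keep only orders dated on or after 2020-03-01.
  { $match: { ord_date: { $gte: new Date("2020-03-01") } } },
  // Output one document per element of the items array.
  { $unwind: "$items" },
  // Total the quantity and collect the distinct order ids per sku.
  {
    $group: {
      _id: "$items.sku",
      qty: { $sum: "$items.qty" },
      orders_ids: { $addToSet: "$_id" },
    },
  },
  // Assumed reshaping: mirror the { _id, value } form of the map-reduce output.
  {
    $project: {
      value: {
        count: { $size: "$orders_ids" },
        qty: "$qty",
        avg: { $divide: ["$qty", { $size: "$orders_ids" }] },
      },
    },
  },
  // Overwrite documents with a matching _id, insert the rest.
  {
    $merge: {
      into: "agg_alternative_3",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
];

// Run only inside a shell session that provides `db`.
if (typeof db !== "undefined") {
  db.orders.aggregate(pipeline2);
  printjson(db.agg_alternative_3.find().sort({ _id: 1 }).toArray());
}
```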
See also
For an alternative that uses custom aggregation expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
For more information and examples, see the Map-Reduce page and Perform Incremental Map-Reduce.
If you set the out parameter to write the results to a collection, the mapReduce command returns a document in the following form:
If you set the out parameter to output the results inline, the mapReduce command returns a document in the following form:
mapReduce.result
For output sent to a collection, this value is either:
mapReduce.results
For output written inline, an array of resulting documents. Each resulting document contains two fields:
- The _id field contains the key value.
- The value field contains the reduced or finalized value for the associated key.
mapReduce.timeMillis
Available for MongoDB 4.2 and earlier only.
The command execution time in milliseconds.
mapReduce.counts
Available for MongoDB 4.2 and earlier only.
Various count statistics from the mapReduce command.
mapReduce.counts.input
Available for MongoDB 4.2 and earlier only.
The number of input documents, which is the number of times the mapReduce command called the map function.
mapReduce.counts.emit
Available for MongoDB 4.2 and earlier only.
The number of times the mapReduce command called the emit function.
mapReduce.counts.reduce
Available for MongoDB 4.2 and earlier only.
The number of times the mapReduce command called the reduce function.
mapReduce.counts.output
Available for MongoDB 4.2 and earlier only.
The number of output values produced.
mapReduce.ok
A value of 1 indicates the mapReduce command ran successfully. A value of 0 indicates an error.
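As a small sketch of consuming an inline response, the helper below checks the ok field and turns the results array into a lookup object. The helper name `inlineResultsToObject` and the sample response are hypothetical. The response contains only the results and ok fields described above, and a real response may include more.

```javascript
// Converts an inline mapReduce response into a { key: value } object.
// Throws when the ok field reports an error.
function inlineResultsToObject(response) {
  if (response.ok !== 1) {
    throw new Error("mapReduce command did not succeed");
  }
  const byKey = {};
  for (const { _id, value } of response.results) {
    byKey[_id] = value;
  }
  return byKey;
}

// Hypothetical inline response, limited to the fields described above.
const sampleResponse = {
  results: [
    { _id: "A", value: 95 },
    { _id: "B", value: 50 },
  ],
  ok: 1,
};

const totals = inlineResultsToObject(sampleResponse);
console.log(totals);
```

Using `_id` as a property name works here because the sample keys are strings. Keys of other types would need a Map instead of a plain object.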
In addition to the aforementioned command-specific return fields, the db.runCommand() response includes additional information:
- For replica sets: $clusterTime and operationTime.
- For sharded clusters: operationTime and $clusterTime.
See db.runCommand Response for details on these fields.