Map-Reduce and Sharded Collections

Map-reduce supports operations on sharded collections, both as an input and as an output. This section describes the behaviors of mapReduce specific to sharded collections.

However, starting in version 4.2, MongoDB deprecates the map-reduce option to create a new sharded collection, as well as the use of the sharded option for map-reduce. To output to a sharded collection, create the sharded collection first. MongoDB 4.2 also deprecates the replacement of an existing sharded collection.

Sharded Collection as Input

When using a sharded collection as the input for a map-reduce operation, mongos automatically dispatches the map-reduce job to each shard in parallel. No special option is required. mongos waits for the jobs on all shards to finish.
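The dispatch-and-wait behavior described above can be pictured with a small conceptual simulation. This is a sketch in plain JavaScript, not MongoDB's implementation: the `shards` array, the sample documents, and the helper names `runOnShard` and `mapReduceAcrossShards` are all hypothetical, and the per-shard jobs run sequentially here rather than in parallel.

```javascript
// Conceptual simulation only: plain JavaScript, no MongoDB server involved.
// Each inner array stands in for the documents held by one shard (made-up data).
const shards = [
  [{ cust: "a", amount: 5 }, { cust: "b", amount: 7 }],
  [{ cust: "a", amount: 3 }],
  [{ cust: "b", amount: 1 }, { cust: "c", amount: 4 }],
];

// map emits (key, value) pairs; reduce folds all values seen for one key.
const mapFn = (doc, emit) => emit(doc.cust, doc.amount);
const reduceFn = (key, values) => values.reduce((sum, v) => sum + v, 0);

// Run map and reduce over one shard's documents and return its partial results.
function runOnShard(docs) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const partial = new Map();
  for (const [key, values] of groups) partial.set(key, reduceFn(key, values));
  return partial;
}

// Stand-in for the router: run a job per shard, wait for all of them,
// then reduce the partial results again for each key.
function mapReduceAcrossShards(allShards) {
  const partials = allShards.map(runOnShard);
  const merged = new Map();
  for (const partial of partials) {
    for (const [key, value] of partial) {
      if (!merged.has(key)) merged.set(key, []);
      merged.get(key).push(value);
    }
  }
  const result = {};
  for (const [key, values] of merged) result[key] = reduceFn(key, values);
  return result;
}

const totals = mapReduceAcrossShards(shards);
console.log(totals);
```

Because the partial results from each shard are reduced a second time, the reduce function in this sketch has to accept its own output as input, which is why a simple sum was chosen for illustration.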

Sharded Collection as Output

If the out field for mapReduce has the sharded value, MongoDB shards the output collection using the _id field as the shard key.

Note

Starting in version 4.2, MongoDB deprecates the use of the sharded option for mapReduce/db.collection.mapReduce.

To output to a sharded collection, create the sharded collection first, then run the map-reduce operation with that collection as the output.

Note

  • During later map-reduce jobs, MongoDB splits chunks as needed.
  • Balancing of chunks for the output collection is automatically prevented during post-processing to avoid concurrency issues.
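As a rough illustration of the "create the sharded collection first" approach, the following shell-style sketch shards an output collection on _id and then merges map-reduce results into it. The database name `sales`, the collections `orders` and `order_totals`, the fields `cust_id` and `amount`, and the wrapper function `mapReduceToShardedOutput` are assumptions made for this example; the wrapper takes the shell's `db` and `sh` helpers as arguments so the steps are explicit.

```javascript
// Sketch only. Hypothetical names: database "sales", collections "orders" and
// "order_totals", fields "cust_id" and "amount".
function mapReduceToShardedOutput(db, sh) {
  // 1. Create the sharded output collection up front, rather than relying on
  //    the deprecated `sharded` output option.
  sh.enableSharding("sales");
  sh.shardCollection("sales.order_totals", { _id: 1 });

  // 2. Run map-reduce and merge the results into the existing sharded collection.
  return db.getSiblingDB("sales").orders.mapReduce(
    function () { emit(this.cust_id, this.amount); },      // map: one pair per order
    function (key, values) { return Array.sum(values); },  // reduce: total per customer
    { out: { merge: "order_totals" } }
  );
}
```

In a shell session connected to mongos, this would be invoked as `mapReduceToShardedOutput(db, sh)`. The `merge` output action is used here because replacing an existing sharded collection is deprecated as of 4.2.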