$bucketAuto (aggregation)¶

Definition¶

$bucketAuto¶

New in version 3.4.

Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.

Each bucket is represented as a document in the output. The document for each bucket contains:

- An _id object that specifies the bounds of the bucket.
  - The _id.min field specifies the inclusive lower bound for the bucket.
  - The _id.max field specifies the upper bound for the bucket. This bound is exclusive for all buckets except the final bucket in the series, where it is inclusive.
- A count field that contains the number of documents in the bucket. The count field is included by default when the output document is not specified.

The $bucketAuto stage has the following form:

{
  $bucketAuto: {
      groupBy: <expression>,
      buckets: <number>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
      },
      granularity: <string>
  }
}
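
As a rough illustration of the form above, the following sketch fills in all four fields with concrete values. The variable name pipeline, the things collection, and the ids output field are assumptions made for this example only.

```javascript
// Hypothetical pipeline: every name other than the $bucketAuto fields is illustrative.
const pipeline = [
  {
    $bucketAuto: {
      groupBy: "$_id",           // field path: dollar-prefixed and quoted
      buckets: 5,                // positive 32-bit integer
      output: {
        count: { $sum: 1 },      // listed explicitly because output is specified
        ids: { $push: "$_id" }   // collects the grouped values for inspection
      },
      granularity: "R20"         // requires numeric, non-NaN groupBy values
    }
  }
];

// In the mongo shell this could be run as (assuming a collection named things):
// db.things.aggregate(pipeline)
```

Defining the pipeline as a plain array first makes it easy to inspect or reuse before passing it to aggregate().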

Field	Type	Description

groupBy	expression	An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes.

buckets	integer	A positive 32-bit integer that specifies the number of buckets into which input documents are grouped.

output	document	Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify the field to include, you must use accumulator expressions:

<outputfield1>: { <accumulator>: <expression1> },
...

The default count field is not included in the output document when output is specified. Explicitly specify the count expression as part of the output document to include it:

output: {
  <outputfield1>: { <accumulator>: <expression1> },
  ...
  count: { $sum: 1 }
}

granularity	string	Optional. A string that specifies the preferred number series to use to ensure that the calculated boundary edges end on preferred round numbers or their powers of 10.

Available only if all groupBy values are numeric and none of them are NaN.

The supported values of granularity are:

"R5"
"R10"
"R20"
"R40"
"R80"
"1-2-5"

"E6"
"E12"
"E24"
"E48"
"E96"
"E192"
"POWERSOF2"

Behavior¶

There may be fewer than the specified number of buckets if:

- The number of input documents is less than the specified number of buckets.
- The number of unique values of the groupBy expression is less than the specified number of buckets.
- The granularity has fewer intervals than the number of buckets.
- The granularity is not fine enough to evenly distribute documents into the specified number of buckets.

If the groupBy expression refers to an array or document, the values are arranged using the same ordering as in $sort before determining the bucket boundaries.

The even distribution of documents across buckets depends on the cardinality, or the number of unique values, of the groupBy field. If the cardinality is not high enough, the $bucketAuto stage may not evenly distribute the results across buckets.
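
The effect of cardinality can be explored with a simplified model of the even-split behavior when no granularity is set. This is only a sketch under stated assumptions: the function name autoBucket is hypothetical, it handles plain numbers only, and it is not MongoDB's actual implementation. It aims for equally sized buckets and never lets equal values straddle a boundary.

```javascript
// Simplified, hypothetical model of $bucketAuto without granularity.
// Not MongoDB's implementation; numbers only.
function autoBucket(values, buckets) {
  const sorted = [...values].sort((a, b) => a - b);
  const result = [];
  let start = 0;
  for (let b = 0; b < buckets && start < sorted.length; b++) {
    // Spread the remaining values over the remaining buckets.
    const target = Math.ceil((sorted.length - start) / (buckets - b));
    let end = start + target;
    // Equal values must land in the same bucket, so extend past duplicates.
    while (end < sorted.length && sorted[end] === sorted[end - 1]) end++;
    const isLast = end >= sorted.length;
    result.push({
      _id: {
        min: sorted[start],                                      // inclusive
        max: isLast ? sorted[sorted.length - 1] : sorted[end]    // exclusive unless last
      },
      count: end - start
    });
    start = end;
  }
  return result;
}

// Six values but only two unique ones: asking for four buckets yields two.
console.log(autoBucket([1, 1, 1, 1, 2, 2], 4));
```

In this model, the values 0 through 99 split into five buckets of 20 each, while the low-cardinality input above collapses into two buckets of unequal size.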

Granularity¶

The $bucketAuto stage accepts an optional granularity parameter that ensures the boundaries of all buckets adhere to a specified preferred number series. Using a preferred number series gives more control over where the bucket boundaries fall within the range of values of the groupBy expression. A series can also help set bucket boundaries evenly on a logarithmic scale when the range of the groupBy expression scales exponentially.

Renard Series¶

The Renard number series are sets of numbers derived by taking the 5th, 10th, 20th, 40th, or 80th root of 10, then including the powers of that root that fall between 1.0 and 10.0 (10.3 in the case of R80).

Set granularity to R5, R10, R20, R40, or R80 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 (10.3 for R80) range.

Example

The R5 series is based on the fifth root of 10, which is approximately 1.58, and includes successive powers of this root (rounded) until 10 is reached. The R5 series is derived as follows:

10^(0/5) = 1
10^(1/5) = 1.584 ~ 1.6
10^(2/5) = 2.511 ~ 2.5
10^(3/5) = 3.981 ~ 4.0
10^(4/5) = 6.309 ~ 6.3
10^(5/5) = 10

The same approach is applied to the other Renard series to offer finer granularity, i.e., more intervals between 1.0 and 10.0 (10.3 for R80).
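
The R5 derivation can be reproduced with a few lines of JavaScript. This is a minimal sketch; the function name renardR5 is hypothetical, and rounding to one decimal place is an assumption that happens to match R5. The finer Renard series use conventional rounded values that plain rounding does not always reproduce.

```javascript
// Hypothetical helper: derives the R5 series from powers of the fifth root of 10.
function renardR5() {
  const series = [];
  for (let k = 0; k <= 5; k++) {
    const exact = Math.pow(10, k / 5);          // 10^(k/5)
    series.push(Math.round(exact * 10) / 10);   // round to one decimal place
  }
  return series;
}

console.log(renardR5());
```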

E Series¶

The E number series are similar to the Renard series in that they subdivide the interval from 1.0 to 10.0 by the 6th, 12th, 24th, 48th, 96th, or 192nd root of 10 with a particular relative error.

Set granularity to E6, E12, E24, E48, E96, or E192 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 range. To learn more about the E-series and their respective relative errors, see preferred number series.

1-2-5 Series¶

The 1-2-5 series behaves like a three-value Renard series, if such a series existed.

Set granularity to 1-2-5 to restrict bucket boundaries to various powers of the third root of 10, rounded to one significant digit.

Example

The following values are part of the 1-2-5 series: 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, and so on…
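
The link between the cube root of 10 and the values 1, 2, and 5 can be checked with a short sketch. The function name oneTwoFiveSeries and its exponent-range parameters are hypothetical, chosen only for this illustration.

```javascript
// Hypothetical helper: lists 1-2-5 values for decades 10^minExp through 10^maxExp.
function oneTwoFiveSeries(minExp, maxExp) {
  // Powers of the cube root of 10, rounded to one significant digit: 1, 2, 5.
  const mantissas = [0, 1, 2].map((k) => Math.round(Math.pow(10, k / 3)));
  const values = [];
  for (let e = minExp; e <= maxExp; e++) {
    for (const m of mantissas) {
      // Divide for negative exponents so that, for example, 1 / 10 gives exactly 0.1.
      values.push(e >= 0 ? m * Math.pow(10, e) : m / Math.pow(10, -e));
    }
  }
  return values;
}

console.log(oneTwoFiveSeries(-1, 3));
```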

Powers of Two Series¶

Set granularity to POWERSOF2 to restrict bucket boundaries to numbers that are a power of two.

Example

The following numbers adhere to the powers of two series:

2⁰ = 1
2¹ = 2
2² = 4
2³ = 8
2⁴ = 16
2⁵ = 32
and so on…

A common real-world example is computer components such as memory, whose sizes often adhere to the POWERSOF2 set of preferred numbers:

1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on…
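
For completeness, the same sequence can be generated programmatically. The helper name powersOfTwo and its limit parameter are hypothetical.

```javascript
// Hypothetical helper: returns every power of two up to and including limit.
function powersOfTwo(limit) {
  const values = [];
  for (let v = 1; v <= limit; v *= 2) {
    values.push(v);
  }
  return values;
}

console.log(powersOfTwo(2048));
```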

Comparing Different Granularities¶

The following operation demonstrates how specifying different values for granularity affects how $bucketAuto determines bucket boundaries. A collection called things contains documents with an _id numbered from 0 to 99, which is the range the bucket bounds and counts in the results below correspond to:

{ _id: 0 }
{ _id: 1 }
...
{ _id: 99 }

Different values for granularity are substituted into the following operation:

db.things.aggregate( [
  {
    $bucketAuto: {
      groupBy: "$_id",
      buckets: 5,
      granularity: <granularity>
    }
  }
] )

The results in the following table demonstrate how different values for granularity yield different bucket boundaries:

Granularity	Results	Notes
No granularity	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } { "_id" : { "min" : 40, "max" : 60 }, "count" : 20 } { "_id" : { "min" : 60, "max" : 80 }, "count" : 20 } { "_id" : { "min" : 80, "max" : 99 }, "count" : 20 }
R20	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } { "_id" : { "min" : 40, "max" : 63 }, "count" : 23 } { "_id" : { "min" : 63, "max" : 90 }, "count" : 27 } { "_id" : { "min" : 90, "max" : 100 }, "count" : 10 }
E24	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 43 }, "count" : 23 } { "_id" : { "min" : 43, "max" : 68 }, "count" : 25 } { "_id" : { "min" : 68, "max" : 91 }, "count" : 23 } { "_id" : { "min" : 91, "max" : 100 }, "count" : 9 }
1-2-5	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 50 }, "count" : 30 } { "_id" : { "min" : 50, "max" : 100 }, "count" : 50 }	The specified number of buckets exceeds the number of intervals in the series.
POWERSOF2	{ "_id" : { "min" : 0, "max" : 32 }, "count" : 32 } { "_id" : { "min" : 32, "max" : 64 }, "count" : 32 } { "_id" : { "min" : 64, "max" : 128 }, "count" : 36 }	The specified number of buckets exceeds the number of intervals in the series.
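
The counts in the table follow directly from the boundary rules (inclusive lower bound, exclusive upper bound except for the final bucket). The sketch below recomputes them, assuming the _id values run from 0 through 99, which is what the counts in the table imply. The function name countPerBucket is hypothetical, and the boundaries are copied from the table rather than derived from each series.

```javascript
// Hypothetical helper: counts how many values fall into each bucket,
// given the list of boundaries taken from the table above.
function countPerBucket(values, boundaries) {
  const counts = [];
  for (let i = 0; i + 1 < boundaries.length; i++) {
    const min = boundaries[i];
    const max = boundaries[i + 1];
    const isLast = i + 2 === boundaries.length;
    // min is inclusive; max is exclusive except for the final bucket.
    const inBucket = (v) => v >= min && (isLast ? v <= max : v < max);
    counts.push(values.filter(inBucket).length);
  }
  return counts;
}

const ids = Array.from({ length: 100 }, (_, i) => i); // assumed _id values 0..99

console.log("R20      ", countPerBucket(ids, [0, 20, 40, 63, 90, 100]));
console.log("E24      ", countPerBucket(ids, [0, 20, 43, 68, 91, 100]));
console.log("1-2-5    ", countPerBucket(ids, [0, 20, 50, 100]));
console.log("POWERSOF2", countPerBucket(ids, [0, 32, 64, 128]));
```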

Example¶

Consider a collection artwork with the following documents:

{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
    "price" : NumberDecimal("199.99"),
    "dimensions" : { "height" : 39, "width" : 21, "units" : "in" } }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
    "price" : NumberDecimal("280.00"),
    "dimensions" : { "height" : 49, "width" : 32, "units" : "in" } }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
    "price" : NumberDecimal("76.04"),
    "dimensions" : { "height" : 25, "width" : 20, "units" : "in" } }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
    "price" : NumberDecimal("167.30"),
    "dimensions" : { "height" : 24, "width" : 36, "units" : "in" } }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
    "price" : NumberDecimal("483.00"),
    "dimensions" : { "height" : 20, "width" : 24, "units" : "in" } }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
    "price" : NumberDecimal("385.00"),
    "dimensions" : { "height" : 30, "width" : 46, "units" : "in" } }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch",
    "price" : NumberDecimal("159.00"),
    "dimensions" : { "height" : 24, "width" : 18, "units" : "in" } }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
    "price" : NumberDecimal("118.42"),
    "dimensions" : { "height" : 24, "width" : 20, "units" : "in" } }

Multi-Faceted Aggregation¶

The $bucketAuto stage can be used within the $facet stage to process multiple aggregation pipelines on the same set of input documents from artwork.

The following aggregation pipeline groups the documents from the artwork collection into buckets based on price, year, and the calculated area:

db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucketAuto: {
            groupBy: "$price",
            buckets: 4
          }
        }
      ],
      "year": [
        {
          $bucketAuto: {
            groupBy: "$year",
            buckets: 3,
            output: {
              "count": { $sum: 1 },
              "years": { $push: "$year" }
            }
          }
        }
      ],
      "area": [
        {
          $bucketAuto: {
            groupBy: {
              $multiply: [ "$dimensions.height", "$dimensions.width" ]
            },
            buckets: 4,
            output: {
              "count": { $sum: 1 },
              "titles": { $push: "$title" }
            }
          }
        }
      ]
    }
  }
] )

The operation returns the following document:

{
  "area" : [
    {
      "_id" : { "min" : 432, "max" : 500 },
      "count" : 3,
      "titles" : [
        "The Scream",
        "The Persistence of Memory",
        "Blue Flower"
      ]
    },
    {
      "_id" : { "min" : 500, "max" : 864 },
      "count" : 2,
      "titles" : [
        "Dancer",
        "The Pillars of Society"
      ]
    },
    {
      "_id" : { "min" : 864, "max" : 1568 },
      "count" : 2,
      "titles" : [
        "The Great Wave off Kanagawa",
        "Composition VII"
      ]
    },
    {
      "_id" : { "min" : 1568, "max" : 1568 },
      "count" : 1,
      "titles" : [
        "Melancholy III"
      ]
    }
  ],
  "price" : [
    {
      "_id" : { "min" : NumberDecimal("76.04"), "max" : NumberDecimal("159.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("159.00"), "max" : NumberDecimal("199.99") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("199.99"), "max" : NumberDecimal("385.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("385.00"), "max" : NumberDecimal("483.00") },
      "count" : 2
    }
  ],
  "year" : [
    { "_id" : { "min" : null, "max" : 1913 }, "count" : 3, "years" : [ 1902 ] },
    { "_id" : { "min" : 1913, "max" : 1926 }, "count" : 3, "years" : [ 1913, 1918, 1925 ] },
    { "_id" : { "min" : 1926, "max" : 1931 }, "count" : 2, "years" : [ 1926, 1931 ] }
  ]
}
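
To see why the area buckets come out as they do, it can help to compute each area by hand. The sketch below is an illustration only: it copies the titles and dimensions from the artwork documents into a flattened array (the flattened shape and the variable names are assumptions of this example) and sorts by height times width, which is the value the groupBy expression produces.

```javascript
// Titles and dimensions copied from the artwork documents; the flat shape is illustrative.
const artwork = [
  { title: "The Pillars of Society", height: 39, width: 21 },
  { title: "Melancholy III", height: 49, width: 32 },
  { title: "Dancer", height: 25, width: 20 },
  { title: "The Great Wave off Kanagawa", height: 24, width: 36 },
  { title: "The Persistence of Memory", height: 20, width: 24 },
  { title: "Composition VII", height: 30, width: 46 },
  { title: "The Scream", height: 24, width: 18 },
  { title: "Blue Flower", height: 24, width: 20 }
];

// Same arithmetic as $multiply: [ "$dimensions.height", "$dimensions.width" ].
const byArea = artwork
  .map((a) => ({ title: a.title, area: a.height * a.width }))
  .sort((a, b) => a.area - b.area);

for (const { title, area } of byArea) {
  console.log(area, title);
}
```

Two works share an area of 480, so they must fall in the same bucket. That is why the first bucket holds three documents and the final bucket holds only "Melancholy III", whose area of 1568 is both the minimum and the inclusive maximum of that bucket.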
