$bucketAuto
New in version 3.4.
Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.
Each bucket is represented as a document in the output. The document for each bucket contains:
- An _id object that specifies the bounds of the bucket.
  - The _id.min field specifies the inclusive lower bound for the bucket.
  - The _id.max field specifies the upper bound for the bucket. This bound is exclusive for all buckets except the final bucket, where it is inclusive.
- A count field that contains the number of documents in the bucket. The count field is included by default when the output document is not specified.

The $bucketAuto stage has the following form:
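As a rough sketch of that form, the stage can be written as a plain JavaScript object with four fields: groupBy, buckets, output, and granularity. The concrete values used here ("$price", 4, the count accumulator, "R5") are illustrative assumptions, and the variable name bucketAutoStage is hypothetical.

```javascript
// Shape of a $bucketAuto stage. All values are placeholders chosen for illustration.
const bucketAutoStage = {
  $bucketAuto: {
    groupBy: "$price",               // expression; a field path is prefixed with $ and quoted
    buckets: 4,                      // requested number of buckets
    output: { count: { $sum: 1 } },  // optional; mirrors the default count field
    granularity: "R5"                // optional; name of a preferred number series
  }
};

// With a live connection in mongosh, the stage would sit inside a pipeline, e.g.:
// db.collection.aggregate([bucketAutoStage])
```

Only groupBy and buckets are needed for a minimal stage; output and granularity can be left out.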
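| Field | Type | Description |
|---|---|---|
| groupBy | expression | An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes. |
| buckets | integer | The number of buckets into which input documents are grouped. |
| output | document | Optional. The fields to include in each output document in addition to _id. When output is not specified, a count field is included by default. |
| granularity | string | Optional. The preferred number series used to set bucket boundaries. |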
There may be fewer than the specified number of buckets if:
- The number of unique values in the groupBy expression is less than the specified number of buckets.
- The granularity has fewer intervals than the number of buckets.
- The granularity is not fine enough to evenly distribute documents into the specified number of buckets.

If the groupBy expression refers to an array or document, the values are arranged using the same ordering as in $sort before determining the bucket boundaries.
The even distribution of documents across buckets depends on the cardinality, or the number of unique values, of the groupBy field. If the cardinality is not high enough, the $bucketAuto stage may not evenly distribute the results across buckets.
The $bucketAuto stage accepts an optional granularity parameter which ensures that the boundaries of all buckets adhere to a specified preferred number series. Using a preferred number series provides more control over where the bucket boundaries are set among the range of values in the groupBy expression. A preferred number series can also help set bucket boundaries logarithmically and evenly when the range of the groupBy expression scales exponentially.
The Renard number series are sets of numbers derived by taking either the 5th, 10th, 20th, 40th, or 80th root of 10, then including various powers of the root that equate to values between 1.0 and 10.0 (10.3 in the case of R80).
Set granularity to R5, R10, R20, R40, or R80 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 (10.3 for R80) range.
Example
The R5 series is based on the fifth root of 10, which is approximately 1.58, and includes various powers of this root (rounded) until 10 is reached. The R5 series is derived as follows:
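The derivation can be sketched in a few lines of JavaScript. The snippet computes the raw powers of the fifth root of 10 and places them next to the conventional rounded R5 values (1.0, 1.6, 2.5, 4.0, 6.3, 10.0). The rounded list is hard-coded because the preferred values follow the series convention and are not a plain decimal rounding; the variable names are hypothetical.

```javascript
// Raw powers 10^(k/5) for k = 0..5, i.e. successive powers of the fifth root of 10.
const rawR5 = [];
for (let k = 0; k <= 5; k++) {
  rawR5.push(Math.pow(10, k / 5));
}

// Conventional rounded R5 values (hard-coded, not computed).
const r5 = [1.0, 1.6, 2.5, 4.0, 6.3, 10.0];

// Print each raw power beside its preferred value.
rawR5.forEach((raw, k) => {
  console.log(`10^(${k}/5) = ${raw.toFixed(3)}  ->  ${r5[k]}`);
});
```

Each preferred value stays within about one percent of the raw power it stands for.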
The same approach is applied to the other Renard series to offer finer granularity, i.e., more intervals between 1.0 and 10.0 (10.3 for R80).
The E number series are similar to the Renard series in that they subdivide the interval from 1.0 to 10.0 by the 6th, 12th, 24th, 48th, 96th, or 192nd root of ten with a particular relative error.
Set granularity to E6, E12, E24, E48, E96, or E192 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 range. To learn more about the E-series and their respective relative errors, see preferred number series.
The 1-2-5 series behaves like a three-value Renard series, if such a series existed.
Set granularity to 1-2-5 to restrict bucket boundaries to various powers of the third root of 10, rounded to one significant digit.
Example
The following values are part of the 1-2-5 series:
0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, and so on…
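A small sketch shows where these values come from: the powers of the third root of 10 within one decade, rounded to one significant digit, give 1, 2, and 5, and each decade repeats them scaled by a power of 10. The function name oneTwoFive and its exponent arguments are hypothetical.

```javascript
// 10^(0/3), 10^(1/3), 10^(2/3) are about 1, 2.15, 4.64. All lie in [1, 10),
// so Math.round is the same as rounding to one significant digit: 1, 2, 5.
const mantissas = [0, 1, 2].map((k) => Math.round(Math.pow(10, k / 3)));

// Scale the mantissas by every power of 10 from 10^minExp to 10^maxExp.
function oneTwoFive(minExp, maxExp) {
  const series = [];
  for (let e = minExp; e <= maxExp; e++) {
    for (const m of mantissas) {
      series.push(m * Math.pow(10, e));
    }
  }
  return series;
}

const series125 = oneTwoFive(-1, 3);
console.log(series125.join(", "));
```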
Set granularity to POWERSOF2 to restrict bucket boundaries to numbers that are a power of two.
Example
The following numbers adhere to the power of two series. A common example is how various computer components, like memory, often adhere to the POWERSOF2 set of preferred numbers:
1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on…
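For completeness, the listed values can be reproduced with a one-line generator. The helper name powersOfTwo is hypothetical, and the sketch only enumerates the series; it says nothing about how the server picks boundaries from it.

```javascript
// First n powers of two, starting at 2^0 = 1.
function powersOfTwo(n) {
  return Array.from({ length: n }, (_, k) => 2 ** k);
}

const firstTwelve = powersOfTwo(12);
console.log(firstTwelve.join(", "));
```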
The following operation demonstrates how specifying different values for granularity affects how $bucketAuto determines bucket boundaries. A collection named things contains documents whose _id values are numbered from 0 to 99:
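One way to build such a collection is sketched here. The result table on this page reports a lowest bound of 0 and an inclusive highest bound of 99, so the sketch assumes _id values 0 through 99. The array is built in plain JavaScript; the insertMany call is shown only as a comment because it needs a live mongosh connection.

```javascript
// 100 documents with _id values 0 through 99 (assumed from the result table).
const things = Array.from({ length: 100 }, (_, i) => ({ _id: i }));

// With a live connection in mongosh:
// db.things.insertMany(things)
```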
Different values for granularity are substituted into the following operation:
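A hedged sketch of such an operation: a hypothetical helper, bucketAutoPipeline, builds the pipeline and adds granularity only when a value is passed. Grouping by "$_id" into 5 buckets is an assumption inferred from the result table, where the row without granularity shows five buckets over the _id values.

```javascript
// Build a one-stage pipeline; granularity is omitted when not supplied.
function bucketAutoPipeline(granularity) {
  const stage = { groupBy: "$_id", buckets: 5 }; // assumed from the result table
  if (granularity !== undefined) {
    stage.granularity = granularity;
  }
  return [{ $bucketAuto: stage }];
}

// With a live connection in mongosh:
// db.things.aggregate(bucketAutoPipeline())        // no granularity
// db.things.aggregate(bucketAutoPipeline("R20"))   // Renard R20 series
console.log(JSON.stringify(bucketAutoPipeline("R20")));
```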
The results in the following table demonstrate how different values for granularity yield different bucket boundaries:
| Granularity | Results |
|---|---|
| No granularity | { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } |
| | { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } |
| | { "_id" : { "min" : 40, "max" : 60 }, "count" : 20 } |
| | { "_id" : { "min" : 60, "max" : 80 }, "count" : 20 } |
| | { "_id" : { "min" : 80, "max" : 99 }, "count" : 20 } |
| R20 | { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } |
| | { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } |
| | { "_id" : { "min" : 40, "max" : 63 }, "count" : 23 } |
| | { "_id" : { "min" : 63, "max" : 90 }, "count" : 27 } |
| | { "_id" : { "min" : 90, "max" : 100 }, "count" : 10 } |
| E24 | { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } |
| | { "_id" : { "min" : 20, "max" : 43 }, "count" : 23 } |
| | { "_id" : { "min" : 43, "max" : 68 }, "count" : 25 } |
| | { "_id" : { "min" : 68, "max" : 91 }, "count" : 23 } |
| | { "_id" : { "min" : 91, "max" : 100 }, "count" : 9 } |
| 1-2-5 | { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } |
| | { "_id" : { "min" : 20, "max" : 50 }, "count" : 30 } |
| | { "_id" : { "min" : 50, "max" : 100 }, "count" : 50 } |
| POWERSOF2 | { "_id" : { "min" : 0, "max" : 32 }, "count" : 32 } |
| | { "_id" : { "min" : 32, "max" : 64 }, "count" : 32 } |
| | { "_id" : { "min" : 64, "max" : 128 }, "count" : 36 } |
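The "No granularity" row can be approximated with a simplified model: sort the values, cut them into equally sized runs, and report each run's first value as min and the next run's first value as max (the last run reports its own largest value). This is not the server's algorithm. It assumes distinct numeric values and ignores ties and granularity, and the function name evenBuckets is hypothetical.

```javascript
// Simplified even-split model for distinct numeric values.
function evenBuckets(values, n) {
  const sorted = [...values].sort((a, b) => a - b);
  const size = Math.ceil(sorted.length / n);
  const out = [];
  for (let start = 0; start < sorted.length; start += size) {
    const end = Math.min(start + size, sorted.length);
    const isLast = end === sorted.length;
    out.push({
      // max is the next run's first value, or this run's last value for the final bucket
      _id: { min: sorted[start], max: isLast ? sorted[end - 1] : sorted[end] },
      count: end - start
    });
  }
  return out;
}

const ids = Array.from({ length: 100 }, (_, i) => i); // _id values 0 through 99
evenBuckets(ids, 5).forEach((b) => console.log(JSON.stringify(b)));
```

For the 100 distinct values 0 through 99 this yields five buckets of 20 documents with the same bounds as the first row of the table.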
Consider a collection artwork with the following documents:
In the following operation, input documents are grouped into four buckets according to the values in the price field:
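A minimal sketch of that operation, written as a pipeline object so it can be inspected without a database. The field path "$price" and the bucket count of four come from the description above; the variable name pricePipeline is hypothetical.

```javascript
// Group artwork documents into four automatically bounded buckets by price.
const pricePipeline = [
  {
    $bucketAuto: {
      groupBy: "$price", // field path: dollar sign prefix, enclosed in quotes
      buckets: 4
    }
  }
];

// With a live connection in mongosh:
// db.artwork.aggregate(pricePipeline)
```

Because output is not specified, each returned bucket document would carry the default count field alongside _id.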
The operation returns the following documents:
The $bucketAuto stage can be used within the $facet stage to process multiple aggregation pipelines on the same set of input documents from artwork.
The following aggregation pipeline groups the documents from the artwork collection into buckets based on price, year, and the calculated area:
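Such a pipeline might look like the sketch below. The facet names follow the three groupings named above, but the bucket counts and the dimensions.height and dimensions.width field paths used to calculate the area are assumptions, since the collection's documents are not shown here.

```javascript
// One $facet stage running three independent $bucketAuto sub-pipelines.
const facetPipeline = [
  {
    $facet: {
      price: [{ $bucketAuto: { groupBy: "$price", buckets: 4 } }],
      year: [{ $bucketAuto: { groupBy: "$year", buckets: 3 } }],
      area: [
        {
          $bucketAuto: {
            // Assumed field names: dimensions.height and dimensions.width
            groupBy: { $multiply: ["$dimensions.height", "$dimensions.width"] },
            buckets: 4
          }
        }
      ]
    }
  }
];

// With a live connection in mongosh:
// db.artwork.aggregate(facetPipeline)
```

A $facet stage returns a single document, with one array of bucket documents per facet name.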
The operation returns the following document: