$bucket (aggregation)

Definition

$bucket

New in version 3.4.

Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket. Each output document contains an _id field whose value specifies the inclusive lower bound of the bucket. The output option specifies the fields included in each output document.

$bucket only produces output documents for buckets that contain at least one input document.

Syntax

{
  $bucket: {
      groupBy: <expression>,
      boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
      default: <literal>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
         <outputN>: { <$accumulator expression> }
      }
   }
}

The $bucket document contains the following fields:

Field Type Description
groupBy expression

An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes.

Unless $bucket includes a default specification, each input document must resolve the groupBy field path or expression to a value that falls within one of the ranges specified by boundaries.

boundaries array

An array of values based on the groupBy expression that specify the boundaries for each bucket. Each adjacent pair of values acts as the inclusive lower boundary and the exclusive upper boundary for the bucket. You must specify at least two boundaries.

The specified values must be in ascending order and all of the same type. The exception is if the values are of mixed numeric types, such as:

[ 10, NumberLong(20), NumberInt(30) ]

Example

An array of [ 0, 5, 10 ] creates two buckets:

  • [0, 5) with inclusive lower bound 0 and exclusive upper bound 5.
  • [5, 10) with inclusive lower bound 5 and exclusive upper bound 10.
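The half-open ranges can be checked with a few lines of plain JavaScript. This is only a sketch of the range rule described above, not MongoDB's implementation; bucketFor is a hypothetical helper name, and the sketch assumes numeric values and boundaries.

```javascript
// Plain JavaScript model of the range rule; bucketFor is a hypothetical helper.
function bucketFor(value, boundaries) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    const lower = boundaries[i];     // inclusive lower bound
    const upper = boundaries[i + 1]; // exclusive upper bound
    if (value >= lower && value < upper) {
      return lower; // the inclusive lower bound becomes the bucket's _id
    }
  }
  return undefined; // the value lies outside every bucket
}

const exampleBoundaries = [0, 5, 10];
for (const value of [0, 4, 5, 9, 10]) {
  console.log(value, "->", bucketFor(value, exampleBoundaries));
}
```

In this model, 5 lands in the second bucket rather than the first, and 10 lands in no bucket at all, because every upper bound is exclusive.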
default literal

Optional. A literal that specifies the _id of an additional bucket that contains all documents whose groupBy expression result does not fall into a bucket specified by boundaries.

If unspecified, each input document must resolve the groupBy expression to a value within one of the bucket ranges specified by boundaries, or the operation throws an error.

The default value must be less than the lowest boundaries value, or greater than or equal to the highest boundaries value.

The default value can be of a different type than the entries in boundaries.
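The rules for default can be sketched in plain JavaScript as follows. This is a simplified model under stated assumptions, not the server's logic: assignBucket and checkNumericDefault are hypothetical helper names, boundaries are assumed to be numeric, and the validity check only covers the case where default is itself a number.

```javascript
// Hypothetical model: place a value in a bucket, fall back to the default
// bucket, or fail when no default was given.
function assignBucket(value, boundaries, defaultId) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    if (typeof value === "number" && value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i];
    }
  }
  if (defaultId === undefined) {
    throw new Error("value falls outside all buckets and no default was specified");
  }
  return defaultId;
}

// Hypothetical check for a numeric default: it must lie below the lowest
// boundary or at/above the highest boundary.
function checkNumericDefault(defaultId, boundaries) {
  const lowest = boundaries[0];
  const highest = boundaries[boundaries.length - 1];
  return defaultId < lowest || defaultId >= highest;
}

const priceBoundaries = [0, 200, 400];
console.log(assignBucket(150, priceBoundaries, "Other"));       // inside [0, 200)
console.log(assignBucket(483, priceBoundaries, "Other"));       // above every bucket
console.log(assignBucket(undefined, priceBoundaries, "Other")); // missing field
console.log(checkNumericDefault(100, priceBoundaries));         // 100 sits inside the ranges
```

Calling assignBucket(483, priceBoundaries) without a third argument throws in this model, mirroring the error described above for a missing default.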

output document

Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify a field to include, you must use accumulator expressions.

<outputfield1>: { <accumulator>: <expression1> },
...
<outputfieldN>: { <accumulator>: <expressionN> }

If you do not specify an output document, the operation returns a count field containing the number of documents in each bucket.

If you specify an output document, only the fields specified in the document are returned; that is, the count field is not returned unless it is explicitly included in the output document.
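The difference between the two cases can be seen by writing both stage documents side by side. The sketch below only builds the stage objects in plain JavaScript. The year_born and last_name field names are borrowed from the artists sample collection used in the Examples section, and lastNames is a hypothetical output field chosen for illustration.

```javascript
// Stage without output: per the description above, each bucket document
// carries _id plus an automatic count field.
const defaultOutputStage = {
  $bucket: {
    groupBy: "$year_born",
    boundaries: [1840, 1850, 1860, 1870, 1880],
    default: "Other"
  }
};

// Stage with output: only the listed fields are returned, so count has to be
// requested explicitly with an accumulator.
const customOutputStage = {
  $bucket: {
    groupBy: "$year_born",
    boundaries: [1840, 1850, 1860, 1870, 1880],
    default: "Other",
    output: {
      count: { $sum: 1 },                // keeps the per-bucket document count
      lastNames: { $push: "$last_name" } // hypothetical extra field
    }
  }
};

// In the mongo shell these would be passed to aggregate, for example:
//   db.artists.aggregate([ defaultOutputStage ])
//   db.artists.aggregate([ customOutputStage ])
console.log(Object.keys(customOutputStage.$bucket.output));
```

Dropping the count entry from customOutputStage would leave each bucket with only _id and lastNames.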

Behavior

$bucket requires at least one of the following conditions to be met or the operation throws an error:

  • Each input document resolves the groupBy expression to a value within one of the bucket ranges specified by boundaries, or
  • A default value is specified to bucket documents whose groupBy values fall outside the boundaries.

If the groupBy expression resolves to an array or a document, $bucket arranges the input documents into buckets using the comparison logic from $sort.

Examples

Bucket by Year and Filter by Bucket Results

From the mongo shell, create a sample collection named artists with the following documents:

db.artists.insertMany([
  { "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
  { "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
  { "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
  { "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
  { "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
  { "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
  { "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
  { "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])

The following operation groups the documents into buckets according to the year_born field and filters based on the count of documents in the buckets:

db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born",                        // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other",                             // Bucket id for documents which do not fall into a bucket
      output: {                                     // Output for each bucket
        "count": { $sum: 1 },
        "artists" :
          {
            $push: {
              "name": { $concat: [ "$first_name", " ", "$last_name"] },
              "year_born": "$year_born"
            }
          }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )
First Stage

The $bucket stage groups the documents into buckets by the year_born field. The buckets have the following boundaries:

  • [1840, 1850) with inclusive lower bound 1840 and exclusive upper bound 1850.
  • [1850, 1860) with inclusive lower bound 1850 and exclusive upper bound 1860.
  • [1860, 1870) with inclusive lower bound 1860 and exclusive upper bound 1870.
  • [1870, 1880) with inclusive lower bound 1870 and exclusive upper bound 1880.
  • If a document did not contain the year_born field or its year_born field was outside the ranges above, it would be placed in the default bucket with the _id value "Other".

The stage includes the output document to determine the fields to return:

_id Inclusive lower bound of the bucket.
count Count of documents in the bucket.
artists

Array of documents containing information on each artist in the bucket. Each document contains the artist’s:

  • name, which is a concatenation (i.e. $concat) of the artist’s first_name and last_name.
  • year_born

This stage passes the following documents to the next stage:

{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }

{ "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
                                           { "name" : "Edvard Diriks", "year_born" : 1855 } ] }

{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
                                           { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
                                           { "name" : "Alfred Maurer", "year_born" : 1868 },
                                           { "name" : "Edvard Munch", "year_born" : 1863 } ] }

{ "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }
Second Stage

The $match stage filters the output from the previous stage to return only buckets that contain more than 3 documents.

The operation returns the following document:

{ "_id" : 1860, "count" : 4, "artists" :
  [
    { "name" : "Emil Bernard", "year_born" : 1868 },
    { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
    { "name" : "Alfred Maurer", "year_born" : 1868 },
    { "name" : "Edvard Munch", "year_born" : 1863 }
  ]
}
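The counts in this example can be cross-checked without a database. The sketch below is a plain JavaScript model of the two stages, not the aggregation pipeline itself: it buckets the year_born values from the sample documents by hand and then applies the same "more than 3" filter. Variable names such as yearsBorn and bucketCounts are introduced here for illustration.

```javascript
// year_born values of the eight sample artists, in _id order.
const yearsBorn = [1868, 1861, 1871, 1853, 1868, 1863, 1840, 1855];
const yearBoundaries = [1840, 1850, 1860, 1870, 1880];

// Model of the $bucket stage: count documents per inclusive lower bound.
const bucketCounts = new Map();
for (const year of yearsBorn) {
  let id = "Other"; // default bucket for values outside every range
  for (let i = 0; i < yearBoundaries.length - 1; i++) {
    if (year >= yearBoundaries[i] && year < yearBoundaries[i + 1]) {
      id = yearBoundaries[i];
      break;
    }
  }
  bucketCounts.set(id, (bucketCounts.get(id) || 0) + 1);
}
const buckets = [...bucketCounts].map(([_id, count]) => ({ _id, count }));

// Model of the $match stage: keep buckets holding more than 3 documents.
const matched = buckets.filter((bucket) => bucket.count > 3);
console.log(matched);
```

Only the 1860 bucket, with four artists, passes the filter in this model, which agrees with the document returned above.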

Use $bucket with $facet to Bucket by Multiple Fields

You can use the $facet stage to perform multiple $bucket aggregations in a single stage.

From the mongo shell, create a sample collection named artwork with the following documents:

db.artwork.insertMany([
  { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
      "price" : NumberDecimal("199.99") },
  { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
      "price" : NumberDecimal("280.00") },
  { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
      "price" : NumberDecimal("76.04") },
  { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
      "price" : NumberDecimal("167.30") },
  { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
      "price" : NumberDecimal("483.00") },
  { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
      "price" : NumberDecimal("385.00") },
  { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
      /* No price*/ },
  { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
      "price" : NumberDecimal("118.42") }
])

The following operation uses two $bucket stages within a $facet stage to create two groupings, one by price and the other by year:

db.artwork.aggregate( [
  {
    $facet: {                               // Top-level $facet stage
      "price": [                            // Output field 1
        {
          $bucket: {
              groupBy: "$price",            // Field to group by
              boundaries: [ 0, 200, 400 ],  // Boundaries for the buckets
              default: "Other",             // Bucket id for documents which do not fall into a bucket
              output: {                     // Output for each bucket
                "count": { $sum: 1 },
                "artwork" : { $push: { "title": "$title", "price": "$price" } },
                "averagePrice": { $avg: "$price" }
              }
          }
        }
      ],
      "year": [                                      // Output field 2
        {
          $bucket: {
            groupBy: "$year",                        // Field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ],  // Boundaries for the buckets
            default: "Unknown",                      // Bucket id for documents which do not fall into a bucket
            output: {                                // Output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )
First Facet

The first facet groups the input documents by price. The buckets have the following boundaries:

  • [0, 200) with inclusive lower bound 0 and exclusive upper bound 200.
  • [200, 400) with inclusive lower bound 200 and exclusive upper bound 400.
  • “Other”, the default bucket containing documents without prices or prices outside the ranges above.

The $bucket stage includes the output document to determine the fields to return:

_id Inclusive lower bound of the bucket.
count Count of documents in the bucket.
artwork Array of documents containing information on each artwork in the bucket.
averagePrice Uses the $avg operator to compute the average price of all artwork in the bucket.

Second Facet

The second facet groups the input documents by year. The buckets have the following boundaries:

  • [1890, 1910) with inclusive lower bound 1890 and exclusive upper bound 1910.
  • [1910, 1920) with inclusive lower bound 1910 and exclusive upper bound 1920.
  • [1920, 1940) with inclusive lower bound 1920 and exclusive upper bound 1940.
  • “Unknown”, the default bucket containing documents without years or years outside the ranges above.

The $bucket stage includes the output document to determine the fields to return:

count Count of documents in the bucket.
artwork Array of documents containing information on each artwork in the bucket.
Output

The operation returns the following document:

{
  "price" : [ // Output of first facet
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        { "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") },
        { "title" : "Dancer", "price" : NumberDecimal("76.04") },
        { "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") },
        { "title" : "Blue Flower", "price" : NumberDecimal("118.42") }
      ],
      "averagePrice" : NumberDecimal("140.4375")
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "price" : NumberDecimal("280.00") },
        { "title" : "Composition VII", "price" : NumberDecimal("385.00") }
      ],
      "averagePrice" : NumberDecimal("332.50")
    },
    {
      // Includes documents without prices and prices of 400 or more
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        { "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") },
        { "title" : "The Scream" }
      ],
      "averagePrice" : NumberDecimal("483.00")
    }
  ],
  "year" : [ // Output of second facet
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "year" : 1902 },
        { "title" : "The Scream", "year" : 1893 }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        { "title" : "Composition VII", "year" : 1913 },
        { "title" : "Blue Flower", "year" : 1918 }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        { "title" : "The Pillars of Society", "year" : 1926 },
        { "title" : "Dancer", "year" : 1925 },
        { "title" : "The Persistence of Memory", "year" : 1931 }
      ]
    },
    {
      // Includes documents without a year
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        { "title" : "The Great Wave off Kanagawa" }
      ]
    }
  ]
}
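The averagePrice values in the first facet can be verified by hand. The sketch below is a plain JavaScript check, not a reproduction of $avg: it works in integer cents to sidestep binary floating-point rounding, and it skips missing prices on the grounds that $avg ignores non-numeric values. averageCents is a hypothetical helper name.

```javascript
// Average a list of prices given in integer cents, skipping missing entries.
function averageCents(pricesInCents) {
  const numeric = pricesInCents.filter((price) => typeof price === "number");
  if (numeric.length === 0) {
    return null; // nothing numeric to average
  }
  const total = numeric.reduce((sum, price) => sum + price, 0);
  return total / numeric.length;
}

// Prices per bucket, taken from the sample artwork documents.
const bucket0 = [19999, 7604, 16730, 11842]; // 199.99, 76.04, 167.30, 118.42
const bucket200 = [28000, 38500];            // 280.00, 385.00
const bucketOther = [48300, undefined];      // 483.00 and "The Scream" (no price)

console.log(averageCents(bucket0) / 100);     // 140.4375
console.log(averageCents(bucket200) / 100);   // 332.5
console.log(averageCents(bucketOther) / 100); // 483
```

The "Other" bucket counts two documents but averages only one price, which is why its averagePrice equals the single price of 483.00.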

See also

$bucketAuto