$group (aggregation)

Definition

$group

Groups input documents by the specified _id expression and outputs one document for each distinct grouping. The _id field of each output document contains the unique group-by value. The output documents can also contain computed fields that hold the values of accumulator expressions.

Note

$group does not order its output documents.

The $group stage has the following prototype form:

{
  $group:
    {
      _id: <expression>, // Group By Expression
      <field1>: { <accumulator1> : <expression1> },
      ...
    }
 }

Field    Description
_id      Required. If you specify an _id value of null, or any other constant value, the $group stage calculates accumulated values for all the input documents as a whole. See the Group by null example.
field    Optional. Computed using the accumulator operators.

The _id and the accumulator operators can accept any valid expression. For more information on expressions, see Expressions.
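
As a rough mental model of the prototype form above, the following plain JavaScript sketch groups an in-memory array the way a $group stage groups documents. It is an analogy only, not MongoDB code or the server's implementation: the function groupDocuments, the keyOf callback (standing in for the _id expression), and the init/step accumulator shape are all hypothetical names introduced for illustration.

```javascript
// Plain-JavaScript analogy of a $group stage (not MongoDB code).
// keyOf plays the role of the _id expression; each accumulator has an
// initial value and a step function applied once per input document.
function groupDocuments(docs, keyOf, accumulators) {
  const groups = new Map();
  for (const doc of docs) {
    // Serialize the key so object-valued keys compare by value.
    const key = JSON.stringify(keyOf(doc));
    if (!groups.has(key)) {
      const fresh = { _id: keyOf(doc) };
      for (const [field, acc] of Object.entries(accumulators)) {
        fresh[field] = acc.init();
      }
      groups.set(key, fresh);
    }
    const out = groups.get(key);
    for (const [field, acc] of Object.entries(accumulators)) {
      out[field] = acc.step(out[field], doc);
    }
  }
  return [...groups.values()];
}

const sampleDocs = [
  { item: "abc", quantity: 2 },
  { item: "xyz", quantity: 10 },
  { item: "abc", quantity: 5 },
];

const grouped = groupDocuments(sampleDocs, (doc) => doc.item, {
  count: { init: () => 0, step: (acc) => acc + 1 },
  totalQuantity: { init: () => 0, step: (acc, doc) => acc + doc.quantity },
});
console.log(grouped);
```

The sketch returns groups in first-seen order because it uses a Map; the $group stage itself makes no ordering guarantee.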

Considerations

Accumulator Operator

The <accumulator> operator must be one of the following accumulator operators:

Name             Description
$accumulator     Returns the result of a user-defined accumulator function.
$addToSet        Returns an array of unique expression values for each group. Order of the array elements is undefined.
$avg             Returns an average of numerical values. Ignores non-numeric values.
$first           Returns a value from the first document for each group. Order is only defined if the documents are in a defined order. Distinct from the $first array operator.
$last            Returns a value from the last document for each group. Order is only defined if the documents are in a defined order. Distinct from the $last array operator.
$max             Returns the highest expression value for each group.
$mergeObjects    Returns a document created by combining the input documents for each group.
$min             Returns the lowest expression value for each group.
$push            Returns an array of expression values for each group.
$stdDevPop       Returns the population standard deviation of the input values.
$stdDevSamp      Returns the sample standard deviation of the input values.
$sum             Returns a sum of numerical values. Ignores non-numeric values.
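
To make a few of these descriptions concrete, here is a minimal plain JavaScript sketch of how $sum, $avg, $push, and $addToSet treat one group's values. It is an analogy, not MongoDB code: the helper names (sumLike, avgLike, pushLike, addToSetLike) are hypothetical, the uniqueness check only covers primitive values, and returning null from avgLike when no numeric value is present is an assumption of this sketch.

```javascript
// Plain-JavaScript analogies of four accumulators (not MongoDB code).
const isNumber = (v) => typeof v === "number" && !Number.isNaN(v);

// Like $sum: adds numeric values and skips non-numeric ones.
const sumLike = (values) => values.filter(isNumber).reduce((a, b) => a + b, 0);

// Like $avg: averages numeric values and skips non-numeric ones.
// Returning null when nothing is numeric is an assumption of this sketch.
const avgLike = (values) => {
  const nums = values.filter(isNumber);
  return nums.length === 0 ? null : sumLike(nums) / nums.length;
};

// Like $push: keeps every value, duplicates included.
const pushLike = (values) => [...values];

// Like $addToSet: keeps each distinct value once (primitives only here).
const addToSetLike = (values) => [...new Set(values)];

const groupValues = [10, "n/a", 20, 10, 20];
console.log(sumLike(groupValues));      // 60
console.log(avgLike(groupValues));      // 15
console.log(pushLike(groupValues));     // all five values
console.log(addToSetLike(groupValues)); // three distinct values
```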

$group Operator and Memory

The $group stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $group returns an error. To allow for the handling of large datasets, set the allowDiskUse option to true. This flag enables $group operations to write to temporary files. For more information, see the db.collection.aggregate() method and the aggregate command.
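
A minimal sketch of passing that option from the shell follows. It assumes a collection named sales and an illustrative pipeline; the variable names largeGroupPipeline and aggregateOptions are hypothetical. The call is guarded so that the snippet does nothing when no shell-provided db object exists.

```javascript
// The pipeline and the options document are plain objects.
const largeGroupPipeline = [
  { $group: { _id: "$item", count: { $sum: 1 } } },
];

// allowDiskUse lets the stage spill to temporary files instead of
// failing when it exceeds its memory limit.
const aggregateOptions = { allowDiskUse: true };

// Run only inside the shell, where `db` and `printjson` are defined.
if (typeof db !== "undefined") {
  db.sales
    .aggregate(largeGroupPipeline, aggregateOptions)
    .forEach((doc) => printjson(doc));
}
```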

Optimization to Return the First Document of Each Group

If a pipeline sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field that matches the sort order. In some cases, the $group stage can use the index to quickly find the first document of each group.

Example

If a collection named foo contains an index { x: 1, y: 1 }, the following pipeline can use that index to find the first document of each group:

db.foo.aggregate([
  {
    $sort:{ x : 1, y : 1 }
  },
  {
    $group: {
      _id: { x : "$x" },
      y: { $first : "$y" }
    }
  }
])
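
The intuition behind this optimization can be sketched in plain JavaScript: once documents arrive in { x: 1, y: 1 } order, the first document of each group is simply the one where x changes, so no per-group state has to be kept. This is an illustration of the idea only, not a description of how the server executes the pipeline; the function firstPerGroup and the sample array are hypothetical.

```javascript
// Documents in the order an index on { x: 1, y: 1 } would yield them.
const sortedDocs = [
  { x: 1, y: 3 },
  { x: 1, y: 7 },
  { x: 2, y: 1 },
  { x: 2, y: 4 },
  { x: 3, y: 9 },
];

// One pass: emit a result each time the grouping value x changes.
function firstPerGroup(docs) {
  const firsts = [];
  let previousX;
  let started = false;
  for (const doc of docs) {
    if (!started || doc.x !== previousX) {
      firsts.push({ _id: { x: doc.x }, y: doc.y });
      previousX = doc.x;
      started = true;
    }
  }
  return firsts;
}

const firstDocs = firstPerGroup(sortedDocs);
console.log(firstDocs);
```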

Examples

Count the Number of Documents in a Collection

From the mongo shell, create a sample collection named sales with the following documents:

db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" :  NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

The following aggregation operation uses the $group stage to count the number of documents in the sales collection:

db.sales.aggregate( [
  {
    $group: {
       _id: null,
       count: { $sum: 1 }
    }
  }
] )

The operation returns the following result:

{ "_id" : null, "count" : 8 }

This aggregation operation is equivalent to the following SQL statement:

SELECT COUNT(*) AS count FROM sales

Retrieve Distinct Values

The following aggregation operation uses the $group stage to retrieve the distinct item values from the sales collection:

db.sales.aggregate( [ { $group : { _id : "$item" } } ] )

The operation returns the following result:

{ "_id" : "abc" }
{ "_id" : "jkl" }
{ "_id" : "def" }
{ "_id" : "xyz" }

Group by Item Having

The following aggregation operation groups documents by the item field, calculates the total sale amount per item, and returns only the items with a total sale amount greater than or equal to 100:

db.sales.aggregate(
  [
    // First Stage
    {
      $group :
        {
          _id : "$item",
          totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }
        }
     },
     // Second Stage
     {
       $match: { "totalSaleAmount": { $gte: 100 } }
     }
   ]
 )

First Stage:
The $group stage groups the documents by item to retrieve the distinct item values. This stage returns the totalSaleAmount for each item.
Second Stage:
The $match stage filters the resulting documents to only return items with a totalSaleAmount greater than or equal to 100.

The operation returns the following result:

{ "_id" : "abc", "totalSaleAmount" : NumberDecimal("170") }
{ "_id" : "xyz", "totalSaleAmount" : NumberDecimal("150") }
{ "_id" : "def", "totalSaleAmount" : NumberDecimal("112.5") }

This aggregation operation is equivalent to the following SQL statement:

SELECT item,
   Sum(( price * quantity )) AS totalSaleAmount
FROM   sales
GROUP  BY item
HAVING totalSaleAmount >= 100
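
The totals in this example can be re-derived by hand with a short plain JavaScript sketch. It is not MongoDB code: it copies the item, price, and quantity values of the sample documents into a hypothetical salesRows array and uses ordinary numbers instead of NumberDecimal values.

```javascript
// item, price, and quantity copied from the sample sales documents.
const salesRows = [
  { item: "abc", price: 10, quantity: 2 },
  { item: "jkl", price: 20, quantity: 1 },
  { item: "xyz", price: 5, quantity: 10 },
  { item: "xyz", price: 5, quantity: 20 },
  { item: "abc", price: 10, quantity: 10 },
  { item: "def", price: 7.5, quantity: 5 },
  { item: "def", price: 7.5, quantity: 10 },
  { item: "abc", price: 10, quantity: 5 },
];

// Group step: sum price * quantity per item.
const totalsByItem = new Map();
for (const { item, price, quantity } of salesRows) {
  totalsByItem.set(item, (totalsByItem.get(item) || 0) + price * quantity);
}

// Filter step: keep groups whose total is at least 100.
const atLeast100 = [...totalsByItem]
  .map(([item, totalSaleAmount]) => ({ _id: item, totalSaleAmount }))
  .filter((row) => row.totalSaleAmount >= 100);
console.log(atLeast100);
```

The item jkl is dropped by the filter because its only sale totals 20.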

Calculate Count, Sum, and Average

From the mongo shell, create a sample collection named sales with the following documents:

db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" :  NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

Group by Day of the Year

The following pipeline calculates the total sales amount, average sales quantity, and sale count for each day in the year 2014:

db.sales.aggregate([
  // First Stage
  {
    $match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
  },
  // Second Stage
  {
    $group : {
       _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
       totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
       averageQuantity: { $avg: "$quantity" },
       count: { $sum: 1 }
    }
  },
  // Third Stage
  {
    $sort : { totalSaleAmount: -1 }
  }
 ])

First Stage:
The $match stage filters the documents to only pass documents from the year 2014 to the next stage.
Second Stage:
The $group stage groups the documents by date and calculates the total sale amount, average quantity, and total count of the documents in each group.
Third Stage:
The $sort stage sorts the results by the total sale amount for each group in descending order.

The operation returns the following results:

{ "_id" : "2014-04-04", "totalSaleAmount" : NumberDecimal("200"), "averageQuantity" : 15, "count" : 2 }
{ "_id" : "2014-03-15", "totalSaleAmount" : NumberDecimal("50"), "averageQuantity" : 10, "count" : 1 }
{ "_id" : "2014-03-01", "totalSaleAmount" : NumberDecimal("40"), "averageQuantity" : 1.5, "count" : 2 }

This aggregation operation is equivalent to the following SQL statement:

SELECT date,
       Sum(( price * quantity )) AS totalSaleAmount,
       Avg(quantity)             AS averageQuantity,
       Count(*)                  AS Count
FROM   sales
GROUP  BY Date(date)
ORDER  BY totalSaleAmount DESC
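
The three stages of this pipeline can be mirrored in plain JavaScript to check the per-day figures. The sketch below is an analogy, not MongoDB code: datedSales is a hypothetical copy of the sample data with ordinary numbers in place of NumberDecimal values, and the day key is taken from the UTC ISO string, which is assumed to match the "%Y-%m-%d" format used above.

```javascript
// price, quantity, and date copied from the sample sales documents.
const datedSales = [
  { price: 10, quantity: 2, date: new Date("2014-03-01T08:00:00Z") },
  { price: 20, quantity: 1, date: new Date("2014-03-01T09:00:00Z") },
  { price: 5, quantity: 10, date: new Date("2014-03-15T09:00:00Z") },
  { price: 5, quantity: 20, date: new Date("2014-04-04T11:21:39.736Z") },
  { price: 10, quantity: 10, date: new Date("2014-04-04T21:23:13.331Z") },
  { price: 7.5, quantity: 5, date: new Date("2015-06-04T05:08:13Z") },
  { price: 7.5, quantity: 10, date: new Date("2015-09-10T08:43:00Z") },
  { price: 10, quantity: 5, date: new Date("2016-02-06T20:20:13Z") },
];

const rangeStart = new Date("2014-01-01T00:00:00Z");
const rangeEnd = new Date("2015-01-01T00:00:00Z");

const perDay = new Map();
for (const sale of datedSales) {
  // Same filter as the $match stage: 2014-01-01 <= date < 2015-01-01.
  if (sale.date < rangeStart || sale.date >= rangeEnd) continue;
  // UTC calendar day, for example "2014-03-01".
  const day = sale.date.toISOString().slice(0, 10);
  const bucket =
    perDay.get(day) || { totalSaleAmount: 0, quantitySum: 0, count: 0 };
  bucket.totalSaleAmount += sale.price * sale.quantity;
  bucket.quantitySum += sale.quantity;
  bucket.count += 1;
  perDay.set(day, bucket);
}

// Finish the average and sort by total sale amount, descending.
const dailyResults = [...perDay.entries()]
  .map(([day, bucket]) => ({
    _id: day,
    totalSaleAmount: bucket.totalSaleAmount,
    averageQuantity: bucket.quantitySum / bucket.count,
    count: bucket.count,
  }))
  .sort((a, b) => b.totalSaleAmount - a.totalSaleAmount);
console.log(dailyResults);
```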

Group by null

The following aggregation operation specifies a group _id of null, which calculates the total sale amount, average quantity, and count across all documents in the collection.

db.sales.aggregate([
  {
    $group : {
       _id : null,
       totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
       averageQuantity: { $avg: "$quantity" },
       count: { $sum: 1 }
    }
  }
 ])

The operation returns the following result:

{
  "_id" : null,
  "totalSaleAmount" : NumberDecimal("452.5"),
  "averageQuantity" : 7.875,
  "count" : 8
}

This aggregation operation is equivalent to the following SQL statement:

SELECT Sum(price * quantity) AS totalSaleAmount,
       Avg(quantity)         AS averageQuantity,
       Count(*)              AS Count
FROM   sales

Pivot Data

From the mongo shell, create a sample collection named books with the following documents:

db.books.insertMany([
  { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
  { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
  { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
  { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
  { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])

Group title by author

The following aggregation operation pivots the data in the books collection so that titles are grouped by author.

db.books.aggregate([
   { $group : { _id : "$author", books: { $push: "$title" } } }
 ])

The operation returns the following documents:

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

Group Documents by author

The following aggregation operation groups documents by author:

db.books.aggregate([
   // First Stage
   {
     $group : { _id : "$author", books: { $push: "$$ROOT" } }
   },
   // Second Stage
   {
     $addFields:
       {
         totalCopies : { $sum: "$books.copies" }
       }
   }
 ])

First Stage:

The $group stage uses the $$ROOT system variable to group entire documents by author. This stage passes the following documents to the next stage:

{ "_id" : "Homer",
  "books" :
    [
       { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
       { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
    ]
 },
 { "_id" : "Dante",
   "books" :
     [
       { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
       { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
       { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
     ]
 }

Second Stage:

The $addFields stage adds a field to the output containing the total copies of books for each author.

Note

The resulting documents must not exceed the BSON Document Size limit of 16 megabytes.

The operation returns the following documents:

{
  "_id" : "Homer",
  "books" :
     [
       { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
       { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
     ],
   "totalCopies" : 20
}

{
  "_id" : "Dante",
  "books" :
     [
       { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
       { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
       { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
     ],
   "totalCopies" : 5
}
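
The two stages above can be imitated in plain JavaScript to show what pushing $$ROOT and summing "$books.copies" amount to. This is an analogy, not MongoDB code: bookDocs is a hypothetical in-memory copy of the sample collection, and the variable names are introduced for illustration only.

```javascript
// In-memory copy of the sample books documents.
const bookDocs = [
  { _id: 8751, title: "The Banquet", author: "Dante", copies: 2 },
  { _id: 8752, title: "Divine Comedy", author: "Dante", copies: 1 },
  { _id: 8645, title: "Eclogues", author: "Dante", copies: 2 },
  { _id: 7000, title: "The Odyssey", author: "Homer", copies: 10 },
  { _id: 7020, title: "Iliad", author: "Homer", copies: 10 },
];

// First stage analogy: push each whole document into its author's array.
const byAuthor = new Map();
for (const book of bookDocs) {
  if (!byAuthor.has(book.author)) {
    byAuthor.set(book.author, { _id: book.author, books: [] });
  }
  byAuthor.get(book.author).books.push(book);
}

// Second stage analogy: "$books.copies" is the array of copies values,
// and summing that array gives totalCopies.
const authorsWithTotals = [...byAuthor.values()].map((group) => ({
  ...group,
  totalCopies: group.books
    .map((book) => book.copies)
    .reduce((sum, copies) => sum + copies, 0),
}));
console.log(authorsWithTotals);
```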

Additional Resources

The Aggregation with the Zip Code Data Set tutorial provides an extensive example of the $group operator in a common use case.