12.20.3 MySQL Handling of GROUP BY

SQL-92 and earlier do not permit queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL-92 because the nonaggregated name column in the select list does not appear in the GROUP BY:

SELECT o.custid, c.name, MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid;

For the query to be legal in SQL-92, the name column must be omitted from the select list or named in the GROUP BY clause.
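
The two SQL-92-compliant rewrites described above can be sketched as follows. Both reuse the orders and customers tables from the preceding query. Note that the second form produces one row per distinct (custid, name) pair, which matches the original intent only if each custid has a single name.

```sql
-- Option 1: omit the nonaggregated name column from the select list.
SELECT o.custid, MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid;

-- Option 2: name the column in the GROUP BY clause.
SELECT o.custid, c.name, MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid, c.name;
```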

SQL:1999 and later permit such nonaggregates per optional feature T301 if they are functionally dependent on GROUP BY columns: if such a relationship exists between name and custid, the query is legal. This would be the case, for example, were custid a primary key of customers.
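
As a rough illustration of that primary key case, the following sketch defines the two tables so that custid is the primary key of customers. The table definitions, the orderid column, and all column types are assumptions made for this example; they do not appear in the original text. With these definitions, name is determined by c.custid, which the WHERE clause equates with the grouping column o.custid, so the query is expected to be accepted.

```sql
-- Hypothetical table definitions (column types are assumptions).
CREATE TABLE customers (
  custid INT UNSIGNED NOT NULL PRIMARY KEY,
  name   VARCHAR(50)
);

CREATE TABLE orders (
  orderid INT UNSIGNED NOT NULL PRIMARY KEY,  -- assumed surrogate key
  custid  INT UNSIGNED NOT NULL,
  payment DECIMAL(10,2)
);

-- c.name depends on c.custid (primary key), and c.custid equals
-- the grouping column o.custid because of the WHERE condition.
SELECT o.custid, c.name, MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid;
```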

MySQL implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor functionally dependent on GROUP BY columns.

MySQL also permits a nonaggregate column not named in a GROUP BY clause when the ONLY_FULL_GROUP_BY SQL mode is enabled, provided that this column is limited to a single value, as shown in the following example:

mysql> CREATE TABLE mytable (
    ->    id INT UNSIGNED NOT NULL PRIMARY KEY,
    ->    a VARCHAR(10),
    ->    b INT
    -> );

mysql> INSERT INTO mytable
    -> VALUES (1, 'abc', 1000),
    ->        (2, 'abc', 2000),
    ->        (3, 'def', 4000);

mysql> SET SESSION sql_mode = sys.list_add(@@session.sql_mode, 'ONLY_FULL_GROUP_BY');

mysql> SELECT a, SUM(b) FROM mytable WHERE a = 'abc';
+------+--------+
| a    | SUM(b) |
+------+--------+
| abc  |   3000 |
+------+--------+

It is also possible to have more than one nonaggregate column in the SELECT list when employing ONLY_FULL_GROUP_BY. In this case, every such column must be limited to a single value in the WHERE clause, and all such limiting conditions must be joined by logical AND, as shown here:

mysql> DROP TABLE IF EXISTS mytable;

mysql> CREATE TABLE mytable (
    ->    id INT UNSIGNED NOT NULL PRIMARY KEY,
    ->    a VARCHAR(10),
    ->    b VARCHAR(10),
    ->    c INT
    -> );

mysql> INSERT INTO mytable
    -> VALUES (1, 'abc', 'qrs', 1000),
    ->        (2, 'abc', 'tuv', 2000),
    ->        (3, 'def', 'qrs', 4000),
    ->        (4, 'def', 'tuv', 8000),
    ->        (5, 'abc', 'qrs', 16000),
    ->        (6, 'def', 'tuv', 32000);

mysql> SELECT @@session.sql_mode;
+---------------------------------------------------------------+
| @@session.sql_mode                                            |
+---------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION |
+---------------------------------------------------------------+

mysql> SELECT a, b, SUM(c) FROM mytable
    ->     WHERE a = 'abc' AND b = 'qrs';
+------+------+--------+
| a    | b    | SUM(c) |
+------+------+--------+
| abc  | qrs  |  17000 |
+------+------+--------+
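
For contrast, here is a sketch of variations on the preceding query that do not meet the single-value requirement and are therefore rejected with ONLY_FULL_GROUP_BY enabled, followed by a form that groups explicitly instead. The statements reuse the mytable definition above; no output is shown.

```sql
-- Rejected: b appears in the select list but is not limited
-- to a single value by the WHERE clause.
SELECT a, b, SUM(c) FROM mytable
  WHERE a = 'abc';

-- Rejected: the limiting conditions are joined by OR rather than AND,
-- so neither a nor b is restricted to one value.
SELECT a, b, SUM(c) FROM mytable
  WHERE a = 'abc' OR b = 'qrs';

-- Accepted: both nonaggregated columns are named in GROUP BY,
-- yielding one sum per (a, b) combination.
SELECT a, b, SUM(c) FROM mytable
  GROUP BY a, b;
```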

If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the orders and customers query shown at the start of this section. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Result set sorting occurs after values have been chosen, and ORDER BY does not affect which value within each group the server chooses. Disabling ONLY_FULL_GROUP_BY is useful primarily when you know that, due to some property of the data, all values in each nonaggregated column not named in the GROUP BY are the same for each group.

You can achieve the same effect without disabling ONLY_FULL_GROUP_BY by using ANY_VALUE() to refer to the nonaggregated column.

The following discussion demonstrates functional dependence, the error message MySQL produces when functional dependence is absent, and ways of causing MySQL to accept a query in the absence of functional dependence.

This query might be invalid with ONLY_FULL_GROUP_BY enabled because the nonaggregated address column in the select list is not named in the GROUP BY clause:

SELECT name, address, MAX(age) FROM t GROUP BY name;

The query is valid if name is a primary key of t or is a unique NOT NULL column. In such cases, MySQL recognizes that the selected column is functionally dependent on a grouping column. For example, if name is a primary key, its value determines the value of address because each group has only one value of the primary key and thus only one row. As a result, there is no randomness in the choice of address value in a group and no need to reject the query.
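
A minimal sketch of a table definition under which the query is valid. The column types are assumptions chosen for illustration; the original text does not define t.

```sql
-- Hypothetical definition of t: name is the primary key, so each
-- group produced by GROUP BY name contains exactly one row.
CREATE TABLE t (
  name    VARCHAR(50) NOT NULL PRIMARY KEY,
  address VARCHAR(100),
  age     INT
);

-- Accepted with ONLY_FULL_GROUP_BY enabled: address is functionally
-- dependent on the grouping column name.
SELECT name, address, MAX(age) FROM t GROUP BY name;

-- The same reasoning applies if name is instead declared as a
-- unique NOT NULL column, for example:
--   name VARCHAR(50) NOT NULL UNIQUE
```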

The query is invalid if name is not a primary key of t or a unique NOT NULL column. In this case, no functional dependency can be inferred and an error occurs:

mysql> SELECT name, address, MAX(age) FROM t GROUP BY name;
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP
BY clause and contains nonaggregated column 'mydb.t.address' which
is not functionally dependent on columns in GROUP BY clause; this
is incompatible with sql_mode=only_full_group_by

If you know that, for a given data set, each name value in fact uniquely determines the address value, address is effectively functionally dependent on name. To tell MySQL to accept the query, you can use the ANY_VALUE() function:

SELECT name, ANY_VALUE(address), MAX(age) FROM t GROUP BY name;
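
Before relying on ANY_VALUE(), it can be worth confirming that the data really has the assumed property. The following check is a sketch rather than part of the original discussion: it lists every name associated with more than one distinct address. An empty result suggests that address is effectively determined by name in the current data. COUNT(DISTINCT ...) ignores NULL values, so a name with one address and one NULL is not reported.

```sql
-- Names that map to more than one distinct non-NULL address.
-- No rows returned means ANY_VALUE(address) is unambiguous per name
-- (apart from NULLs) for the data currently in t.
SELECT name, COUNT(DISTINCT address) AS address_count
  FROM t
  GROUP BY name
  HAVING COUNT(DISTINCT address) > 1;
```

If the check returns rows and a repeatable result is still required, an aggregate such as MIN(address) picks a well-defined value per group instead of an arbitrary one.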

Alternatively, disable ONLY_FULL_GROUP_BY.
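
One way to do this for the current session only is sketched below. It uses sys.list_drop(), the counterpart in the sys schema of the sys.list_add() function used earlier in this section, and assumes the sys schema is installed. Other sessions and the global setting are unaffected.

```sql
-- Remove ONLY_FULL_GROUP_BY from the session SQL mode,
-- leaving the other modes in place.
SET SESSION sql_mode = sys.list_drop(@@session.sql_mode, 'ONLY_FULL_GROUP_BY');

-- Confirm the change.
SELECT @@session.sql_mode;

-- Restore the mode afterward.
SET SESSION sql_mode = sys.list_add(@@session.sql_mode, 'ONLY_FULL_GROUP_BY');
```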

The preceding example is quite simple, however. In particular, it is unlikely you would group on a single primary key column because every group would contain only one row. For additional examples demonstrating functional dependence in more complex queries, see Section 12.20.4, “Detection of Functional Dependence”.

If a query has aggregate functions and no GROUP BY clause, it cannot have nonaggregated columns in the select list, HAVING condition, or ORDER BY list with ONLY_FULL_GROUP_BY enabled:

mysql> SELECT name, MAX(age) FROM t;
ERROR 1140 (42000): In aggregated query without GROUP BY, expression
#1 of SELECT list contains nonaggregated column 'mydb.t.name'; this
is incompatible with sql_mode=only_full_group_by

Without GROUP BY, there is a single group and it is nondeterministic which name value to choose for the group. Here, too, ANY_VALUE() can be used, if it is immaterial which name value MySQL chooses:

SELECT ANY_VALUE(name), MAX(age) FROM t;
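
If the choice of name does matter, for example because the intent is to find whose age is the maximum, ANY_VALUE() is not the right tool. A possible alternative, offered here as a sketch and not taken from the original text, is to compare each row against the aggregate computed in a subquery:

```sql
-- Return the name (or names, if several rows tie) belonging to
-- the row with the greatest age. No nonaggregated column is mixed
-- with an aggregate in the outer query, so ONLY_FULL_GROUP_BY
-- does not object.
SELECT name, age
  FROM t
  WHERE age = (SELECT MAX(age) FROM t);
```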

ONLY_FULL_GROUP_BY also affects handling of queries that use DISTINCT and ORDER BY. Consider the case of a table t with three columns c1, c2, and c3 that contains these rows:

c1 c2 c3
1  2  A
3  4  B
1  2  C

Suppose that we execute the following query, expecting the results to be ordered by c3:

SELECT DISTINCT c1, c2 FROM t ORDER BY c3;

To order the result, duplicates must be eliminated first. But to do so, should we keep the first row or the third? This arbitrary choice influences the retained value of c3, which in turn influences ordering and makes it arbitrary as well. To prevent this problem, a query that has DISTINCT and ORDER BY is rejected as invalid if any ORDER BY expression does not satisfy at least one of these conditions: the expression is equal to one in the select list, or all columns referenced by the expression and belonging to the query's selected tables are elements of the select list.
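
For the three-row table t above, the query can be rewritten so that the ordering is well defined. The two forms below are sketches of possible rewrites, not the only options. The first sorts only by selected columns. The second replaces DISTINCT with GROUP BY and states explicitly which c3 value represents each (c1, c2) pair.

```sql
-- Order by expressions that appear in the select list.
SELECT DISTINCT c1, c2 FROM t ORDER BY c1, c2;

-- Group instead of using DISTINCT, and order by an aggregate of c3.
-- MIN(c3) is 'A' for the pair (1, 2) and 'B' for (3, 4), so the
-- ordering no longer depends on which duplicate row is kept.
SELECT c1, c2 FROM t GROUP BY c1, c2 ORDER BY MIN(c3);
```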

Another MySQL extension to standard SQL permits references in the HAVING clause to aliased expressions in the select list. For example, the following query returns name values that occur only once in table orders:

SELECT name, COUNT(name) FROM orders
  GROUP BY name
  HAVING COUNT(name) = 1;

The MySQL extension permits the use of an alias in the HAVING clause for the aggregated column:

SELECT name, COUNT(name) AS c FROM orders
  GROUP BY name
  HAVING c = 1;

Standard SQL permits only column expressions in GROUP BY clauses, so a statement such as this is invalid because FLOOR(value/100) is a noncolumn expression:

SELECT id, FLOOR(value/100)
  FROM tbl_name
  GROUP BY id, FLOOR(value/100);

MySQL extends standard SQL to permit noncolumn expressions in GROUP BY clauses and considers the preceding statement valid.

Standard SQL also does not permit aliases in GROUP BY clauses. MySQL extends standard SQL to permit aliases, so another way to write the query is as follows:

SELECT id, FLOOR(value/100) AS val
  FROM tbl_name
  GROUP BY id, val;

The alias val is considered a column expression in the GROUP BY clause.

In the presence of a noncolumn expression in the GROUP BY clause, MySQL recognizes equality between that expression and expressions in the select list. This means that with ONLY_FULL_GROUP_BY SQL mode enabled, the query containing GROUP BY id, FLOOR(value/100) is valid because that same FLOOR() expression occurs in the select list. However, MySQL does not try to recognize functional dependence on GROUP BY noncolumn expressions, so the following query is invalid with ONLY_FULL_GROUP_BY enabled, even though the third selected expression is a simple formula of the id column and the FLOOR() expression in the GROUP BY clause:

SELECT id, FLOOR(value/100), id+FLOOR(value/100)
  FROM tbl_name
  GROUP BY id, FLOOR(value/100);

A workaround is to use a derived table:

SELECT id, F, id+F
  FROM
    (SELECT id, FLOOR(value/100) AS F
     FROM tbl_name
     GROUP BY id, FLOOR(value/100)) AS dt;
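
Two other possible workarounds are sketched here; they are suggestions building on the rules stated above rather than forms given in the original text. The first relies on MySQL recognizing equality between a GROUP BY expression and a select list expression: adding the third expression to GROUP BY does not change the groups, because its value is already fixed by id and FLOOR(value/100). The second wraps the expression in ANY_VALUE(), which is safe here for the same reason.

```sql
-- Repeat the derived expression in the GROUP BY clause so that every
-- select list expression matches a grouping expression exactly.
SELECT id, FLOOR(value/100), id+FLOOR(value/100)
  FROM tbl_name
  GROUP BY id, FLOOR(value/100), id+FLOOR(value/100);

-- Or tell MySQL that any value from the group is acceptable; within
-- a group the expression has only one possible value anyway.
SELECT id, FLOOR(value/100), ANY_VALUE(id+FLOOR(value/100))
  FROM tbl_name
  GROUP BY id, FLOOR(value/100);
```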