8.2.1.17 GROUP BY Optimization优化

The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any). 满足GROUPBY子句的最常用方法是扫描整个表并创建一个新的临时表,其中每个组中的所有行都是连续的,然后使用此临时表来发现组并应用聚合函数(如果有)。In some cases, MySQL is able to do much better than that and avoid creation of temporary tables by using index access.在某些情况下,使用MySQL来避免创建表要比使用临时表好得多。

The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (as is true, for example, for a BTREE index, but not for a HASH index). GROUP BY使用索引的最重要的先决条件是,所有GROUP BY列都引用同一索引中的属性,并且索引按顺序存储其键(例如,对于BTREE索引是如此,但对于哈希索引则不是如此)。Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.临时表的使用是否可以替换为索引访问还取决于查询中使用了索引的哪些部分、为这些部分指定的条件以及所选的聚合函数。

There are two ways to execute a GROUP BY query through index access, as detailed in the following sections. 有两种方法可以通过索引访问执行GROUP BY查询,详见以下各节。The first method applies the grouping operation together with all range predicates (if any). 第一种方法将分组操作与所有范围谓词(如果有)一起应用。The second method first performs a range scan, and then groups the resulting tuples.第二种方法首先执行范围扫描,然后对生成的元组进行分组。

Loose Index Scan can also be used in the absence of GROUP BY under some conditions. 在某些情况下,如果没有GROUP BY,也可以使用松散索引扫描。See Skip Scan Range Access Method.请参阅“跳过扫描范围访问方法”

Loose Index Scan松散索引扫描

The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns. 处理GROUP BY的最有效方法是使用索引直接检索分组列。With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE). 通过这种访问方法,MySQL使用一些索引类型的属性,这些索引类型的键是按顺序排列的(例如,BTREE)。This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE conditions. 此属性允许在索引中使用查找组,而不必考虑索引中满足所有WHERE条件的所有键。This access method considers only a fraction of the keys in an index, so it is called a Loose Index Scan. 这种访问方法只考虑索引中的一小部分键,因此称为“松散索引扫描”。When there is no WHERE clause, a Loose Index Scan reads as many keys as the number of groups, which may be a much smaller number than that of all keys. 当没有WHERE子句时,松散索引扫描读取的键数与组数相同,这可能比所有键的数目小得多。If the WHERE clause contains range predicates (see the discussion of the range join type in Section 8.8.1, “Optimizing Queries with EXPLAIN”), a Loose Index Scan looks up the first key of each group that satisfies the range conditions, and again reads the smallest possible number of keys. 如果WHERE子句包含范围谓词(请参阅第8.8.1节,“使用EXPLAIN优化查询”中对range联接类型的讨论),松散索引扫描将查找满足范围条件的每个组的第一个键,并再次读取尽可能少的键。This is possible under the following conditions:这在以下条件下是可能的:

  • The query is over a single table.查询是在单个表上进行的。

  • The GROUP BY names only columns that form a leftmost prefix of the index and no other columns. GROUP BY只命名构成索引最左边前缀的列,不命名其他列。(If, instead of GROUP BY, the query has a DISTINCT clause, all distinct attributes refer to columns that form a leftmost prefix of the index.) (如果查询具有DISTINCT子句而不是GROUP BY,则所有DISTINCT属性都引用构成索引最左侧前缀的列。)For example, if a table t1 has an index on (c1,c2,c3), Loose Index Scan is applicable if the query has GROUP BY c1, c2. 例如,如果表t1(c1,c2,c3)上有索引,则如果查询具有GROUP BY c1、c2,则适用松散索引扫描。It is not applicable if the query has GROUP BY c2, c3 (the columns are not a leftmost prefix) or GROUP BY c1, c2, c4 (c4 is not in the index).如果查询具有GROUP BY c2, c3(列不是最左边的前缀)或GROUP BY c1, c2, c4c4不在索引中),则此选项不适用。

  • The only aggregate functions used in the select list (if any) are MIN() and MAX(), and all of them refer to the same column. 选择列表中使用的唯一聚合函数(如果有)是MIN()MAX(),它们都引用同一列。The column must be in the index and must immediately follow the columns in the GROUP BY.该列必须位于索引中,并且必须紧跟在GROUP BY中的列之后。

  • Any other parts of the index than those from the GROUP BY referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument of MIN() or MAX() functions.除了查询中引用的GROUP BY之外,索引的任何其他部分都必须是常量(也就是说,它们必须与常量相等地引用),但MIN()MAX()函数的参数除外。

  • For columns in the index, full column values must be indexed, not just a prefix. 对于索引中的列,必须为完整列值编制索引,而不仅仅是前缀。For example, with c1 VARCHAR(20), INDEX (c1(10)), the index uses only a prefix of c1 values and cannot be used for Loose Index Scan.例如,对于c1 VARCHAR(20), INDEX (c1(10)),索引仅使用c1值的前缀,不能用于松散索引扫描。

If Loose Index Scan is applicable to a query, the EXPLAIN output shows Using index for group-by in the Extra column.如果松散索引扫描适用于查询,则EXPLAIN输出会在Extra列中显示Using index for group-by

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). 假设表t1(c1,c2,c3,c4)上有一个索引idx(c1,c2,c3)The Loose Index Scan access method can be used for the following queries:松散索引扫描访问方法可用于以下查询:

SELECT c1, c2 FROM t1 GROUP BY c1, c2;
SELECT DISTINCT c1, c2 FROM t1;
SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;
SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;

The following queries cannot be executed with this quick select method, for the reasons given:由于以下原因,无法使用此快速选择方法执行以下查询:

  • There are aggregate functions other than MIN() or MAX():存在除MIN()MAX()以外的聚合函数:

    SELECT c1, SUM(c2) FROM t1 GROUP BY c1;
  • The columns in the GROUP BY clause do not form a leftmost prefix of the index:GROUP BY子句中的列不构成索引的最左侧前缀:

    SELECT c1, c2 FROM t1 GROUP BY c2, c3;
  • The query refers to a part of a key that comes after the GROUP BY part, and for which there is no equality with a constant:查询是指在GROUP BY部分之后的键的一部分,该部分与常量不相等:

    SELECT c1, c3 FROM t1 GROUP BY c1, c2;

    Were the query to include WHERE c3 = const, Loose Index Scan could be used.如果查询包含WHERE c3 = const,则可以使用松散索引扫描。

The Loose Index Scan access method can be applied to other forms of aggregate function references in the select list, in addition to the MIN() and MAX() references already supported:除已支持的MIN()MAX()引用外,松散索引扫描访问方法还可应用于选择列表中其他形式的聚合函数引用:

  • AVG(DISTINCT), SUM(DISTINCT), and COUNT(DISTINCT) are supported. 支持AVG(DISTINCT)SUM(DISTINCT)COUNT(DISTINCT)AVG(DISTINCT) and SUM(DISTINCT) take a single argument. AVG(DISTINCT)SUM(DISTINCT)采用单个参数。COUNT(DISTINCT) can have more than one column argument.COUNT(DISTINCT)可以有多个列参数。

  • There must be no GROUP BY or DISTINCT clause in the query.查询中不能有GROUP BYDISTINCT子句。

  • The Loose Index Scan limitations described previously still apply.前面描述的松散索引扫描限制仍然适用。

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). 假设表t1(c1,c2,c3,c4)上有一个索引idx(c1,c2,c3)The Loose Index Scan access method can be used for the following queries:松散索引扫描访问方法可用于以下查询:

SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1;

SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;
Tight Index Scan紧密索引扫描

A Tight Index Scan may be either a full index scan or a range index scan, depending on the query conditions.紧密索引扫描可以是完全索引扫描,也可以是范围索引扫描,具体取决于查询条件。

When the conditions for a Loose Index Scan are not met, it still may be possible to avoid creation of temporary tables for GROUP BY queries. 当不满足松散索引扫描的条件时,仍然可以避免为GROUP BY查询创建临时表。If there are range conditions in the WHERE clause, this method reads only the keys that satisfy these conditions. 如果WHERE子句中存在范围条件,则此方法仅读取满足这些条件的键。Otherwise, it performs an index scan. 否则,它将执行索引扫描。Because this method reads all keys in each range defined by the WHERE clause, or scans the whole index if there are no range conditions, it is called a Tight Index Scan. 由于此方法读取WHERE子句定义的每个范围内的所有键,或者如果没有范围条件,则扫描整个索引,因此称为“紧密索引扫描”。With a Tight Index Scan, the grouping operation is performed only after all keys that satisfy the range conditions have been found.对于紧密索引扫描,只有在找到满足范围条件的所有键之后,才能执行分组操作。

For this method to work, it is sufficient that there be a constant equality condition for all columns in a query referring to parts of the key coming before or in between parts of the GROUP BY key. 要使此方法起作用,查询中的所有列都有一个恒定的相等条件就足够了,该查询引用的键部分位于GROUP BY键部分之前或之间。The constants from the equality conditions fill in any gaps in the search keys so that it is possible to form complete prefixes of the index. 相等条件中的常量将填充搜索键中的任何“空白”,以便可以形成索引的完整前缀。These index prefixes then can be used for index lookups. 然后,这些索引前缀可用于索引查找。If the GROUP BY result requires sorting, and it is possible to form search keys that are prefixes of the index, MySQL also avoids extra sorting operations because searching with prefixes in an ordered index already retrieves all the keys in order.如果GROUP BY结果需要排序,并且可以形成作为索引前缀的搜索键,MySQL还可以避免额外的排序操作,因为在有序索引中使用前缀进行搜索已经按顺序检索所有键。

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). 假设表t1(c1,c2,c3,c4)上有一个索引idx(c1,c2,c3)The following queries do not work with the Loose Index Scan access method described previously, but still work with the Tight Index Scan access method.以下查询不适用于前面描述的松散索引扫描访问方法,但仍适用于紧密索引扫描访问方法。

  • There is a gap in the GROUP BY, but it is covered by the condition c2 = 'a':GROUP BY中有一个间隙,但它被条件c2 = 'a'覆盖:

    SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;
  • The GROUP BY does not begin with the first part of the key, but there is a condition that provides a constant for that part:GROUP BY不以键的第一部分开始,但有一个条件为该部分提供常量:

    SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;