Beginning with MySQL 8.0.18, MySQL employs a hash join for any query for which each join has an equi-join condition, and in which there are no indexes that can be applied to any join conditions, such as this one:从MySQL 8.0.18开始,MySQL对任何查询都使用哈希连接,每个连接都有一个equi连接条件,并且没有可以应用于任何连接条件的索引,例如:
SELECT * FROM t1 JOIN t2 ON t1.c1=t2.c1;
A hash join can also be used when there are one or more indexes that can be used for single-table predicates.当有一个或多个索引可用于单表谓词时,也可以使用哈希联接。
A hash join is usually faster than and is intended to be used in such cases instead of the block nested loop algorithm (see Block Nested-Loop Join Algorithm) employed in previous versions of MySQL. 散列连接通常比以前版本的MySQL中使用的块嵌套循环算法(请参阅块嵌套循环连接算法)更快,并打算在这种情况下使用。Beginning with MySQL 8.0.20, support for block nested loop is removed, and the server employs a hash join wherever a block nested loop would have been used previously.从MySQL8.0.20开始,对块嵌套循环的支持被删除,服务器在以前使用块嵌套循环的地方使用哈希连接。
In the example just shown and the remaining examples in this section, we assume that the three tables 在刚刚显示的示例和本节中的其余示例中,我们假设使用以下语句创建了三个表t1
, t2
, and t3
have been created using the following statements:t1
、t2
和t3
:
CREATE TABLE t1 (c1 INT, c2 INT); CREATE TABLE t2 (c1 INT, c2 INT); CREATE TABLE t3 (c1 INT, c2 INT);
You can see that a hash join is being employed by using 您可以看到,使用EXPLAIN
, like this:EXPLAIN
使用了哈希联接,如下所示:
mysql>EXPLAIN
->SELECT * FROM t1
->JOIN t2 ON t1.c1=t2.c1\G
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: SIMPLE table: t2 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 filtered: 100.00 Extra: Using where; Using join buffer (hash join)
(Prior to MySQL 8.0.20, it was necessary to include the (在MySQL8.0.20之前,必须包含FORMAT=TREE
option to see whether hash joins were being used for a given join.)FORMAT=TREE
选项,以查看哈希联接是否用于给定联接。)
EXPLAIN ANALYZE
also displays information about hash joins used.EXPLAIN ANALYZE
还显示有关使用的哈希联接的信息。
The hash join is used for queries involving multiple joins as well, as long as at least one join condition for each pair of tables is an equi-join, like the query shown here:哈希联接也用于涉及多个联接的查询,只要每对表至少有一个联接条件是equi联接,如下所示:
SELECT * FROM t1 JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2) JOIN t3 ON (t2.c1 = t3.c1);
In cases like the one just shown, which makes use of an inner join, any extra conditions which are not equi-joins are applied as filters after the join is executed. 在像刚才所示的使用内部联接的情况下,在执行联接之后,任何不等于联接的额外条件都会作为过滤器应用。(For outer joins, such as left joins, semijoins, and antijoins, they are printed as part of the join.) (对于外部联接,例如左联接、半联接和反联接,它们将作为联接的一部分打印。)This can be seen here in the output of 这可以在EXPLAIN
:EXPLAIN
的输出中看到:
mysql>EXPLAIN FORMAT=TREE
->SELECT *
->FROM t1
->JOIN t2
->ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
->JOIN t3
->ON (t2.c1 = t3.c1)\G
*************************** 1. row *************************** EXPLAIN: -> Inner hash join (t3.c1 = t1.c1) (cost=1.05 rows=1) -> Table scan on t3 (cost=0.35 rows=1) -> Hash -> Filter: (t1.c2 < t2.c2) (cost=0.70 rows=1) -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1)
As also can be seen from the output just shown, multiple hash joins can be (and are) used for joins having multiple equi-join conditions.从刚才显示的输出还可以看出,多个散列联接可以(并且是)用于具有多个相等联接条件的联接。
Prior to MySQL 8.0.20, a hash join could not be used if any pair of joined tables did not have at least one equi-join condition, and the slower block nested loop algorithm was employed. 在MySQL 8.0.20之前,如果任何一对连接的表没有至少一个equi-join条件,则不能使用hash-join,并且使用较慢的块嵌套循环算法。In MySQL 8.0.20 and later, the hash join is used in such cases, as shown here:在MySQL 8.0.20及更高版本中,在这种情况下使用哈希连接,如下所示:
mysql>EXPLAIN FORMAT=TREE
->SELECT * FROM t1
->JOIN t2 ON (t1.c1 = t2.c1)
->JOIN t3 ON (t2.c1 < t3.c1)\G
*************************** 1. row *************************** EXPLAIN: -> Filter: (t1.c1 < t3.c1) (cost=1.05 rows=1) -> Inner hash join (no condition) (cost=1.05 rows=1) -> Table scan on t3 (cost=0.35 rows=1) -> Hash -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1)
(Additional examples are provided later in this section.)(本节后面将提供其他示例。)
A hash join is also applied for a Cartesian product—that is, when no join condition is specified, as shown here:哈希联接也适用于笛卡尔积,即在未指定联接条件时,如下所示:
mysql>EXPLAIN FORMAT=TREE
->SELECT *
->FROM t1
->JOIN t2
->WHERE t1.c2 > 50\G
*************************** 1. row *************************** EXPLAIN: -> Inner hash join (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Filter: (t1.c2 > 50) (cost=0.35 rows=1) -> Table scan on t1 (cost=0.35 rows=1)
In MySQL 8.0.20 and later, it is no longer necessary for the join to contain at least one equi-join condition in order for a hash join to be used. 在MySQL 8.0.20及更高版本中,为了使用hash连接,连接不再需要包含至少一个equi-join条件。This means that the types of queries which can be optimized using hash joins include those in the following list (with examples):这意味着可以使用哈希联接进行优化的查询类型包括以下列表中的查询(带示例):
Inner non-equi-join:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t2.c1) (cost=4.70 rows=12)
-> Inner hash join (no condition) (cost=4.70 rows=12)
-> Table scan on t2 (cost=0.08 rows=6)
-> Hash
-> Table scan on t1 (cost=0.85 rows=6)
Semijoin:
mysql>EXPLAIN FORMAT=TREE SELECT * FROM t1
->WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G
*************************** 1. row *************************** EXPLAIN: -> Nested loop inner join -> Filter: (t1.c1 is not null) (cost=0.85 rows=6) -> Table scan on t1 (cost=0.85 rows=6) -> Single-row index lookup on <subquery2> using <auto_distinct_key> (c2=t1.c1) -> Materialize with deduplication -> Filter: (t2.c2 is not null) (cost=0.85 rows=6) -> Table scan on t2 (cost=0.85 rows=6)
Antijoin:
mysql>EXPLAIN FORMAT=TREE SELECT * FROM t2
->WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row *************************** EXPLAIN: -> Nested loop antijoin -> Table scan on t2 (cost=0.85 rows=6) -> Single-row index lookup on <subquery2> using <auto_distinct_key> (c1=t2.c1) -> Materialize with deduplication -> Filter: (t1.c1 is not null) (cost=0.85 rows=6) -> Table scan on t1 (cost=0.85 rows=6)
Left outer join:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t2.c1 = t1.c1) (cost=3.99 rows=36)
-> Table scan on t1 (cost=0.85 rows=6)
-> Hash
-> Table scan on t2 (cost=0.14 rows=6)
Right outer join (observe that MySQL rewrites all right outer joins as left outer joins)(注意MySQL将所有右外连接重写为左外连接):
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t1.c1 = t2.c1) (cost=3.99 rows=36)
-> Table scan on t2 (cost=0.85 rows=6)
-> Hash
-> Table scan on t1 (cost=0.14 rows=6)
By default, MySQL 8.0.18 and later employs hash joins whenever possible. 默认情况下,MySQL 8.0.18及更高版本尽可能使用哈希连接。It is possible to control whether hash joins are employed using one of the 可以使用BNL
and NO_BNL
optimizer hints.BNL
和NO-BNL
优化器提示之一控制是否使用散列联接。
(MySQL 8.0.18 supported (MySQL 8.0.18支持hash_join=on
or hash_join=off
as part of the setting for the optimizer_switch
server system variable as well as the optimizer hints HASH_JOIN
or NO_HASH_JOIN
. hash_join=on
或hash_join=off
作为optimizer_switch
服务器系统变量设置的一部分,以及优化器提示HASH_JOIN
或NO_HASH_JOIN
。In MySQL 8.0.19 and later, these no longer have any effect.)在MySQL 8.0.19及更高版本中,这些不再有任何效果。)
Memory usage by hash joins can be controlled using the 哈希连接的内存使用可以使用join_buffer_size
system variable; a hash join cannot use more memory than this amount. join_buffer_size
系统变量来控制;哈希联接使用的内存不能超过此数量。When the memory required for a hash join exceeds the amount available, MySQL handles this by using files on disk. 当散列连接所需的内存超过可用内存时,MySQL使用磁盘上的文件来处理这个问题。If this happens, you should be aware that the join may not succeed if a hash join cannot fit into memory and it creates more files than set for 如果发生这种情况,您应该意识到,如果哈希连接无法放入内存,并且它创建的文件数超过为open_files_limit
. open_files_limit
设置的数量,那么连接可能不会成功。To avoid such problems, make either of the following changes:要避免此类问题,请进行以下更改之一:
Increase 增加join_buffer_size
so that the hash join does not spill over to disk.join_buffer_size
,以便哈希连接不会溢出到磁盘。
Increase 增加open_files_limit
.open_files_limit
。
Beginning with MySQL 8.0.18, join buffers for hash joins are allocated incrementally; thus, you can set 从MySQL8.0.18开始,哈希连接的连接缓冲区是增量分配的;因此,您可以将join_buffer_size
higher without small queries allocating very large amounts of RAM, but outer joins allocate the entire buffer. join_buffer_size
设置得更高,而不需要小查询分配大量的RAM,但是外部连接分配整个缓冲区。In MySQL 8.0.20 and later, hash joins are used for outer joins (including antijoins and semijoins) as well, so this is no longer an issue.在MySQL 8.0.20及更高版本中,哈希连接也用于外部连接(包括反连接和半连接),因此这不再是一个问题。