In MySQL, a Batched Key Access (BKA) Join algorithm is available that uses both index access to the joined table and a join buffer. 在MySQL中,可以使用批处理键访问(BKA)联接算法,该算法使用对联接表的索引访问和联接缓冲区。The BKA algorithm supports inner join, outer join, and semijoin operations, including nested outer joins. BKA算法支持内部联接、外部联接和半联接操作,包括嵌套的外部联接。Benefits of BKA include improved join performance due to more efficient table scanning. BKA的好处包括由于更高效的表扫描而提高了连接性能。Also, the Block Nested-Loop (BNL) Join algorithm previously used only for inner joins is extended and can be employed for outer join and semijoin operations, including nested outer joins.此外,以前仅用于内部联接的块嵌套循环(BNL)联接算法已扩展,可用于外部联接和半联接操作,包括嵌套外部联接。
The following sections discuss the join buffer management that underlies the extension of the original BNL algorithm, the extended BNL algorithm, and the BKA algorithm. 以下各节将讨论连接缓冲区管理,它是原始BNL算法、扩展BNL算法和BKA算法扩展的基础。For information about semijoin strategies, see Section 8.2.2.1, “Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations”有关半连接策略的信息,请参阅第8.2.2.1节,“使用半连接转换优化IN和EXISTS子查询谓词”。
MySQL can employ join buffers to execute not only inner joins without index access to the inner table, but also outer joins and semijoins that appear after subquery flattening. MySQL不仅可以使用连接缓冲区来执行内部连接,而无需对内部表进行索引访问,还可以执行子查询平坦化后出现的外部连接和半连接。Moreover, a join buffer can be effectively used when there is an index access to the inner table.此外,当存在对内部表的索引访问时,可以有效地使用联接缓冲区。
The join buffer management code slightly more efficiently utilizes join buffer space when storing the values of the interesting row columns: No additional bytes are allocated in buffers for a row column if its value is join buffer management代码在存储感兴趣的行列的值时稍微更有效地利用了联接缓冲空间:如果行列的值为NULL
, and the minimum number of bytes is allocated for any value of the VARCHAR
type.NULL
,则不会在缓冲区中为其分配额外的字节,并且为VARCHAR
类型的任何值分配最小的字节数。
The code supports two types of buffers, regular and incremental. 代码支持两种类型的缓冲区:常规缓冲区和增量缓冲区。Suppose that join buffer 假设使用联接缓冲区B1
is employed to join tables t1
and t2
and the result of this operation is joined with table t3
using join buffer B2
:B1
联接表t1
和t2
,并且使用联接缓冲区B2
将此操作的结果与表t3
联接:
A regular join buffer contains columns from each join operand. 常规联接缓冲区包含来自每个联接操作数的列。If 如果B2
is a regular join buffer, each row r
put into B2
is composed of the columns of a row r1
from B1
and the interesting columns of a matching row r2
from table t3
.B2
是常规联接缓冲区,则放入B2
的每一行r
由B1
中的行r1
的列和表t3
中匹配行r2
的感兴趣列组成。
An incremental join buffer contains only columns from rows of the table produced by the second join operand. 增量联接缓冲区仅包含第二个联接操作数生成的表行中的列。That is, it is incremental to a row from the first operand buffer. 也就是说,它从第一个操作数缓冲区递增到一行。If 如果B2
is an incremental join buffer, it contains the interesting columns of the row r2
together with a link to the row r1
from B1
.B2
是增量联接缓冲区,则它包含行r2
的感兴趣的列以及从B1
到行r1
的链接。
Incremental join buffers are always incremental relative to a join buffer from an earlier join operation, so the buffer from the first join operation is always a regular buffer. 增量连接缓冲区始终是相对于先前连接操作的连接缓冲区的增量,因此第一个连接操作的缓冲区始终是常规缓冲区。In the example just given, the buffer 在刚刚给出的示例中,用于连接表B1
used to join tables t1
and t2
must be a regular buffer.t1
和t2
的缓冲区B1
必须是常规缓冲区。
Each row of the incremental buffer used for a join operation contains only the interesting columns of a row from the table to be joined. 用于联接操作的增量缓冲区的每一行只包含要联接的表中某一行的感兴趣列。These columns are augmented with a reference to the interesting columns of the matched row from the table produced by the first join operand. 这些列通过对第一个联接操作数生成的表中匹配行的感兴趣列的引用进行扩充。Several rows in the incremental buffer can refer to the same row 增量缓冲区中的几行可以引用同一行r
whose columns are stored in the previous join buffers insofar as all these rows match row r
.r
,只要这些行与行r
匹配,这些行的列就存储在以前的联接缓冲区中。
Incremental buffers enable less frequent copying of columns from buffers used for previous join operations. 增量缓冲区可以减少从先前联接操作所用缓冲区复制列的频率。This provides a savings in buffer space because in the general case a row produced by the first join operand can be matched by several rows produced by the second join operand. 这节省了缓冲区空间,因为在一般情况下,第一个联接操作数生成的行可以与第二个联接操作数生成的多行匹配。It is unnecessary to make several copies of a row from the first operand. Incremental buffers also provide a savings in processing time due to the reduction in copying time.不需要从第一个操作数复制一行的多个副本。由于复制时间的减少,增量缓冲区还可以节省处理时间。
In MySQL 8.0, the 在MySQL 8.0中,block_nested_loop
flag of the optimizer_switch
system variable works as follows:optimizer_switch
系统变量的block_nested_loop
标志的工作原理如下:
Prior to MySQL 8.0.20, it controls how the optimizer uses the Block Nested Loop join algorithm.在MySQL 8.0.20之前,它控制优化器如何使用块嵌套循环联接算法。
In MySQL 8.0.18 and later, it also controls the use of hash joins (see Section 8.2.1.4, “Hash Join Optimization”).在MySQL 8.0.18及更高版本中,它还控制哈希联接的使用(请参阅第8.2.1.4节,“哈希联接优化”)。
Beginning with MySQL 8.0.20, the flag controls hash joins only, and the block nested loop algorithm is no longer supported.从MySQL 8.0.20开始,该标志仅控制哈希连接,不再支持块嵌套循环算法。
The batched_key_access
flag controls how the optimizer uses the Batched Key Access join algorithms.batched_key_access
标志控制优化器如何使用批处理密钥访问连接算法。
By default, 默认情况下,block_nested_loop
is on
and batched_key_access
is off
. block_nested_loop
处于on
状态,batched_key_access
处于off
状态。See Section 8.9.2, “Switchable Optimizations”. 请参阅第8.9.2节,“可切换优化”。Optimizer hints may also be applied; see Optimizer Hints for Block Nested-Loop and Batched Key Access Algorithms.也可以应用优化器提示;请参阅有关块嵌套循环和批处理密钥访问算法的优化器提示。
For information about semijoin strategies, see Section 8.2.2.1, “Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations”有关半连接策略的信息,请参阅第8.2.2.1节,“使用半连接转换优化IN和EXISTS子查询谓词”。
The original implementation of the MySQL BNL algorithm was extended to support outer join and semijoin operations (and was later superseded by the hash join algorithm; see Section 8.2.1.4, “Hash Join Optimization”).MySQL BNL算法的原始实现被扩展以支持外部连接和半连接操作(后来被哈希连接算法取代;请参阅第8.2.1.4节,“哈希连接优化”)。
When these operations are executed with a join buffer, each row put into the buffer is supplied with a match flag.当使用联接缓冲区执行这些操作时,放入缓冲区的每一行都会提供一个匹配标志。
If an outer join operation is executed using a join buffer, each row of the table produced by the second operand is checked for a match against each row in the join buffer. 如果使用联接缓冲区执行外部联接操作,则将检查第二个操作数生成的表的每一行是否与联接缓冲区中的每一行匹配。When a match is found, a new extended row is formed (the original row plus columns from the second operand) and sent for further extensions by the remaining join operations. 找到匹配项后,将形成一个新的扩展行(原始行加上第二个操作数中的列),并通过剩余的联接操作发送以进行进一步扩展。In addition, the match flag of the matched row in the buffer is enabled. 此外,将启用缓冲区中匹配行的匹配标志。After all rows of the table to be joined have been examined, the join buffer is scanned. 检查要联接的表的所有行之后,将扫描联接缓冲区。Each row from the buffer that does not have its match flag enabled is extended by 缓冲区中未启用匹配标志的每一行都通过NULL
complements (NULL
values for each column in the second operand) and sent for further extensions by the remaining join operations.NULL
补码(第二个操作数中每一列的NULL
值)进行扩展,并通过其余的联接操作进行进一步扩展。
In MySQL 8.0, the 在MySQL 8.0中,block_nested_loop
flag of the optimizer_switch
system variable works as follows:optimizer_switch
系统变量的block_nested_loop
标志的工作原理如下:
Prior to MySQL 8.0.20, it controls how the optimizer uses the Block Nested Loop join algorithm.在MySQL 8.0.20之前,它控制优化器如何使用块嵌套循环联接算法。
In MySQL 8.0.18 and later, it also controls the use of hash joins (see Section 8.2.1.4, “Hash Join Optimization”).在MySQL 8.0.18及更高版本中,它还控制哈希联接的使用(请参阅第8.2.1.4节,“哈希联接优化”)。
Beginning with MySQL 8.0.20, the flag controls hash joins only, and the block nested loop algorithm is no longer supported.从MySQL 8.0.20开始,该标志仅控制哈希连接,不再支持块嵌套循环算法。
See Section 8.9.2, “Switchable Optimizations”, for more information. 有关更多信息,请参阅第8.9.2节,“可切换的优化”。Optimizer hints may also be applied; see Optimizer Hints for Block Nested-Loop and Batched Key Access Algorithms.也可以应用优化器提示;请参阅有关块嵌套循环和批处理密钥访问算法的优化器提示。
In 在EXPLAIN
output, use of BNL for a table is signified when the Extra
value contains Using join buffer (Block Nested Loop)
and the type
value is ALL
, index
, or range
.EXPLAIN
输出中,当Extra
值包含Using join buffer (Block Nested Loop)
且type
值为ALL
、index
或range
时,表示对表使用BNL。
For information about semijoin strategies, see Section 8.2.2.1, “Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations”有关半连接策略的信息,请参阅第8.2.2.1节,“使用半连接转换优化IN和EXISTS子查询谓词”。
MySQL implements a method of joining tables called the Batched Key Access (BKA) join algorithm. MySQL实现了一种连接表的方法,称为批处理密钥访问(BKA)连接算法。BKA can be applied when there is an index access to the table produced by the second join operand. 当对第二个联接操作数生成的表进行索引访问时,可以应用BKA。Like the BNL join algorithm, the BKA join algorithm employs a join buffer to accumulate the interesting columns of the rows produced by the first operand of the join operation. 与BNL join算法一样,BKA join算法使用联接缓冲区来累积联接操作的第一个操作数生成的行的感兴趣列。Then the BKA algorithm builds keys to access the table to be joined for all rows in the buffer and submits these keys in a batch to the database engine for index lookups. 然后,BKA算法构建键来访问缓冲区中所有行要联接的表,并将这些键批量提交给数据库引擎进行索引查找。The keys are submitted to the engine through the Multi-Range Read (MRR) interface (see Section 8.2.1.11, “Multi-Range Read Optimization”). 钥匙通过多量程读取(MRR)接口提交给发动机(请参阅第8.2.1.11节,“多量程读取优化”)。After submission of the keys, the MRR engine functions perform lookups in the index in an optimal way, fetching the rows of the joined table found by these keys, and starts feeding the BKA join algorithm with matching rows. 提交键后,MRR引擎函数以最佳方式在索引中执行查找,获取由这些键找到的联接表的行,并开始向BKA联接算法提供匹配行。Each matching row is coupled with a reference to a row in the join buffer.每个匹配行与连接缓冲区中的行的引用耦合。
When BKA is used, the value of 使用BKA时,join_buffer_size
defines how large the batch of keys is in each request to the storage engine. join_buffer_size
的值定义对存储引擎的每个请求中密钥批的大小。The larger the buffer, the more sequential access is made to the right hand table of a join operation, which can significantly improve performance.缓冲区越大,对联接操作右侧表的顺序访问就越多,这可以显著提高性能。
For BKA to be used, the 要使用BKA,必须将batched_key_access
flag of the optimizer_switch
system variable must be set to on
. optimizer_switch
系统变量的batched_key_access
标志设置为on
。BKA uses MRR, so the BKA使用MRR,因此mrr
flag must also be on
. mrr
标志也必须打开。Currently, the cost estimation for MRR is too pessimistic. 目前,MRR的估算也很悲观。Hence, it is also necessary for 因此,为了使用BKA,还必须关闭mrr_cost_based
to be off
for BKA to be used. mrr_cost_based
的功能。The following setting enables BKA:以下设置启用BKA:
mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';
There are two scenarios by which MRR functions execute:MRR功能的执行有两种情况:
The first scenario is used for conventional disk-based storage engines such as 第一种场景用于传统的基于磁盘的存储引擎,如InnoDB
and MyISAM
. InnoDB
和MyISAM
。For these engines, usually the keys for all rows from the join buffer are submitted to the MRR interface at once. 对于这些引擎,通常连接缓冲区中所有行的键都会立即提交到MRR接口。Engine-specific MRR functions perform index lookups for the submitted keys, get row IDs (or primary keys) from them, and then fetch rows for all these selected row IDs one by one by request from BKA algorithm. 特定于引擎的MRR函数对提交的键执行索引查找,从中获取行ID(或主键),然后根据BKA算法的请求逐个获取所有这些选定行ID的行。Every row is returned with an association reference that enables access to the matched row in the join buffer. 每行返回一个关联引用,该引用允许访问联接缓冲区中的匹配行。The rows are fetched by the MRR functions in an optimal way: They are fetched in the row ID (primary key) order. MRR函数以最佳方式获取行:它们以行ID(主键)顺序获取。This improves performance because reads are in disk order rather than random order.这提高了性能,因为读取是按磁盘顺序而不是随机顺序进行的。
The second scenario is used for remote storage engines such as 第二种场景用于NDB
. NDB
等远程存储引擎。A package of keys for a portion of rows from the join buffer, together with their associations, is sent by a MySQL Server (SQL node) to MySQL Cluster data nodes. MySQL服务器(SQL节点)将连接缓冲区中部分行的密钥包及其关联发送到MySQL集群数据节点。In return, the SQL node receives a package (or several packages) of matching rows coupled with corresponding associations. 作为回报,SQL节点接收一个(或多个)匹配行的包以及相应的关联。The BKA join algorithm takes these rows and builds new joined rows. BKA连接算法获取这些行并构建新的连接行。Then a new set of keys is sent to the data nodes and the rows from the returned packages are used to build new joined rows. 然后将一组新的键发送到数据节点,并使用返回包中的行构建新的连接行。The process continues until the last keys from the join buffer are sent to the data nodes, and the SQL node has received and joined all rows matching these keys. 该过程将继续,直到连接缓冲区中的最后一个键被发送到数据节点,并且SQL节点已接收并连接与这些键匹配的所有行。This improves performance because fewer key-bearing packages sent by the SQL node to the data nodes means fewer round trips between it and the data nodes to perform the join operation.这提高了性能,因为SQL节点向数据节点发送的密钥承载包更少,这意味着它与数据节点之间执行联接操作的往返次数更少。
With the first scenario, a portion of the join buffer is reserved to store row IDs (primary keys) selected by index lookups and passed as a parameter to the MRR functions.在第一种情况下,连接缓冲区的一部分被保留用于存储由索引查找选择的行ID(主键),并作为参数传递给MRR函数。
There is no special buffer to store keys built for rows from the join buffer. 没有特殊的缓冲区来存储为join缓冲区中的行生成的键。Instead, a function that builds the key for the next row in the buffer is passed as a parameter to the MRR functions.相反,为缓冲区中的下一行生成键的函数作为参数传递给MRR函数。
In 在EXPLAIN
output, use of BKA for a table is signified when the Extra
value contains Using join buffer (Batched Key Access)
and the type
value is ref
or eq_ref
.EXPLAIN
输出中,当Extra
值包含Using join buffer (Batched Key Access)
且type
值为ref
或eq_ref
时,表示对表使用BKA。
In addition to using the 除了使用optimizer_switch
system variable to control optimizer use of the BNL and BKA algorithms session-wide, MySQL supports optimizer hints to influence the optimizer on a per-statement basis. optimizer_switch
系统变量来控制整个会话范围内使用BNL和BKA算法的优化器外,MySQL还支持优化器提示,以每语句为基础影响优化器。See Section 8.9.3, “Optimizer Hints”.请参阅第8.9.3节,“优化器提示”。
To use a BNL or BKA hint to enable join buffering for any inner table of an outer join, join buffering must be enabled for all inner tables of the outer join.要使用BNL或BKA提示为外部联接的任何内部表启用联接缓冲,必须为外部联接的所有内部表启用联接缓冲。