MySQL executes joins between tables using a nested-loop algorithm or variations on it.MySQL使用嵌套循环算法或其变体在表之间执行联接。
A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time, passing each row to a nested loop that processes the next table in the join. 简单嵌套循环联接(NLJ)算法从循环中的第一个表中一次读取一行,将每一行传递给一个嵌套循环,该嵌套循环处理联接中的下一个表。This process is repeated as many times as there remain tables to be joined.只要仍有表要联接,此过程就会重复多次。
Assume that a join between three tables 假设三个表t1
, t2
, and t3
is to be executed using the following join types:t1
、t2
和t3
之间的连接将使用以下连接类型执行:
Table Join Type t1 range t2 ref t3 ALL
If a simple NLJ algorithm is used, the join is processed like this:如果使用简单的NLJ算法,则连接的处理方式如下:
for each row in t1 matching range { for each row in t2 matching reference key { for each row in t3 { if row satisfies join conditions, send to client } } }
Because the NLJ algorithm passes rows one at a time from outer loops to inner loops, it typically reads tables processed in the inner loops many times.由于NLJ算法一次将一行从外部循环传递到内部循环,因此它通常会多次读取在内部循环中处理的表。
A Block Nested-Loop (BNL) join algorithm uses buffering of rows read in outer loops to reduce the number of times that tables in inner loops must be read. 块嵌套循环(BNL)联接算法使用外部循环中读取的行的缓冲来减少必须读取内部循环中的表的次数。For example, if 10 rows are read into a buffer and the buffer is passed to the next inner loop, each row read in the inner loop can be compared against all 10 rows in the buffer. 例如,如果将10行读入一个缓冲区并将该缓冲区传递给下一个内部循环,则可以将内部循环中读取的每一行与缓冲区中的所有10行进行比较。This reduces by an order of magnitude the number of times the inner table must be read.这将使必须读取内部表的次数减少一个数量级。
Prior to MySQL 8.0.18, this algorithm was applied for equi-joins when no indexes could be used; in MySQL 8.0.18 and later, the hash join optimization is employed in such cases. 在MySQL 8.0.18之前,该算法适用于无法使用索引的equi连接;在MySQL 8.0.18及更高版本中,在这种情况下使用哈希连接优化。Starting with MySQL 8.0.20, the block nested loop is no longer used by MySQL, and a hash join is employed for in all cases where the block nested loop was used previously. 从MySQL 8.0.20开始,MySQL不再使用块嵌套循环,并且在以前使用块嵌套循环的所有情况下都使用哈希联接。See Section 8.2.1.4, “Hash Join Optimization”.请参阅第8.2.1.4节,“哈希连接优化”。
MySQL join buffering has these characteristics:MySQL连接缓冲具有以下特点:
Join buffering can be used when the join is of type 当联接类型为ALL
or index
(in other words, when no possible keys can be used, and a full scan is done, of either the data or index rows, respectively), or range
. ALL
或index
(换句话说,当无法使用可能的键,并且分别对数据行或索引行进行了完全扫描)或range
时,可以使用联接缓冲。Use of buffering is also applicable to outer joins, as described in Section 8.2.1.12, “Block Nested-Loop and Batched Key Access Joins”.缓冲的使用也适用于外部联接,如第8.2.1.12节,“块嵌套循环和批处理密钥访问联接”所述。
A join buffer is never allocated for the first nonconstant table, even if it would be of type 从不为第一个非恒定表分配联接缓冲区,即使它是ALL
or index
.ALL
或index
类型。
Only columns of interest to a join are stored in its join buffer, not whole rows.只有连接感兴趣的列存储在其连接缓冲区中,而不是整行。
The join_buffer_size
system variable determines the size of each join buffer used to process a query.join_buffer_size
系统变量确定用于处理查询的每个联接缓冲区的大小。
One buffer is allocated for each join that can be buffered, so a given query might be processed using multiple join buffers.为每个可以缓冲的联接分配一个缓冲区,因此可以使用多个联接缓冲区处理给定的查询。
A join buffer is allocated prior to executing the join and freed after the query is done.连接缓冲区在执行连接之前分配,在查询完成后释放。
For the example join described previously for the NLJ algorithm (without buffering), the join is done as follows using join buffering:对于前面为NLJ算法描述的示例连接(无缓冲),使用连接缓冲按如下方式完成连接:
for each row in t1 matching range { for each row in t2 matching reference key { store used columns from t1, t2 in join buffer if buffer is full { for each row in t3 { for each t1, t2 combination in join buffer { if row satisfies join conditions, send to client } } empty join buffer } } } if buffer is not empty { for each row in t3 { for each t1, t2 combination in join buffer { if row satisfies join conditions, send to client } } }
If 如果S
is the size of each stored t1
, t2
combination in the join buffer and C
is the number of combinations in the buffer, the number of times table t3
is scanned is:S
是连接缓冲区中存储的每个t1
、t2
组合的大小,C
是缓冲区中的组合数,则扫描表t3
的次数为:
(S
*C
)/join_buffer_size + 1
The number of t3
scans decreases as the value of join_buffer_size
increases, up to the point when join_buffer_size
is large enough to hold all previous row combinations. t3
扫描的数量随着join_buffer_size
值的增加而减少,直到join_buffer_siz
e足够大以容纳所有以前的行组合为止。At that point, no speed is gained by making it larger.在这一点上,没有速度是通过使它更大。