A semijoin is a preparation-time transformation that enables multiple execution strategies such as table pullout, duplicate weedout, first match, loose scan, and materialization. The optimizer uses semijoin strategies to improve subquery execution, as described in this section.
For an inner join between two tables, the join returns a row from one table as many times as there are matches in the other table. But for some questions, the only information that matters is whether there is a match, not the number of matches. Suppose that there are tables named class and roster that list classes in a course curriculum and class rosters (students enrolled in each class), respectively. To list the classes that actually have students enrolled, you could use this join:
SELECT class.class_num, class.class_name FROM class INNER JOIN roster WHERE class.class_num = roster.class_num;
However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.
Assuming that class_num is a primary key in the class table, duplicate suppression is possible by using SELECT DISTINCT, but it is inefficient to generate all matching rows first only to eliminate duplicates later.
The same duplicate-free result can be obtained by using a subquery:
SELECT class_num, class_name FROM class WHERE class_num IN (SELECT class_num FROM roster);
Here, the optimizer can recognize that the IN clause requires the subquery to return only one instance of each class number from the roster table. In this case, the query can use a semijoin; that is, an operation that returns only one instance of each row in class that is matched by rows in roster.
The following statement, which contains an EXISTS subquery predicate, is equivalent to the previous statement containing an IN subquery predicate:
SELECT class_num, class_name FROM class WHERE EXISTS (SELECT * FROM roster WHERE class.class_num = roster.class_num);
In MySQL 8.0.16 and later, any statement with an EXISTS subquery predicate is subject to the same semijoin transforms as a statement with an equivalent IN subquery predicate.
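The difference between the join and the two subquery forms can be seen on a small data set. The following is a minimal sketch: the column types, the student_id column, and the sample rows are assumptions made for illustration; only the table names, class_num, and class_name come from the text above.

```sql
-- Hypothetical schema and data, for illustration only.
CREATE TABLE class (
  class_num  INT NOT NULL PRIMARY KEY,
  class_name VARCHAR(40) NOT NULL
);
CREATE TABLE roster (
  class_num  INT NOT NULL,
  student_id INT NOT NULL,          -- assumed column, not in the original text
  PRIMARY KEY (class_num, student_id)
);

INSERT INTO class  VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
INSERT INTO roster VALUES (1, 101), (1, 102), (2, 103);

-- Inner join: one row per enrollment, so class 1 appears twice (3 rows).
SELECT class.class_num, class.class_name
FROM class INNER JOIN roster
WHERE class.class_num = roster.class_num;

-- IN subquery: each enrolled class appears once (2 rows: classes 1 and 2).
SELECT class_num, class_name
FROM class
WHERE class_num IN (SELECT class_num FROM roster);

-- EXISTS subquery: the same 2 rows as the IN form.
SELECT class_num, class_name
FROM class
WHERE EXISTS (SELECT * FROM roster WHERE class.class_num = roster.class_num);
```

Class 3 has no rows in roster, so it is absent from all three results.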
Beginning with MySQL 8.0.17, the following subqueries are transformed into antijoins:
NOT IN (SELECT ... FROM ...)
NOT EXISTS (SELECT ... FROM ...)
IN (SELECT ... FROM ...) IS NOT TRUE
EXISTS (SELECT ... FROM ...) IS NOT TRUE
IN (SELECT ... FROM ...) IS FALSE
EXISTS (SELECT ... FROM ...) IS FALSE
In short, any negation of a subquery of the form IN (SELECT ... FROM ...) or EXISTS (SELECT ... FROM ...) is transformed into an antijoin.
An antijoin is an operation that returns only rows for which there is no match. Consider the query shown here:
SELECT class_num, class_name FROM class WHERE class_num NOT IN (SELECT class_num FROM roster);
This query is rewritten internally as the antijoin SELECT class_num, class_name FROM class ANTIJOIN roster ON class_num, which returns one instance of each row in class that is not matched by any rows in roster. This means that, for each row in class, as soon as a match is found in roster, the row in class can be discarded.
Antijoin transformations cannot in most cases be applied if the expressions being compared are nullable. An exception to this rule is that (... NOT IN (SELECT ...)) IS NOT FALSE and its equivalent (... IN (SELECT ...)) IS NOT TRUE can be transformed into antijoins.
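One way to see why nullability matters is to compare the two predicate forms when the subquery can produce NULL. The sketch below uses hypothetical tables class2 and roster2 (names, column types, and rows are assumptions for illustration) and relies only on standard three-valued logic; it does not show which plan the optimizer picks.

```sql
-- Hypothetical tables; roster2.class_num is deliberately nullable.
CREATE TABLE class2  (class_num INT NOT NULL PRIMARY KEY, class_name VARCHAR(40));
CREATE TABLE roster2 (class_num INT NULL, student_id INT NOT NULL);

INSERT INTO class2  VALUES (1, 'Algebra'), (3, 'Chemistry');
INSERT INTO roster2 VALUES (1, 101), (NULL, 104);

-- Plain NOT IN: class 1 is matched (FALSE), and for class 3 the comparison
-- with NULL yields UNKNOWN, so no row satisfies the predicate (0 rows).
SELECT class_num, class_name
FROM class2
WHERE class_num NOT IN (SELECT class_num FROM roster2);

-- IS NOT TRUE is satisfied by both FALSE and UNKNOWN results of IN,
-- so class 3 is returned (1 row). This matches "no match found" semantics.
SELECT class_num, class_name
FROM class2
WHERE (class_num IN (SELECT class_num FROM roster2)) IS NOT TRUE;
```

Because the second form returns exactly the rows with no confirmed match, it corresponds to the exception described above for nullable expressions.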
Outer join and inner join syntax is permitted in the outer query specification, and table references may be base tables, derived tables, view references, or common table expressions.
In MySQL, a subquery must satisfy these criteria to be handled as a semijoin (or, in MySQL 8.0.17 and later, an antijoin if NOT modifies the subquery):
It must be part of an IN, = ANY, or EXISTS predicate that appears at the top level of the WHERE or ON clause, possibly as a term in an AND expression. For example:
SELECT ... FROM ot1, ... WHERE (oe1, ...) IN (SELECT ie1, ... FROM it1, ... WHERE ...);
Here, ot_i and it_i represent tables in the outer and inner parts of the query, and oe_i and ie_i represent expressions that refer to columns in the outer and inner tables.
In MySQL 8.0.17 and later, the subquery can also be the argument to an expression modified by NOT, IS [NOT] TRUE, or IS [NOT] FALSE.
It must be a single SELECT without UNION constructs.
It must not contain a HAVING clause.
It must not contain any aggregate functions (whether it is explicitly or implicitly grouped).
It must not have a LIMIT clause.
The statement must not use the STRAIGHT_JOIN join type in the outer query.
The STRAIGHT_JOIN modifier must not be present.
The number of outer and inner tables together must be less than the maximum number of tables permitted in a join.
The subquery may be correlated or uncorrelated. In MySQL 8.0.16 and later, decorrelation looks at trivially correlated predicates in the WHERE clause of a subquery used as the argument to EXISTS, and makes it possible to optimize the subquery as if it were used within IN (SELECT b FROM ...). The term trivially correlated means that the predicate is an equality predicate, that it is the sole predicate in the WHERE clause (or is combined with AND), and that one operand is from a table referenced in the subquery and the other operand is from the outer query block.
The DISTINCT keyword is permitted but ignored. Semijoin strategies automatically handle duplicate removal.
A GROUP BY clause is permitted but ignored, unless the subquery also contains one or more aggregate functions.
An ORDER BY clause is permitted but ignored, since ordering is irrelevant to the evaluation of semijoin strategies.
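The criteria above can be illustrated with a few contrasting statements. This is a sketch only: it reuses the class and roster tables with assumed column types (including a hypothetical student_id column), and the comments restate the listed criteria rather than report observed optimizer behavior.

```sql
-- Assumed definitions so the block stands alone; column types are hypothetical.
CREATE TABLE IF NOT EXISTS class  (class_num INT NOT NULL PRIMARY KEY, class_name VARCHAR(40) NOT NULL);
CREATE TABLE IF NOT EXISTS roster (class_num INT NOT NULL, student_id INT NOT NULL);

-- Meets the criteria: IN predicate at the top level of WHERE, as a term of AND.
SELECT class_num, class_name
FROM class
WHERE class_name LIKE 'A%'
  AND class_num IN (SELECT class_num FROM roster);

-- Still meets the criteria: DISTINCT and ORDER BY are permitted but ignored.
SELECT class_num, class_name
FROM class
WHERE class_num IN (SELECT DISTINCT class_num FROM roster ORDER BY class_num);

-- Does not meet the criteria: the subquery has HAVING and an aggregate function.
SELECT class_num, class_name
FROM class
WHERE class_num IN (SELECT class_num FROM roster
                    GROUP BY class_num HAVING COUNT(*) > 1);

-- Does not meet the criteria: the subquery is not a single SELECT (UNION).
SELECT class_num, class_name
FROM class
WHERE class_num IN (SELECT class_num FROM roster WHERE student_id < 200
                    UNION
                    SELECT class_num FROM roster WHERE student_id >= 200);
```

The last two statements remain valid SQL; they are simply evaluated by other means than a semijoin.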
If a subquery meets the preceding criteria, MySQL converts it to a semijoin (or, in MySQL 8.0.17 or later, an antijoin if applicable) and makes a cost-based choice from these strategies:
Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.
Duplicate Weedout: Run the semijoin as if it was a join and remove duplicate records using a temporary table.
FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.
LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.
Materialize the subquery into an indexed temporary table that is used to perform a join, where the index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned. For more information about materialization, see Section 8.2.2.2, “Optimizing Subqueries with Materialization”.
Each of these strategies can be enabled or disabled using the following optimizer_switch system variable flags:
The semijoin flag controls whether semijoins are used. Starting with MySQL 8.0.17, this also applies to antijoins.
If semijoin is enabled, the firstmatch, loosescan, duplicateweedout, and materialization flags enable finer control over the permitted semijoin strategies.
If the duplicateweedout semijoin strategy is disabled, it is not used unless all other applicable strategies are also disabled.
If duplicateweedout is disabled, on occasion the optimizer may generate a query plan that is far from optimal. This occurs due to heuristic pruning during greedy search, which can be avoided by setting optimizer_prune_level=0.
These flags are enabled by default. See Section 8.9.2, “Switchable Optimizations”.
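As a brief illustration of the flags described above, the following statements show one way to inspect and change them at session scope. The particular combinations of flags are arbitrary examples, not recommendations.

```sql
-- Inspect the current flag settings.
SELECT @@optimizer_switch;

-- Disable the FirstMatch and LooseScan strategies for this session only;
-- flags not named in the value keep their current settings.
SET SESSION optimizer_switch = 'firstmatch=off,loosescan=off';

-- Disable semijoin (and, from MySQL 8.0.17, antijoin) transformations.
SET SESSION optimizer_switch = 'semijoin=off';

-- Return the flags changed above to their default values.
SET SESSION optimizer_switch = 'semijoin=default,firstmatch=default,loosescan=default';

-- With duplicateweedout disabled, heuristic pruning can be turned off
-- if the resulting plans look far from optimal.
SET SESSION optimizer_switch = 'duplicateweedout=off';
SET SESSION optimizer_prune_level = 0;
```

Using SESSION scope keeps the experiment local to the current connection.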
The optimizer minimizes differences in handling of views and derived tables. This affects queries that use the STRAIGHT_JOIN modifier and a view with an IN subquery that can be converted to a semijoin. The following query illustrates this because the change in processing causes a change in transformation, and thus a different execution strategy:
CREATE VIEW v AS SELECT * FROM t1 WHERE a IN (SELECT b FROM t2); SELECT STRAIGHT_JOIN * FROM t3 JOIN v ON t3.x = v.a;
The optimizer first looks at the view and converts the IN subquery to a semijoin, then checks whether it is possible to merge the view into the outer query. Because the STRAIGHT_JOIN modifier in the outer query prevents semijoin, the optimizer refuses the merge, causing derived table evaluation using a materialized table.
EXPLAIN output indicates the use of semijoin strategies as follows:
For extended EXPLAIN output, the text displayed by a following SHOW WARNINGS shows the rewritten query, which displays the semijoin structure. (See Section 8.8.3, “Extended EXPLAIN Output Format”.) From this you can get an idea about which tables were pulled out of the semijoin. If a subquery was converted to a semijoin, you should see that the subquery predicate is gone and its tables and WHERE clause were merged into the outer query join list and WHERE clause.
Temporary table use for Duplicate Weedout is indicated by Start temporary and End temporary in the Extra column. Tables that were not pulled out and are in the range of EXPLAIN output rows covered by Start temporary and End temporary have their rowid in the temporary table.
FirstMatch(tbl_name) in the Extra column indicates join shortcutting.
LooseScan(m..n) in the Extra column indicates use of the LooseScan strategy. m and n are key part numbers.
Temporary table use for materialization is indicated by rows with a select_type value of MATERIALIZED and rows with a table value of <subqueryN>.
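The indicators listed above can be looked for with a sequence like the following. This is a sketch under assumptions: the table definitions are hypothetical, the strategy actually chosen depends on data and statistics, and the SEMIJOIN(MATERIALIZATION) optimizer hint is used here only as one way to request a particular strategy, which the optimizer may decline if it is not applicable.

```sql
-- Assumed definitions so the block stands alone; column types are hypothetical.
CREATE TABLE IF NOT EXISTS class  (class_num INT NOT NULL PRIMARY KEY, class_name VARCHAR(40) NOT NULL);
CREATE TABLE IF NOT EXISTS roster (class_num INT NOT NULL, student_id INT NOT NULL);

-- Show the plan, then the rewritten query with its semijoin structure.
EXPLAIN
SELECT class_num, class_name
FROM class
WHERE class_num IN (SELECT class_num FROM roster);
SHOW WARNINGS;

-- Request materialization for the subquery's query block, then check the
-- plan for a MATERIALIZED select_type and a <subqueryN> table entry.
EXPLAIN
SELECT class_num, class_name
FROM class
WHERE class_num IN (SELECT /*+ SEMIJOIN(MATERIALIZATION) */ class_num FROM roster);
```

Comparing the Extra column of the two plans is a quick way to tell which strategy was applied.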
In MySQL 8.0.21 and later, a semijoin transformation can also be applied to a single-table UPDATE or DELETE statement that uses a [NOT] IN or [NOT] EXISTS subquery predicate, provided that the statement does not use ORDER BY or LIMIT, and that semijoin transformations are allowed by an optimizer hint or by the optimizer_switch setting.
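Statements of the kind described above might look like the following. This is a sketch with hypothetical table definitions; EXPLAIN is used so that the plans can be inspected without modifying any rows.

```sql
-- Assumed definitions so the block stands alone; column types are hypothetical.
CREATE TABLE IF NOT EXISTS class  (class_num INT NOT NULL PRIMARY KEY, class_name VARCHAR(40) NOT NULL);
CREATE TABLE IF NOT EXISTS roster (class_num INT NOT NULL, student_id INT NOT NULL);

-- Single-table DELETE with a NOT IN subquery and no ORDER BY or LIMIT:
-- a candidate for the antijoin transformation in MySQL 8.0.21 and later.
EXPLAIN
DELETE FROM class
WHERE class_num NOT IN (SELECT class_num FROM roster);

-- Single-table UPDATE with an IN subquery: a candidate for the semijoin
-- transformation under the same conditions.
EXPLAIN
UPDATE class
SET class_name = CONCAT(class_name, ' (active)')
WHERE class_num IN (SELECT class_num FROM roster);
```

Adding ORDER BY or LIMIT to either statement would take it outside the conditions stated above.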