8.2.2.1 Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations使用半连接转换优化IN和EXISTS子查询谓词

A semijoin is a preparation-time transformation that enables multiple execution strategies such as table pullout, duplicate weedout, first match, loose scan, and materialization. 半联接是一种准备时转换,它支持多种执行策略,如表拉出、重复weedout、第一次匹配、松散扫描和物化。The optimizer uses semijoin strategies to improve subquery execution, as described in this section.优化器使用半连接策略来改进子查询的执行,如本节所述。

For an inner join between two tables, the join returns a row from one table as many times as there are matches in the other table. 对于两个表之间的内部联接,联接从一个表返回一行的次数与另一个表中的匹配次数相同。But for some questions, the only information that matters is whether there is a match, not the number of matches. 问题是,是否有匹配,而不仅仅是匹配的数量。Suppose that there are tables named class and roster that list classes in a course curriculum and class rosters (students enrolled in each class), respectively. 假设有名为classroster的表,分别列出课程表和班级名册(每个班级注册的学生)中的班级。To list the classes that actually have students enrolled, you could use this join:要列出实际有学生注册的班级,您可以使用以下连接:

SELECT class.class_num, class.class_name
    FROM class
    INNER JOIN roster
    WHERE class.class_num = roster.class_num;

However, the result lists each class once for each enrolled student. 然而,结果列出了每个注册学生的每个班级。For the question being asked, this is unnecessary duplication of information.对于所提出的问题,这是不必要的信息重复。

Assuming that class_num is a primary key in the class table, duplicate suppression is possible by using SELECT DISTINCT, but it is inefficient to generate all matching rows first only to eliminate duplicates later.假设class_numclass表中的主键,则可以通过使用SELECT DISTINCT来抑制重复,但是先生成所有匹配行,然后再消除重复,效率很低。

The same duplicate-free result can be obtained by using a subquery:使用子查询可以获得相同的无重复结果:

SELECT class_num, class_name
    FROM class
    WHERE class_num IN
        (SELECT class_num FROM roster);

Here, the optimizer can recognize that the IN clause requires the subquery to return only one instance of each class number from the roster table. 在这里,优化器可以识别IN子句要求子查询只返回花名册表中每个类号的一个实例。In this case, the query can use a semijoin; that is, an operation that returns only one instance of each row in class that is matched by rows in roster.在这种情况下,查询可以使用半连接;也就是说,该操作只返回class中与roster中的行匹配的每一行的一个实例。

The following statement, which contains an EXISTS subquery predicate, is equivalent to the previous statement containing an IN subquery predicate:以下语句包含EXISTS子查询谓词,与前面包含IN子查询谓词的语句等效:

SELECT class_num, class_name
    FROM class
    WHERE EXISTS
        (SELECT * FROM roster WHERE class.class_num = roster.class_num);

In MySQL 8.0.16 and later, any statement with an EXISTS subquery predicate is subject to the same semijoin transforms as a statement with an equivalent IN subquery predicate.在MySQL 8.0.16及更高版本中,任何具有EXISTS子查询谓词的语句都会进行与具有等效IN子查询谓词的语句相同的半连接转换。

Beginning with MySQL 8.0.17, the following subqueries are transformed into antijoins:从MySQL 8.0.17开始,以下子查询被转换为反联接:

In short, any negation of a subquery of the form IN (SELECT ... FROM ...) or EXISTS (SELECT ... FROM ...) is transformed into an antijoin.简而言之,对IN (SELECT ... FROM ...)EXISTS (SELECT ... FROM ...)形式的子查询的任何取否,都会被转换为反联接。

An antijoin is an operation that returns only rows for which there is no match. 反联接是只返回不匹配行的操作。Consider the query shown here:考虑这里显示的查询:

SELECT class_num, class_name
    FROM class
    WHERE class_num NOT IN
        (SELECT class_num FROM roster);

This query is rewritten internally as the antijoin SELECT class_num, class_name FROM class ANTIJOIN roster ON class_num, which returns one instance of each row in class that is not matched by any rows in roster. 该查询在内部重写为反联接SELECT class_num, class_name FROM class ANTIJOIN roster ON class_num,它返回class中未与roster中任何行匹配的每一行的一个实例。This means that, for each row in class, as soon as a match is found in roster, the row in class can be discarded.这意味着,对于class中的每一行,只要在roster中找到匹配项,就可以丢弃class中的行。

Antijoin transformations cannot in most cases be applied if the expressions being compared are nullable. 如果要比较的表达式可为null,则在大多数情况下无法应用反联接转换。An exception to this rule is that (... NOT IN (SELECT ...)) IS NOT FALSE and its equivalent (... IN (SELECT ...)) IS NOT TRUE can be transformed into antijoins.这条规则的一个例外是(... NOT IN (SELECT ...)) IS NOT FALSE,其等价项(... IN (SELECT ...)) IS NOT TRUE,可以转换为反联接。

Outer join and inner join syntax is permitted in the outer query specification, and table references may be base tables, derived tables, view references, or common table expressions.外部查询规范中允许使用外部联接和内部联接语法,表引用可以是基表、派生表、视图引用或公共表表达式。

In MySQL, a subquery must satisfy these criteria to be handled as a semijoin (or, in MySQL 8.0.17 and later, an antijoin if NOT modifies the subquery):在MySQL中,子查询必须满足以下条件才能作为半联接处理(或者,在MySQL 8.0.17及更高版本中,如果NOT修改了子查询,则会作为反联接处理):

If a subquery meets the preceding criteria, MySQL converts it to a semijoin (or, in MySQL 8.0.17 or later, an antijoin if applicable) and makes a cost-based choice from these strategies:如果子查询满足上述条件,MySQL将其转换为半连接(或者,在MySQL 8.0.17或更高版本中,如果适用,转换为反连接),并从以下策略中做出基于成本的选择:

Each of these strategies can be enabled or disabled using the following optimizer_switch system variable flags:这些策略的每一项都可以使用以下optimizer_switch系统变量标志启用或禁用:

These flags are enabled by default. 默认情况下,这些标志处于启用状态。See Section 8.9.2, “Switchable Optimizations”.请参阅第8.9.2节,“可切换的优化”

The optimizer minimizes differences in handling of views and derived tables. 优化器将视图和派生表的处理差异最小化。This affects queries that use the STRAIGHT_JOIN modifier and a view with an IN subquery that can be converted to a semijoin. 这会影响使用STRAIGHT_JOIN修饰符的查询以及具有可以转换为半连接的IN子查询的视图。The following query illustrates this because the change in processing causes a change in transformation, and thus a different execution strategy:以下查询说明了这一点,因为处理过程中的更改会导致转换的更改,从而导致不同的执行策略:

CREATE VIEW v AS
SELECT *
FROM t1
WHERE a IN (SELECT b
           FROM t2);

SELECT STRAIGHT_JOIN *
FROM t3 JOIN v ON t3.x = v.a;

The optimizer first looks at the view and converts the IN subquery to a semijoin, then checks whether it is possible to merge the view into the outer query. 优化器首先查看视图并将IN子查询转换为半联接,然后检查是否可以将视图合并到外部查询中。Because the STRAIGHT_JOIN modifier in the outer query prevents semijoin, the optimizer refuses the merge, causing derived table evaluation using a materialized table.因为外部查询中的STRAIGHT_JOIN修饰符阻止半联接,所以优化器拒绝合并,导致使用物化表计算派生表。

EXPLAIN output indicates the use of semijoin strategies as follows:EXPLAIN输出表示半连接策略的使用,如下所示:

In MySQL 8.0.21 and later, a semijoin transformation can also be applied to a single-table UPDATE or DELETE statement that uses a [NOT] IN or [NOT] EXISTS subquery predicate, provided that the statement does not use ORDER BY or LIMIT, and that semijoin transformations are allowed by an optimizer hint or by the optimizer_switch setting.在MySQL 8.0.21及更高版本中,半联接转换也可以应用于使用[NOT] IN[NOT] EXISTS子查询谓词的单个表UPDATE语句或DELETE语句,前提是该语句不使用ORDER BYLIMIT,并且优化器提示或optimizer_switch设置允许半联接转换。