Chapter 8 Optimization

8.2.2.3 Optimizing Subqueries with the EXISTS Strategy使用EXISTS策略优化子查询

~~Certain optimizations are applicable to comparisons that use the IN (or =ANY) operator to test subquery results.~~ 某些优化适用于使用IN（或=ANY）运算符测试子查询结果的比较。~~This section discusses these optimizations, particularly with regard to the challenges that NULL values present.~~ 本节将讨论这些优化，特别是关于NULL值带来的挑战。~~The last part of the discussion suggests how you can help the optimizer.~~讨论的最后一部分建议如何帮助优化器。

~~Consider the following subquery comparison:~~考虑下面的子查询比较：

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

~~MySQL evaluates queries “from outside to inside.”~~ MySQL“从外部到内部”评估查询。~~That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.~~也就是说，它首先获取外部表达式outer_expr的值，然后运行子查询并捕获它生成的行。

~~A very useful optimization is to “inform” the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr.~~ 一个非常有用的优化是“通知”子查询，唯一感兴趣的行是那些内部表达式internal_expr等于outer_expr的行。~~This is done by pushing down an appropriate equality into the subquery's WHERE clause to make it more restrictive.~~ 这是通过将适当的等式下推到子查询的WHERE子句中来实现的，以使其更具限制性。~~The converted comparison looks like this:~~转换后的比较如下所示：

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)

~~After the conversion, MySQL can use the pushed-down equality to limit the number of rows it must examine to evaluate the subquery.~~转换后，MySQL可以使用下推等式来限制计算子查询时必须检查的行数。

~~More generally, a comparison of N values to a subquery that returns N-value rows is subject to the same conversion.~~ 更一般地说，将N个值与返回N个值行的子查询进行比较需要进行相同的转换。~~If oe_i and ie_i represent corresponding outer and inner expression values, this subquery comparison:~~如果oe_i和ie_i表示对应的外部和内部表达式值，则此子查询比较：

(oe_1, ..., oe_N) IN
  (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)

~~Becomes:~~变成：

EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND oe_1 = ie_1
                          AND ...
                          AND oe_N = ie_N)

~~For simplicity, the following discussion assumes a single pair of outer and inner expression values.~~为简单起见，下面的讨论假设一对外部和内部表达式值。

~~The “pushdown” strategy just described works if either of these conditions is true:~~如果满足以下任一条件，则上述“下推”策略有效：

outer_expr and inner_expr cannot be NULL.outer_expr和inner_expr不能是NULL。
~~You need not distinguish NULL from FALSE subquery results.~~ 您不需要区分NULL和FALSE子查询结果。~~If the subquery is a part of an OR or AND expression in the WHERE clause, MySQL assumes that you do not care.~~ 如果子查询是WHERE子句中OR或AND表达式的一部分，MySQL假定您不关心。~~Another instance where the optimizer notices that NULL and FALSE subquery results need not be distinguished is this construct:~~优化器注意到不需要区分NULL和FALSE子查询结果的另一个实例是以下构造：
```
... WHERE outer_expr IN (subquery)
```
~~In this case, the WHERE clause rejects the row whether IN (subquery) returns NULL or FALSE.~~在这种情况下，无论IN (subquery)返回NULL还是FALSE，WHERE子句都拒绝该行。

~~Suppose that outer_expr is known to be a non-NULL value but the subquery does not produce a row such that outer_expr = inner_expr.~~ 假设已知outer_expr是非NULL值，但子查询不生成outer_expr = inner_expr的行。~~Then outer_expr IN (SELECT ...) evaluates as follows:~~然后outer_expr IN (SELECT ...)计算如下：

~~NULL, if the SELECT produces any row where inner_expr is NULL~~如果SELECT生成的任何行其中inner_expr为NULL，则为NULL
~~FALSE, if the SELECT produces only non-NULL values or produces nothing~~如果SELECT仅生成非NULL值或不生成任何值，则为FALSE

~~In this situation, the approach of looking for rows with outer_expr = inner_expr is no longer valid.~~ 在这种情况下，查找outer_expr = inner_expr的行的方法不再有效。~~It is necessary to look for such rows, but if none are found, also look for rows where inner_expr is NULL.~~ 有必要查找这样的行，但如果找不到任何行，也要查找inner_expr为NULL的行。~~Roughly speaking, the subquery can be converted to something like this:~~粗略地说，子查询可以转换为如下内容：

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND
        (outer_expr=inner_expr OR inner_expr IS NULL))

~~The need to evaluate the extra IS NULL condition is why MySQL has the ref_or_null access method:~~需要评估额外的IS NULL条件，这就是MySQL具有ref_or_null访问方法的原因：

mysql> EXPLAIN
SELECT outer_expr IN (SELECT t2.maybe_null_key
FROM t2, t3 WHERE ...)
FROM t1;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
...
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t2
         type: ref_or_null
possible_keys: maybe_null_key
          key: maybe_null_key
      key_len: 5
          ref: func
         rows: 2
        Extra: Using where; Using index
...

~~The unique_subquery and index_subquery subquery-specific access methods also have “or NULL” variants.~~unique_subquery和index_subquery特定于子查询的访问方法也有“或NULL”变体。

~~The additional OR ... IS NULL condition makes query execution slightly more complicated (and some optimizations within the subquery become inapplicable), but generally this is tolerable.~~额外的OR ... IS NULL条件使查询执行稍微复杂一些（并且子查询中的一些优化变得不适用），但通常这是可以接受的。

~~The situation is much worse when outer_expr can be NULL.~~ 当outer_expr可以为NULL时，情况更糟。~~According to the SQL interpretation of NULL as “unknown value,” NULL IN (SELECT inner_expr ...) should evaluate to:~~根据NULL作为“未知值”的SQL解释，NULL IN (SELECT inner_expr ...) 的计算结果应为：

~~NULL, if the SELECT produces any rows~~如果SELECT生成任何行，则为NULL
~~FALSE, if the SELECT produces no rows~~如果SELECT不生成行，则为FALSE

~~For proper evaluation, it is necessary to be able to check whether the SELECT has produced any rows at all, so outer_expr = inner_expr cannot be pushed down into the subquery.~~ 为了进行正确的计算，必须能够检查SELECT是否生成了任何行，因此不能将outer_expr = inner_expr下推到子查询中。~~This is a problem because many real world subqueries become very slow unless the equality can be pushed down.~~这是一个问题，因为许多真实世界的子查询变得非常慢，除非可以向下推等式。

~~Essentially, there must be different ways to execute the subquery depending on the value of outer_expr.~~本质上，根据outer_expr的值，必须有不同的方式来执行子查询。

~~The optimizer chooses SQL compliance over speed, so it accounts for the possibility that outer_expr might be NULL:~~优化器选择SQL遵从性而不是速度，因此它考虑了outer_expr可能为空的可能性：

~~If outer_expr is NULL, to evaluate the following expression, it is necessary to execute the SELECT to determine whether it produces any rows:~~如果outer_expr为NULL，则要计算以下表达式，必须执行SELECT以确定它是否生成任何行：
```
NULL IN (SELECT inner_expr FROM ... WHERE subquery_where)
```
~~It is necessary to execute the original SELECT here, without any pushed-down equalities of the kind mentioned previously.~~有必要在这里执行原始SELECT，而不使用前面提到的任何下推等式。
~~On the other hand, when outer_expr is not NULL, it is absolutely essential that this comparison:~~另一方面，当outer_expr不为NULL时，此比较绝对必要：
```
outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)
```
~~Be converted to this expression that uses a pushed-down condition:~~将转换为使用下推条件的表达式：
```
EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)
```
~~Without this conversion, subqueries are slow.~~如果没有这种转换，子查询速度会很慢。

~~To solve the dilemma of whether or not to push down conditions into the subquery, the conditions are wrapped within “trigger” functions.~~ 为了解决是否将条件下推到子查询中的难题，这些条件被包装在“触发器”函数中。~~Thus, an expression of the following form:~~因此，以下形式的表达式：

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

~~Is converted into:~~被转换为：

EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND trigcond(outer_expr=inner_expr))

~~More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:~~更一般地说，如果子查询比较基于几对外部和内部表达式，则转换将进行此比较：

(oe_1, ..., oe_N) IN (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)

~~And converts it to this expression:~~并将其转换为以下表达式：

EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND trigcond(oe_1=ie_1)
                          AND ...
                          AND trigcond(oe_N=ie_N)
       )

~~Each trigcond(X) is a special function that evaluates to the following values:~~每个trigcond(X)是一个特殊函数，其计算结果如下：

X when the “linked” outer expression oe_i is not NULL当“链接”外部表达式oe_i不为NULL时为X
~~TRUE when the “linked” outer expression oe_i is NULL~~当“链接”外部表达式oe_i为NULL时为TRUE

~~Note~~注意

~~Trigger functions are not triggers of the kind that you create with CREATE TRIGGER.~~触发器函数不是使用CREATE TRIGGER创建的触发器。

~~Equalities that are wrapped within trigcond() functions are not first class predicates for the query optimizer.~~ trigcond()函数中包装的等式不是查询优化器的第一类谓词。~~Most optimizations cannot deal with predicates that may be turned on and off at query execution time, so they assume any trigcond(X) to be an unknown function and ignore it.~~ 大多数优化无法处理可能在查询执行时打开和关闭的谓词，因此它们假设任何trigcond(X)都是未知函数，并忽略它。~~Triggered equalities can be used by those optimizations:~~触发式等式可用于以下优化：

~~Reference optimizations: trigcond(X=Y [OR Y IS NULL]) can be used to construct ref, eq_ref, or ref_or_null table accesses.~~引用优化：trigcond(X=Y [OR Y IS NULL])可用于构造ref、eq_ref或ref_or_null表访问。
~~Index lookup-based subquery execution engines: trigcond(X=Y) can be used to construct unique_subquery or index_subquery accesses.~~基于索引查找的子查询执行引擎：trigcond(X=Y)可用于构造unique_subquery或index_subquery访问。
~~Table-condition generator: If the subquery is a join of several tables, the triggered condition is checked as soon as possible.~~表条件生成器：如果子查询是多个表的联接，则会尽快检查触发的条件。

When the optimizer uses a triggered condition to create some kind of index lookup-based access (as for the first two items of the preceding list), it must have a fallback strategy for the case when the condition is turned off. 当优化器使用触发的条件创建某种基于索引查找的访问时（与前面列表的前两项一样），它必须在关闭条件时具有回退策略。~~This fallback strategy is always the same: Do a full table scan.~~ 此回退策略始终相同：执行完整表扫描。~~In EXPLAIN output, the fallback shows up as Full scan on NULL key in the Extra column:~~在EXPLAIN输出中，备用项在Extra列中显示为Full scan on NULL key（对空键的完全扫描）：

mysql> EXPLAIN SELECT t1.col1,
t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
        ...
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t2
         type: index_subquery
possible_keys: key1
          key: key1
      key_len: 5
          ref: func
         rows: 2
        Extra: Using where; Full scan on NULL key

~~If you run EXPLAIN followed by SHOW WARNINGS, you can see the triggered condition:~~如果运行EXPLAIN，后面跟着SHOW WARNING，则可以看到触发的条件：

*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `test`.`t1`.`col1` AS `col1`,
         <in_optimizer>(`test`.`t1`.`col1`,
         <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2
         on key1 checking NULL
         where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having
         trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS
         `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)`
         from `test`.`t1`

~~The use of triggered conditions has some performance implications.~~ 使用触发条件会对性能产生一些影响。~~A NULL IN (SELECT ...) expression now may cause a full table scan (which is slow) when it previously did not.~~ NULL IN (SELECT ...)表达式现在可能会导致完整表扫描（速度很慢），而以前没有。~~This is the price paid for correct results (the goal of the trigger-condition strategy is to improve compliance, not speed).~~这是正确结果的代价（触发条件策略的目标是提高遵从性，而不是速度）。

~~For multiple-table subqueries, execution of NULL IN (SELECT ...) is particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL.~~ 对于多个表子查询，NULL IN (SELECT ...)的执行速度特别慢，因为联接优化器不会针对外部表达式为NULL的情况进行优化。~~It assumes that subquery evaluations with NULL on the left side are very rare, even if there are statistics that indicate otherwise.~~ 它假设左侧为NULL的子查询求值非常罕见，即使有统计数据表明情况并非如此。~~On the other hand, if the outer expression might be NULL but never actually is, there is no performance penalty.~~另一方面，如果外部表达式可能为NULL，但实际上从未为NULL，则不会有性能损失。

~~To help the query optimizer better execute your queries, use these suggestions:~~要帮助查询优化器更好地执行查询，请使用以下建议：

~~Declare a column as NOT NULL if it really is.~~ 如果列确实为非空，则将其声明为NOT NULL。~~This also helps other aspects of the optimizer by simplifying condition testing for the column.~~通过简化列的条件测试，这也有助于优化器的其他方面。
~~If you need not distinguish a NULL from FALSE subquery result, you can easily avoid the slow execution path.~~ 如果不需要区分NULL和FALSE子查询结果，可以轻松避免执行路径缓慢。~~Replace a comparison that looks like this:~~将如下所示的比较：
```
outer_expr [NOT] IN (SELECT inner_expr FROM ...)
```
~~with this expression:~~替换为此表达式：
```
(outer_expr IS NOT NULL) AND (outer_expr [NOT] IN (SELECT inner_expr FROM ...))
```
~~Then NULL IN (SELECT ...) is never evaluated because MySQL stops evaluating AND parts as soon as the expression result is clear.~~然后NULL IN (SELECT ...)永远不会被计算，因为MySQL会在表达式结果清楚后之立即停止计算AND部分。

~~Another possible rewrite:~~另一种可能的重写：
```
[NOT] EXISTS (SELECT inner_expr FROM ...
        WHERE inner_expr=outer_expr)
```

~~The subquery_materialization_cost_based flag of the optimizer_switch system variable enables control over the choice between subquery materialization and IN-to-EXISTS subquery transformation.~~ optimizer_switch系统变量的subquery_materialization_cost_based标志允许控制子查询具体化和IN-to-EXISTS子查询转换之间的选择。~~See Section 8.9.2, “Switchable Optimizations”.~~参见第8.9.2节，“可切换优化”。