8.2.2.3 Optimizing Subqueries with the EXISTS Strategy使用EXISTS策略优化子查询

Certain optimizations are applicable to comparisons that use the IN (or =ANY) operator to test subquery results. 某些优化适用于使用IN(或=ANY)运算符测试子查询结果的比较。This section discusses these optimizations, particularly with regard to the challenges that NULL values present. 本节将讨论这些优化,特别是关于NULL值带来的挑战。The last part of the discussion suggests how you can help the optimizer.讨论的最后一部分建议如何帮助优化器。

Consider the following subquery comparison:考虑下面的子查询比较:

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

MySQL evaluates queries from outside to inside. MySQL“从外部到内部”评估查询。That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.也就是说,它首先获取外部表达式outer_expr的值,然后运行子查询并捕获它生成的行。

A very useful optimization is to inform the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr. 一个非常有用的优化是“通知”子查询,唯一感兴趣的行是那些内部表达式internal_expr等于outer_expr的行。This is done by pushing down an appropriate equality into the subquery's WHERE clause to make it more restrictive. 这是通过将适当的等式下推到子查询的WHERE子句中来实现的,以使其更具限制性。The converted comparison looks like this:转换后的比较如下所示:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)

After the conversion, MySQL can use the pushed-down equality to limit the number of rows it must examine to evaluate the subquery.转换后,MySQL可以使用下推等式来限制计算子查询时必须检查的行数。

More generally, a comparison of N values to a subquery that returns N-value rows is subject to the same conversion. 更一般地说,将N个值与返回N个值行的子查询进行比较需要进行相同的转换。If oe_i and ie_i represent corresponding outer and inner expression values, this subquery comparison:如果oe_iie_i表示对应的外部和内部表达式值,则此子查询比较:

(oe_1, ..., oe_N) IN
  (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)

Becomes:变成:

EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND oe_1 = ie_1
                          AND ...
                          AND oe_N = ie_N)

For simplicity, the following discussion assumes a single pair of outer and inner expression values.为简单起见,下面的讨论假设一对外部和内部表达式值。

The pushdown strategy just described works if either of these conditions is true:如果满足以下任一条件,则上述“下推”策略有效:

Suppose that outer_expr is known to be a non-NULL value but the subquery does not produce a row such that outer_expr = inner_expr. 假设已知outer_expr是非NULL值,但子查询不生成outer_expr = inner_expr的行。Then outer_expr IN (SELECT ...) evaluates as follows:然后outer_expr IN (SELECT ...)计算如下:

In this situation, the approach of looking for rows with outer_expr = inner_expr is no longer valid. 在这种情况下,查找outer_expr = inner_expr的行的方法不再有效。It is necessary to look for such rows, but if none are found, also look for rows where inner_expr is NULL. 有必要查找这样的行,但如果找不到任何行,也要查找inner_exprNULL的行。Roughly speaking, the subquery can be converted to something like this:粗略地说,子查询可以转换为如下内容:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND
        (outer_expr=inner_expr OR inner_expr IS NULL))

The need to evaluate the extra IS NULL condition is why MySQL has the ref_or_null access method:需要评估额外的IS NULL条件,这就是MySQL具有ref_or_null访问方法的原因:

mysql> EXPLAIN
SELECT outer_expr IN (SELECT t2.maybe_null_key
FROM t2, t3 WHERE ...)
FROM t1;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
...
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t2
         type: ref_or_null
possible_keys: maybe_null_key
          key: maybe_null_key
      key_len: 5
          ref: func
         rows: 2
        Extra: Using where; Using index
...

The unique_subquery and index_subquery subquery-specific access methods also have or NULL variants.unique_subqueryindex_subquery特定于子查询的访问方法也有“或NULL”变体。

The additional OR ... IS NULL condition makes query execution slightly more complicated (and some optimizations within the subquery become inapplicable), but generally this is tolerable.额外的OR ... IS NULL条件使查询执行稍微复杂一些(并且子查询中的一些优化变得不适用),但通常这是可以接受的。

The situation is much worse when outer_expr can be NULL. outer_expr可以为NULL时,情况更糟。According to the SQL interpretation of NULL as unknown value, NULL IN (SELECT inner_expr ...) should evaluate to:根据NULL作为“未知值”的SQL解释,NULL IN (SELECT inner_expr ...) 的计算结果应为:

For proper evaluation, it is necessary to be able to check whether the SELECT has produced any rows at all, so outer_expr = inner_expr cannot be pushed down into the subquery. 为了进行正确的计算,必须能够检查SELECT是否生成了任何行,因此不能将outer_expr = inner_expr下推到子查询中。This is a problem because many real world subqueries become very slow unless the equality can be pushed down.这是一个问题,因为许多真实世界的子查询变得非常慢,除非可以向下推等式。

Essentially, there must be different ways to execute the subquery depending on the value of outer_expr.本质上,根据outer_expr的值,必须有不同的方式来执行子查询。

The optimizer chooses SQL compliance over speed, so it accounts for the possibility that outer_expr might be NULL:优化器选择SQL遵从性而不是速度,因此它考虑了outer_expr可能为空的可能性:

To solve the dilemma of whether or not to push down conditions into the subquery, the conditions are wrapped within trigger functions. 为了解决是否将条件下推到子查询中的难题,这些条件被包装在“触发器”函数中。Thus, an expression of the following form:因此,以下形式的表达式:

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

Is converted into:被转换为:

EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND trigcond(outer_expr=inner_expr))

More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:更一般地说,如果子查询比较基于几对外部和内部表达式,则转换将进行此比较:

(oe_1, ..., oe_N) IN (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)

And converts it to this expression:并将其转换为以下表达式:

EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND trigcond(oe_1=ie_1)
                          AND ...
                          AND trigcond(oe_N=ie_N)
       )

Each trigcond(X) is a special function that evaluates to the following values:每个trigcond(X)是一个特殊函数,其计算结果如下:

Note注意

Trigger functions are not triggers of the kind that you create with CREATE TRIGGER.触发器函数不是使用CREATE TRIGGER创建的触发器。

Equalities that are wrapped within trigcond() functions are not first class predicates for the query optimizer. trigcond()函数中包装的等式不是查询优化器的第一类谓词。Most optimizations cannot deal with predicates that may be turned on and off at query execution time, so they assume any trigcond(X) to be an unknown function and ignore it. 大多数优化无法处理可能在查询执行时打开和关闭的谓词,因此它们假设任何trigcond(X)都是未知函数,并忽略它。Triggered equalities can be used by those optimizations:触发式等式可用于以下优化:

When the optimizer uses a triggered condition to create some kind of index lookup-based access (as for the first two items of the preceding list), it must have a fallback strategy for the case when the condition is turned off. 当优化器使用触发的条件创建某种基于索引查找的访问时(与前面列表的前两项一样),它必须在关闭条件时具有回退策略。This fallback strategy is always the same: Do a full table scan. 此回退策略始终相同:执行完整表扫描。In EXPLAIN output, the fallback shows up as Full scan on NULL key in the Extra column:EXPLAIN输出中,备用项在Extra列中显示为Full scan on NULL key(对空键的完全扫描):

mysql> EXPLAIN SELECT t1.col1,
t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
        ...
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t2
         type: index_subquery
possible_keys: key1
          key: key1
      key_len: 5
          ref: func
         rows: 2
        Extra: Using where; Full scan on NULL key

If you run EXPLAIN followed by SHOW WARNINGS, you can see the triggered condition:如果运行EXPLAIN,后面跟着SHOW WARNING,则可以看到触发的条件:

*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `test`.`t1`.`col1` AS `col1`,
         <in_optimizer>(`test`.`t1`.`col1`,
         <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2
         on key1 checking NULL
         where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having
         trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS
         `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)`
         from `test`.`t1`

The use of triggered conditions has some performance implications. 使用触发条件会对性能产生一些影响。A NULL IN (SELECT ...) expression now may cause a full table scan (which is slow) when it previously did not. NULL IN (SELECT ...)表达式现在可能会导致完整表扫描(速度很慢),而以前没有。This is the price paid for correct results (the goal of the trigger-condition strategy is to improve compliance, not speed).这是正确结果的代价(触发条件策略的目标是提高遵从性,而不是速度)。

For multiple-table subqueries, execution of NULL IN (SELECT ...) is particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL. 对于多个表子查询,NULL IN (SELECT ...)的执行速度特别慢,因为联接优化器不会针对外部表达式为NULL的情况进行优化。It assumes that subquery evaluations with NULL on the left side are very rare, even if there are statistics that indicate otherwise. 它假设左侧为NULL的子查询求值非常罕见,即使有统计数据表明情况并非如此。On the other hand, if the outer expression might be NULL but never actually is, there is no performance penalty.另一方面,如果外部表达式可能为NULL,但实际上从未为NULL,则不会有性能损失。

To help the query optimizer better execute your queries, use these suggestions:要帮助查询优化器更好地执行查询,请使用以下建议:

The subquery_materialization_cost_based flag of the optimizer_switch system variable enables control over the choice between subquery materialization and IN-to-EXISTS subquery transformation. optimizer_switch系统变量的subquery_materialization_cost_based标志允许控制子查询具体化和IN-to-EXISTS子查询转换之间的选择。See Section 8.9.2, “Switchable Optimizations”.参见第8.9.2节,“可切换优化”