Certain optimizations are applicable to comparisons that use the 某些优化适用于使用IN
(or =ANY
) operator to test subquery results. IN
(或=ANY
)运算符测试子查询结果的比较。This section discusses these optimizations, particularly with regard to the challenges that 本节将讨论这些优化,特别是关于NULL
values present. NULL
值带来的挑战。The last part of the discussion suggests how you can help the optimizer.讨论的最后一部分建议如何帮助优化器。
Consider the following subquery comparison:考虑下面的子查询比较:
outer_expr
IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
MySQL evaluates queries “from outside to inside.” MySQL“从外部到内部”评估查询。That is, it first obtains the value of the outer expression 也就是说,它首先获取外部表达式outer_expr
, and then runs the subquery and captures the rows that it produces.outer_expr
的值,然后运行子查询并捕获它生成的行。
A very useful optimization is to “inform” the subquery that the only rows of interest are those where the inner expression 一个非常有用的优化是“通知”子查询,唯一感兴趣的行是那些内部表达式inner_expr
is equal to outer_expr
. internal_expr
等于outer_expr
的行。This is done by pushing down an appropriate equality into the subquery's 这是通过将适当的等式下推到子查询的WHERE
clause to make it more restrictive. WHERE
子句中来实现的,以使其更具限制性。The converted comparison looks like this:转换后的比较如下所示:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
ANDouter_expr
=inner_expr
)
After the conversion, MySQL can use the pushed-down equality to limit the number of rows it must examine to evaluate the subquery.转换后,MySQL可以使用下推等式来限制计算子查询时必须检查的行数。
More generally, a comparison of 更一般地说,将N
values to a subquery that returns N
-value rows is subject to the same conversion. N
个值与返回N
个值行的子查询进行比较需要进行相同的转换。If 如果oe_i
and ie_i
represent corresponding outer and inner expression values, this subquery comparison:oe_i
和ie_i
表示对应的外部和内部表达式值,则此子查询比较:
(oe_1
, ...,oe_N
) IN (SELECTie_1
, ...,ie_N
FROM ... WHEREsubquery_where
)
Becomes:变成:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
ANDoe_1
=ie_1
AND ... ANDoe_N
=ie_N
)
For simplicity, the following discussion assumes a single pair of outer and inner expression values.为简单起见,下面的讨论假设一对外部和内部表达式值。
The “pushdown” strategy just described works if either of these conditions is true:如果满足以下任一条件,则上述“下推”策略有效:
outer_expr
and inner_expr
cannot be NULL
.outer_expr
和inner_expr
不能是NULL
。
You need not distinguish 您不需要区分NULL
from FALSE
subquery results. NULL
和FALSE
子查询结果。If the subquery is a part of an 如果子查询是OR
or AND
expression in the WHERE
clause, MySQL assumes that you do not care. WHERE
子句中OR
或AND
表达式的一部分,MySQL假定您不关心。Another instance where the optimizer notices that 优化器注意到不需要区分NULL
and FALSE
subquery results need not be distinguished is this construct:NULL
和FALSE
子查询结果的另一个实例是以下构造:
... WHEREouter_expr
IN (subquery
)
In this case, the 在这种情况下,无论WHERE
clause rejects the row whether IN (
returns subquery
)NULL
or FALSE
.IN (
返回subquery
)NULL
还是FALSE
,WHERE
子句都拒绝该行。
Suppose that 假设已知outer_expr
is known to be a non-NULL
value but the subquery does not produce a row such that outer_expr
= inner_expr
. outer_expr
是非NULL
值,但子查询不生成outer_expr
= inner_expr
的行。Then 然后
evaluates as follows:outer_expr
IN (SELECT ...)
计算如下:outer_expr
IN (SELECT ...)
In this situation, the approach of looking for rows with 在这种情况下,查找
is no longer valid. outer_expr
= inner_expr
的行的方法不再有效。outer_expr
= inner_expr
It is necessary to look for such rows, but if none are found, also look for rows where 有必要查找这样的行,但如果找不到任何行,也要查找inner_expr
is NULL
. inner_expr
为NULL
的行。Roughly speaking, the subquery can be converted to something like this:粗略地说,子查询可以转换为如下内容:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
AND (outer_expr
=inner_expr
ORinner_expr
IS NULL))
The need to evaluate the extra 需要评估额外的IS NULL
condition is why MySQL has the ref_or_null
access method:IS NULL
条件,这就是MySQL具有ref_or_null
访问方法的原因:
mysql>EXPLAIN
SELECT
outer_expr
IN (SELECT t2.maybe_null_keyFROM t2, t3 WHERE ...)
FROM t1;
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: ref_or_null possible_keys: maybe_null_key key: maybe_null_key key_len: 5 ref: func rows: 2 Extra: Using where; Using index ...
The unique_subquery
and index_subquery
subquery-specific access methods also have “or NULL
” variants.unique_subquery
和index_subquery
特定于子查询的访问方法也有“或NULL
”变体。
The additional 额外的OR ... IS NULL
condition makes query execution slightly more complicated (and some optimizations within the subquery become inapplicable), but generally this is tolerable.OR ... IS NULL
条件使查询执行稍微复杂一些(并且子查询中的一些优化变得不适用),但通常这是可以接受的。
The situation is much worse when 当outer_expr
can be NULL
. outer_expr
可以为NULL
时,情况更糟。According to the SQL interpretation of 根据NULL
as “unknown value,” NULL IN (SELECT
should evaluate to:inner_expr
...)NULL
作为“未知值”的SQL解释,NULL IN (SELECT
的计算结果应为:inner_expr
...)
For proper evaluation, it is necessary to be able to check whether the 为了进行正确的计算,必须能够检查SELECT
has produced any rows at all, so
cannot be pushed down into the subquery. outer_expr
= inner_expr
SELECT
是否生成了任何行,因此不能将
下推到子查询中。outer_expr
= inner_expr
This is a problem because many real world subqueries become very slow unless the equality can be pushed down.这是一个问题,因为许多真实世界的子查询变得非常慢,除非可以向下推等式。
Essentially, there must be different ways to execute the subquery depending on the value of 本质上,根据outer_expr
.outer_expr
的值,必须有不同的方式来执行子查询。
The optimizer chooses SQL compliance over speed, so it accounts for the possibility that 优化器选择SQL遵从性而不是速度,因此它考虑了outer_expr
might be NULL
:outer_expr
可能为空的可能性:
If 如果outer_expr
is NULL
, to evaluate the following expression, it is necessary to execute the SELECT
to determine whether it produces any rows:outer_expr
为NULL
,则要计算以下表达式,必须执行SELECT
以确定它是否生成任何行:
NULL IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
It is necessary to execute the original 有必要在这里执行原始SELECT
here, without any pushed-down equalities of the kind mentioned previously.SELECT
,而不使用前面提到的任何下推等式。
On the other hand, when 另一方面,当outer_expr
is not NULL
, it is absolutely essential that this comparison:outer_expr
不为NULL
时,此比较绝对必要:
outer_expr
IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
Be converted to this expression that uses a pushed-down condition:将转换为使用下推条件的表达式:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
ANDouter_expr
=inner_expr
)
Without this conversion, subqueries are slow.如果没有这种转换,子查询速度会很慢。
To solve the dilemma of whether or not to push down conditions into the subquery, the conditions are wrapped within “trigger” functions. 为了解决是否将条件下推到子查询中的难题,这些条件被包装在“触发器”函数中。Thus, an expression of the following form:因此,以下形式的表达式:
outer_expr
IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
Is converted into:被转换为:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
AND trigcond(outer_expr
=inner_expr
))
More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:更一般地说,如果子查询比较基于几对外部和内部表达式,则转换将进行此比较:
(oe_1
, ...,oe_N
) IN (SELECTie_1
, ...,ie_N
FROM ... WHEREsubquery_where
)
And converts it to this expression:并将其转换为以下表达式:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
AND trigcond(oe_1
=ie_1
) AND ... AND trigcond(oe_N
=ie_N
) )
Each 每个trigcond(
is a special function that evaluates to the following values:X
)trigcond(
是一个特殊函数,其计算结果如下:X
)
当“链接”外部表达式X
when the “linked” outer expression oe_i
is not NULL
oe_i
不为NULL
时为X
当“链接”外部表达式TRUE
when the “linked” outer expression oe_i
is NULL
oe_i
为NULL
时为TRUE
Trigger functions are not triggers of the kind that you create with 触发器函数不是使用CREATE TRIGGER
.CREATE TRIGGER
创建的触发器。
Equalities that are wrapped within trigcond()
functions are not first class predicates for the query optimizer. trigcond()
函数中包装的等式不是查询优化器的第一类谓词。Most optimizations cannot deal with predicates that may be turned on and off at query execution time, so they assume any 大多数优化无法处理可能在查询执行时打开和关闭的谓词,因此它们假设任何trigcond(
to be an unknown function and ignore it. X
)trigcond(
都是未知函数,并忽略它。X
)Triggered equalities can be used by those optimizations:触发式等式可用于以下优化:
Reference optimizations: 引用优化:trigcond(
can be used to construct X
=Y
[OR Y
IS NULL])ref
, eq_ref
, or ref_or_null
table accesses.trigcond(
可用于构造X
=Y
[OR Y
IS NULL])ref
、eq_ref
或ref_or_null
表访问。
Index lookup-based subquery execution engines: 基于索引查找的子查询执行引擎:trigcond(
can be used to construct X
=Y
)unique_subquery
or index_subquery
accesses.trigcond(
可用于构造X
=Y
)unique_subquery
或index_subquery
访问。
Table-condition generator: If the subquery is a join of several tables, the triggered condition is checked as soon as possible.表条件生成器:如果子查询是多个表的联接,则会尽快检查触发的条件。
When the optimizer uses a triggered condition to create some kind of index lookup-based access (as for the first two items of the preceding list), it must have a fallback strategy for the case when the condition is turned off. 当优化器使用触发的条件创建某种基于索引查找的访问时(与前面列表的前两项一样),它必须在关闭条件时具有回退策略。This fallback strategy is always the same: Do a full table scan. 此回退策略始终相同:执行完整表扫描。In 在EXPLAIN
output, the fallback shows up as Full scan on NULL key
in the Extra
column:EXPLAIN
输出中,备用项在Extra
列中显示为Full scan on NULL key
(对空键的完全扫描):
mysql>EXPLAIN SELECT t1.col1,
t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: index_subquery possible_keys: key1 key: key1 key_len: 5 ref: func rows: 2 Extra: Using where; Full scan on NULL key
If you run 如果运行EXPLAIN
followed by SHOW WARNINGS
, you can see the triggered condition:EXPLAIN
,后面跟着SHOW WARNING
,则可以看到触发的条件:
*************************** 1. row *************************** Level: Note Code: 1003 Message: select `test`.`t1`.`col1` AS `col1`, <in_optimizer>(`test`.`t1`.`col1`, <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2 on key1 checking NULL where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)` from `test`.`t1`
The use of triggered conditions has some performance implications. 使用触发条件会对性能产生一些影响。A NULL IN (SELECT ...)
expression now may cause a full table scan (which is slow) when it previously did not. NULL IN (SELECT ...)
表达式现在可能会导致完整表扫描(速度很慢),而以前没有。This is the price paid for correct results (the goal of the trigger-condition strategy is to improve compliance, not speed).这是正确结果的代价(触发条件策略的目标是提高遵从性,而不是速度)。
For multiple-table subqueries, execution of 对于多个表子查询,NULL IN (SELECT ...)
is particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL
. NULL IN (SELECT ...)
的执行速度特别慢,因为联接优化器不会针对外部表达式为NULL
的情况进行优化。It assumes that subquery evaluations with 它假设左侧为NULL
on the left side are very rare, even if there are statistics that indicate otherwise. NULL
的子查询求值非常罕见,即使有统计数据表明情况并非如此。On the other hand, if the outer expression might be 另一方面,如果外部表达式可能为NULL
but never actually is, there is no performance penalty.NULL
,但实际上从未为NULL
,则不会有性能损失。
To help the query optimizer better execute your queries, use these suggestions:要帮助查询优化器更好地执行查询,请使用以下建议:
Declare a column as 如果列确实为非空,则将其声明为NOT NULL
if it really is. NOT NULL
。This also helps other aspects of the optimizer by simplifying condition testing for the column.通过简化列的条件测试,这也有助于优化器的其他方面。
If you need not distinguish a 如果不需要区分NULL
from FALSE
subquery result, you can easily avoid the slow execution path. NULL
和FALSE
子查询结果,可以轻松避免执行路径缓慢。Replace a comparison that looks like this:将如下所示的比较:
outer_expr
[NOT] IN (SELECTinner_expr
FROM ...)
with this expression:替换为此表达式:
(outer_expr
IS NOT NULL) AND (outer_expr
[NOT] IN (SELECTinner_expr
FROM ...))
Then 然后NULL IN (SELECT ...)
is never evaluated because MySQL stops evaluating AND
parts as soon as the expression result is clear.NULL IN (SELECT ...)
永远不会被计算,因为MySQL会在表达式结果清楚后之立即停止计算AND
部分。
Another possible rewrite:另一种可能的重写:
[NOT] EXISTS (SELECTinner_expr
FROM ... WHEREinner_expr
=outer_expr
)
The subquery_materialization_cost_based
flag of the optimizer_switch
system variable enables control over the choice between subquery materialization and IN
-to-EXISTS
subquery transformation. optimizer_switch
系统变量的subquery_materialization_cost_based
标志允许控制子查询具体化和IN
-to-EXISTS
子查询转换之间的选择。See Section 8.9.2, “Switchable Optimizations”.参见第8.9.2节,“可切换优化”。