8.2.1.13 Condition Filtering条件筛选

In join processing, prefix rows are those rows passed from one table in a join to the next. 在联接处理中,前缀行是从联接中的一个表传递到下一个表的行。In general, the optimizer attempts to put tables with low prefix counts early in the join order to keep the number of row combinations from increasing rapidly. 通常,优化器会尝试在联接的早期放置前缀计数较低的表,以防止行组合的数量快速增加。To the extent that the optimizer can use information about conditions on rows selected from one table and passed to the next, the more accurately it can compute row estimates and choose the best execution plan.优化器可以使用从一个表中选择并传递到下一个表的行上的条件信息,因此它可以更准确地计算行估计并选择最佳执行计划。

Without condition filtering, the prefix row count for a table is based on the estimated number of rows selected by the WHERE clause according to whichever access method the optimizer chooses. 在没有条件筛选的情况下,表的前缀行数基于WHERE子句根据优化器选择的访问方法选择的估计行数。Condition filtering enables the optimizer to use other relevant conditions in the WHERE clause not taken into account by the access method, and thus improve its prefix row count estimates. 条件筛选使优化器能够在WHERE子句中使用访问方法未考虑的其他相关条件,从而改进其前缀行数估计。For example, even though there might be an index-based access method that can be used to select rows from the current table in a join, there might also be additional conditions for the table in the WHERE clause that can filter (further restrict) the estimate for qualifying rows passed to the next table.例如,即使可能有一种基于索引的访问方法可用于从联接中的当前表中选择行,WHERE子句中的表也可能有其他条件,可以过滤(进一步限制)传递到下一个表的符合条件行的估计。

A condition contributes to the filtering estimate only if:只有在以下情况下,条件才有助于过滤估计:

In EXPLAIN output, the rows column indicates the row estimate for the chosen access method, and the filtered column reflects the effect of condition filtering. EXPLAIN输出中,rows列表示所选访问方法的行估计值,筛选列反映条件筛选的效果。filtered values are expressed as percentages. filtered值以百分比表示。The maximum value is 100, which means no filtering of rows occurred. 最大值为100,这意味着没有对行进行筛选。Values decreasing from 100 indicate increasing amounts of filtering.值从100减少表示过滤量增加。

The prefix row count (the number of rows estimated to be passed from the current table in a join to the next) is the product of the rows and filtered values. 前缀行数(在联接中从当前表传递到下一个表的估计行数)是rowsfiltered值的乘积。That is, the prefix row count is the estimated row count, reduced by the estimated filtering effect. 也就是说,前缀行计数是估计的行计数,由估计的过滤效果减少。For example, if rows is 1000 and filtered is 20%, condition filtering reduces the estimated row count of 1000 to a prefix row count of 1000 × 20% = 1000 × .2 = 200.例如,如果rows为1000,而filtered为20%,则条件筛选会将估计的行数1000减少为前缀行数1000 × 20%=1000×.2=200。

Consider the following query:考虑下面的查询:

SELECT *
  FROM employee JOIN department ON employee.dept_no = department.dept_no
  WHERE employee.first_name = 'John'
  AND employee.hire_date BETWEEN '2018-01-01' AND '2018-06-01';

Suppose that the data set has these characteristics:假设数据集具有以下特征:

Without condition filtering, EXPLAIN produces output like this:如果没有条件筛选,EXPLAIN将生成如下输出:

+----+------------+--------+------------------+---------+---------+------+----------+
| id | table      | type   | possible_keys    | key     | ref     | rows | filtered |
+----+------------+--------+------------------+---------+---------+------+----------+
| 1  | employee   | ref    | name,h_date,dept | name    | const   | 8    | 100.00   |
| 1  | department | eq_ref | PRIMARY          | PRIMARY | dept_no | 1    | 100.00   |
+----+------------+--------+------------------+---------+---------+------+----------+

For employee, the access method on the name index picks up the 8 rows that match a name of 'John'. 对于employeename索引上的访问方法会选择与名称'John'匹配的8行。No filtering is done (filtered is 100%), so all rows are prefix rows for the next table: The prefix row count is rows × filtered = 8 × 100% = 8.未进行筛选(filtered为100%),因此所有行都是下一个表的前缀行:前缀行计数为rows × filtered=8 × 100%=8。

With condition filtering, the optimizer additionally takes into account conditions from the WHERE clause not taken into account by the access method. 通过条件筛选,优化器还将考虑访问方法未考虑的WHERE子句中的条件。In this case, the optimizer uses heuristics to estimate a filtering effect of 16.31% for the BETWEEN condition on employee.hire_date. 在这种情况下,优化器使用启发式方法估计employee.hire_date上的BETWEEN条件的筛选效果为16.31%。As a result, EXPLAIN produces output like this:因此,EXPLAIN生成如下输出:

+----+------------+--------+------------------+---------+---------+------+----------+
| id | table      | type   | possible_keys    | key     | ref     | rows | filtered |
+----+------------+--------+------------------+---------+---------+------+----------+
| 1  | employee   | ref    | name,h_date,dept | name    | const   | 8    | 16.31    |
| 1  | department | eq_ref | PRIMARY          | PRIMARY | dept_no | 1    | 100.00   |
+----+------------+--------+------------------+---------+---------+------+----------+

Now the prefix row count is rows × filtered = 8 × 16.31% = 1.3, which more closely reflects actual data set.现在,前缀行计数是rows × filtered=8 × 16.31%=1.3,这更能反映实际数据集。

Normally, the optimizer does not calculate the condition filtering effect (prefix row count reduction) for the last joined table because there is no next table to pass rows to. 通常,优化器不会计算上一个联接表的条件筛选效果(前缀行数减少),因为没有下一个要向其传递行的表。An exception occurs for EXPLAIN: To provide more information, the filtering effect is calculated for all joined tables, including the last one.EXPLAIN出现异常:为了提供更多信息,将对所有联接的表(包括最后一个表)计算过滤效果。

To control whether the optimizer considers additional filtering conditions, use the condition_fanout_filter flag of the optimizer_switch system variable (see Section 8.9.2, “Switchable Optimizations”). 要控制优化器是否考虑其他过滤条件,请使用optimizer_switch变量的condition_fanout_filter标志(请参阅第8.9.2节,“可切换的优化”)。This flag is enabled by default but can be disabled to suppress condition filtering (for example, if a particular query is found to yield better performance without it).默认情况下启用该标志,但可以禁用该标志以抑制条件筛选(例如,如果发现某个特定查询在没有条件筛选的情况下可以产生更好的性能)。

If the optimizer overestimates the effect of condition filtering, performance may be worse than if condition filtering is not used. 如果优化器高估了条件筛选的效果,则性能可能会比不使用条件筛选时更差。In such cases, these techniques may help:在这种情况下,这些技术可能有助于: