13.7.3.1 ANALYZE TABLE Statement语句

ANALYZE [NO_WRITE_TO_BINLOG | LOCAL]
    TABLE tbl_name [, tbl_name] ...

ANALYZE [NO_WRITE_TO_BINLOG | LOCAL]
    TABLE tbl_name
    UPDATE HISTOGRAM ON col_name [, col_name] ...
        [WITH N BUCKETS]

ANALYZE [NO_WRITE_TO_BINLOG | LOCAL]
    TABLE tbl_name
    DROP HISTOGRAM ON col_name [, col_name] ...

ANALYZE TABLE generates table statistics:生成表统计信息:

This statement requires SELECT and INSERT privileges for the table.此语句要求对表具有SELECTINSERT权限。

ANALYZE TABLE works with InnoDB, NDB, and MyISAM tables. ANALYZE TABLE适用于InnoDBNDBMyISAM表。It does not work with views.它不适用于视图。

If the innodb_read_only system variable is enabled, ANALYZE TABLE may fail because it cannot update statistics tables in the data dictionary, which use InnoDB. 如果启用了innodb_read_only系统变量,则ANALYZE TABLE可能会失败,因为它无法更新使用InnoDB的数据字典中的统计表。For ANALYZE TABLE operations that update the key distribution, failure may occur even if the operation updates the table itself (for example, if it is a MyISAM table). 对于更新密钥分发的ANALYZE TABLE操作,即使该操作更新了表本身(例如,如果它是MyISAM表),也可能发生故障。To obtain the updated distribution statistics, set information_schema_stats_expiry=0.要获取更新的分布统计信息,请设置information_schema_stats_expiry=0

ANALYZE TABLE is supported for partitioned tables, and you can use ALTER TABLE ... ANALYZE PARTITION to analyze one or more partitions; for more information, see Section 13.1.9, “ALTER TABLE Statement”, and Section 24.3.4, “Maintenance of Partitions”.分区表支持ANALYZE TABLE,您可以使用ALTER TABLE ... ANALYZE PARTITION分析一个或多个分区;有关更多信息,请参阅第13.1.9节,“ALTER TABLE语句”第24.3.4节,“分区的维护”

During the analysis, the table is locked with a read lock for InnoDB and MyISAM.在分析过程中,表被InnoDBMyISAM的读锁锁定。

ANALYZE TABLE removes the table from the table definition cache, which requires a flush lock. ANALYZE TABLE从表定义缓存中删除该表,这需要一个刷新锁。If there are long running statements or transactions still using the table, subsequent statements and transactions must wait for those operations to finish before the flush lock is released. 如果有长时间运行的语句或事务仍在使用该表,则后续语句和事务必须等待这些操作完成,然后才能释放刷新锁。Because ANALYZE TABLE itself typically finishes quickly, it may not be apparent that delayed transactions or statements involving the same table are due to the remaining flush lock.由于ANALYZE TABLE本身通常很快完成,因此可能不清楚延迟的事务或涉及同一个表的语句是由于剩余的刷新锁造成的。

By default, the server writes ANALYZE TABLE statements to the binary log so that they replicate to replicas. 默认情况下,服务器将ANALYZE TABLE语句写入二进制日志,以便它们复制到副本。To suppress logging, specify the optional NO_WRITE_TO_BINLOG keyword or its alias LOCAL.要抑制日志记录,请指定可选的NO_WRITE_To_BINLOG关键字或其别名LOCAL

ANALYZE TABLE Output输出

ANALYZE TABLE returns a result set with the columns shown in the following table.ANALYZE TABLE返回一个结果集,其列如下表所示。

ColumnValue价值
TableThe table name表名称
Opanalyze or histogram
Msg_typestatus, error, info, note, or warning
Msg_textAn informational message一条信息性消息
Key Distribution Analysis密钥分布分析

ANALYZE TABLE without either HISTOGRAM clause performs a key distribution analysis and stores the distribution for the table or tables. 不带HISTOGRAM子句的ANALYZE TABLE执行键分布分析并存储表的分布。Any existing histogram statistics remain unaffected.任何现有的直方图统计信息都不受影响。

If the table has not changed since the last key distribution analysis, the table is not analyzed again.如果自上次键分布分析以来该表没有更改,则不会再次分析该表。

MySQL uses the stored key distribution to decide the order in which tables should be joined for joins on something other than a constant. MySQL使用存储的密钥分布来决定表的联接顺序,以便对常数以外的对象进行联接。In addition, key distributions can be used when deciding which indexes to use for a specific table within a query.此外,在决定对查询中的特定表使用哪些索引时,可以使用键分布。

To check the stored key distribution cardinality, use the SHOW INDEX statement or the INFORMATION_SCHEMA STATISTICS table. 要检查存储的密钥分布基数,请使用SHOW INDEX语句或INFORMATION_SCHEMA STATISTICS表。See Section 13.7.7.22, “SHOW INDEX Statement”, and Section 26.3.34, “The INFORMATION_SCHEMA STATISTICS Table”.请参阅第13.7.7.22节,“SHOW INDEX语句”第26.3.34节,“INFORMATION_SCHEMA STATISTICS表”

For InnoDB tables, ANALYZE TABLE determines index cardinality by performing random dives on each of the index trees and updating index cardinality estimates accordingly. 对于InnoDB表,ANALYZE TABLE通过对每个索引树执行随机挖掘并相应地更新索引基数估计来确定索引基数。Because these are only estimates, repeated runs of ANALYZE TABLE could produce different numbers. 因为这些只是估计,重复运行ANALYZE TABLE可能会产生不同的数字。This makes ANALYZE TABLE fast on InnoDB tables but not 100% accurate because it does not take all rows into account.这使得ANALYZE TABLEInnoDB表上快速,但不是100%准确,因为它没有考虑所有行。

You can make the statistics collected by ANALYZE TABLE more precise and more stable by enabling innodb_stats_persistent, as explained in Section 15.8.10.1, “Configuring Persistent Optimizer Statistics Parameters”. 第15.8.10.1节,“配置持久优化器统计参数”中所述,通过启用innodb_stats_persistent,可以使ANALYZE TABLE收集的统计数据更加精确和稳定。When innodb_stats_persistent is enabled, it is important to run ANALYZE TABLE after major changes to index column data, as statistics are not recalculated periodically (such as after a server restart).启用innodb_stats_persistent后,在索引列数据发生重大更改后运行ANALYZE TABLE非常重要,因为统计信息不会定期重新计算(例如在服务器重新启动后)。

If innodb_stats_persistent is enabled, you can change the number of random dives by modifying the innodb_stats_persistent_sample_pages system variable. 如果启用了innodb_stats_persistent,则可以通过修改innodb_tats_persistent_sample_pages系统变量来更改随机潜水次数。If innodb_stats_persistent is disabled, modify innodb_stats_transient_sample_pages instead.如果innodb_stats_persistent被禁用,请改为修改innodb_tats_transient_sample_pages

For more information about key distribution analysis in InnoDB, see Section 15.8.10.1, “Configuring Persistent Optimizer Statistics Parameters”, and Section 15.8.10.3, “Estimating ANALYZE TABLE Complexity for InnoDB Tables”.有关InnoDB中密钥分布分析的更多信息,请参阅第15.8.10.1节,“配置持久优化器统计参数”第15.8.10.3节,“估计InnoDB表的ANALYZE表复杂性”

MySQL uses index cardinality estimates in join optimization. MySQL在连接优化中使用索引基数估计。If a join is not optimized in the right way, try running ANALYZE TABLE. 如果联接没有以正确的方式进行优化,请尝试运行ANALYZE TABLEIn the few cases that ANALYZE TABLE does not produce values good enough for your particular tables, you can use FORCE INDEX with your queries to force the use of a particular index, or set the max_seeks_for_key system variable to ensure that MySQL prefers index lookups over table scans. 在少数情况下,ANALYZE TABLE不能为特定的表生成足够好的值,您可以在查询中使用FORCE INDEX来强制使用特定的索引,或者设置max_seeks_for_key系统变量,以确保MySQL更喜欢索引查找而不是表扫描。See Section B.3.5, “Optimizer-Related Issues”.请参阅第B.3.5节,“优化器相关问题”

Histogram Statistics Analysis直方图统计分析

ANALYZE TABLE with the HISTOGRAM clause enables management of histogram statistics for table column values. 使用HISTOGRAM子句的ANALYZE TABLE可以管理表列值的直方图统计信息。For information about histogram statistics, see Section 8.9.6, “Optimizer Statistics”.有关直方图统计信息,请参阅第8.9.6节,“优化器统计”

These histogram operations are available:可以使用以下直方图操作:

  • ANALYZE TABLE with an UPDATE HISTOGRAM clause generates histogram statistics for the named table columns and stores them in the data dictionary. 带有UPDATE HISTOGRAM子句的ANALYZE TABLE生成命名表列的直方图统计信息,并将其存储在数据字典中。Only one table name is permitted for this syntax.此语法只允许使用一个表名。

    The optional WITH N BUCKETS clauses specifies the number of buckets for the histogram. 可选的WITH N BUCKETS子句指定直方图的桶数。The value of N must be an integer in the range from 1 to 1024. N的值必须是介于1到1024之间的整数。If this clause is omitted, the number of buckets is 100.如果省略此条款,则存储桶的数量为100。

  • ANALYZE TABLE with a DROP HISTOGRAM clause removes histogram statistics for the named table columns from the data dictionary. 带有DROP HISTOGRAM子句的ANALYZE TABLE从数据字典中删除命名表列的直方图统计信息。Only one table name is permitted for this syntax.此语法只允许使用一个表名。

Stored histogram management statements affect only the named columns. 存储的直方图管理语句仅影响命名列。Consider these statements:考虑这些陈述:

ANALYZE TABLE t UPDATE HISTOGRAM ON c1, c2, c3 WITH 10 BUCKETS;
ANALYZE TABLE t UPDATE HISTOGRAM ON c1, c3 WITH 10 BUCKETS;
ANALYZE TABLE t DROP HISTOGRAM ON c2;

The first statement updates the histograms for columns c1, c2, and c3, replacing any existing histograms for those columns. 第一条语句更新c1c2c3列的直方图,替换这些列的任何现有直方图。The second statement updates the histograms for c1 and c3, leaving the c2 histogram unaffected. 第二条语句更新c1c3的直方图,使c2直方图不受影响。The third statement removes the histogram for c2, leaving those for c1 and c3 unaffected.第三条语句删除c2的柱状图,使c1c3的柱状图不受影响。

Histogram generation is not supported for encrypted tables (to avoid exposing data in the statistics) or TEMPORARY tables.加密表(以避免在统计数据中暴露数据)或TEMPORARY表不支持直方图生成。

Histogram generation applies to columns of all data types except geometry types (spatial data) and JSON.直方图生成适用于除几何体类型(空间数据)和JSON之外的所有数据类型的列。

Histograms can be generated for stored and virtual generated columns.可以为存储列和虚拟生成列生成直方图。

Histograms cannot be generated for columns that are covered by single-column unique indexes.无法为单列唯一索引覆盖的列生成直方图。

Histogram management statements attempt to perform as much of the requested operation as possible, and report diagnostic messages for the remainder. 直方图管理语句尝试执行尽可能多的请求操作,并报告其余操作的诊断消息。For example, if an UPDATE HISTOGRAM statement names multiple columns, but some of them do not exist or have an unsupported data type, histograms are generated for the other columns, and messages are produced for the invalid columns.例如,如果UPDATE SHORMATIC语句命名了多个列,但其中一些列不存在或数据类型不受支持,则会为其他列生成直方图,并为无效列生成消息。

Histograms are affected by these DDL statements:直方图受以下DDL语句的影响:

  • DROP TABLE removes histograms for columns in the dropped table.删除删除的表中列的直方图。

  • DROP DATABASE removes histograms for any table in the dropped database because the statement drops all tables in the database.删除删除的数据库中任何表的直方图,因为该语句删除数据库中的所有表。

  • RENAME TABLE does not remove histograms. 不删除直方图。Instead, it renames histograms for the renamed table to be associated with the new table name.相反,它将重命名与新表名关联的重命名表的直方图。

  • ALTER TABLE statements that remove or modify a column remove histograms for that column.删除或修改列的语句将删除该列的直方图。

  • ALTER TABLE ... CONVERT TO CHARACTER SET removes histograms for character columns because they are affected by the change of character set. 删除字符列的直方图,因为它们受字符集更改的影响。Histograms for noncharacter columns remain unaffected.非字符列的直方图不受影响。

The histogram_generation_max_mem_size system variable controls the maximum amount of memory available for histogram generation. histogram_generation_max_mem_size系统变量控制可用于生成直方图的最大内存量。The global and session values may be set at runtime.全局值和会话值可以在运行时设置。

Changing the global histogram_generation_max_mem_size value requires privileges sufficient to set global system variables. 更改全局histogram_generation_max_mem_size值需要足够的权限来设置全局系统变量。Changing the session histogram_generation_max_mem_size value requires privileges sufficient to set restricted session system variables. 更改会话histogram_generation_max_mem_size值需要足够的权限来设置受限会话系统变量。See Section 5.1.9.1, “System Variable Privileges”.请参阅第5.1.9.1节,“系统变量权限”

If the estimated amount of data to be read into memory for histogram generation exceeds the limit defined by histogram_generation_max_mem_size, MySQL samples the data rather than reading all of it into memory. 如果估计要读取到内存中用于生成直方图的数据量超过histogram_generation_max_mem_size定义的限制,MySQL将对数据进行采样,而不是将所有数据读取到内存。Sampling is evenly distributed over the entire table. MySQL uses SYSTEM sampling, which is a page-level sampling method.采样均匀地分布在整个表格上。MySQL使用SYSTEM采样,这是一种页面级的采样方法。

The sampling-rate value in the HISTOGRAM column of INFORMATION_SCHEMA.COLUMN_STATISTICS table can be queried to determine the fraction of data that was sampled to create the histogram. 可以查询INFORMATION_SCHEMAOLUMN_STATISTICS表的HISTOGRAM列中的sampling-rate(采样率)值,以确定为创建直方图而采样的数据的分数。The sampling-rate is a number between 0.0 and 1.0. sampling-rate是一个介于0.0和1.0之间的数字。A value of 1 means that all of the data was read (no sampling).值为1表示读取了所有数据(无采样)。

The following example demonstrates sampling. 以下示例演示采样。To ensure that the amount of data exceeds the histogram_generation_max_mem_size limit for the purpose of the example, the limit is set to a low value (2000000 bytes) prior to generating histogram statistics for the birth_date column of the employees table.为了确保数据量超过本示例中的histogram_generation_max_mem_size限制,在为employees表的birth_date列生成直方图统计信息之前,将该限制设置为低值(2000000字节)。

mysql> SET histogram_generation_max_mem_size = 2000000;

mysql> USE employees;

mysql> ANALYZE TABLE employees UPDATE HISTOGRAM ON birth_date WITH 16 BUCKETS\G
*************************** 1. row ***************************
   Table: employees.employees
      Op: histogram
Msg_type: status
Msg_text: Histogram statistics created for column 'birth_date'.

mysql> SELECT HISTOGRAM->>'$."sampling-rate"'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE TABLE_NAME = "employees"
AND COLUMN_NAME = "birth_date";
+---------------------------------+
| HISTOGRAM->>'$."sampling-rate"' |
+---------------------------------+
| 0.0491431208869665              |
+---------------------------------+

A sampling-rate value of 0.0491431208869665 means that approximately 4.9% of the data from the birth_date column was read into memory for generating histogram statistics.sampling-rate(采样率)值为0.0491431208869665意味着birth_date列中约4.9%的数据被读入内存以生成直方图统计。

As of MySQL 8.0.19, the InnoDB storage engine provides its own sampling implementation for data stored in InnoDB tables. 从MySQL 8.0.19开始,InnoDB存储引擎为存储在InnoDB表中的数据提供了自己的采样实现。The default sampling implementation used by MySQL when storage engines do not provide their own requires a full table scan, which is costly for large tables. The InnoDB sampling implementation improves sampling performance by avoiding full table scans.当存储引擎不提供自己的存储引擎时,MySQL使用的默认采样实现需要完整的表扫描,这对于大表来说代价高昂。InnoDB采样实现通过避免全表扫描来提高采样性能。

The sampled_pages_read and sampled_pages_skipped INNODB_METRICS counters can be used to monitor sampling of InnoDB data pages. sampled_pages_readsampled_pages_skipped INNODB_METRICS计数器可用于监控InnoDB数据页的采样。(For general INNODB_METRICS counter usage information, see Section 26.4.23, “The INFORMATION_SCHEMA INNODB_METRICS Table”.)(有关INNODB_METRICS计数器的一般用法信息,请参阅第26.4.23节,“information_SCHEMA InnoDB-METRICS表”。)

The following example demonstrates sampling counter usage, which requires enabling the counters prior to generating histogram statistics.下面的示例演示采样计数器的使用情况,这要求在生成直方图统计信息之前启用计数器。

mysql> SET GLOBAL innodb_monitor_enable = 'sampled%';

mysql> USE employees;

mysql> ANALYZE TABLE employees UPDATE HISTOGRAM ON birth_date WITH 16 BUCKETS\G
*************************** 1. row ***************************
   Table: employees.employees
      Op: histogram
Msg_type: status
Msg_text: Histogram statistics created for column 'birth_date'.

mysql> USE INFORMATION_SCHEMA;

mysql> SELECT NAME, COUNT FROM INNODB_METRICS WHERE NAME LIKE 'sampled%'\G
*************************** 1. row ***************************
 NAME: sampled_pages_read
COUNT: 43
*************************** 2. row ***************************
 NAME: sampled_pages_skipped
COUNT: 843

This formula approximates a sampling rate based on the sampling counter data:此公式根据采样计数器数据近似采样率:

sampling rate = sampled_page_read/(sampled_pages_read + sampled_pages_skipped)

A sampling rate based on sampling counter data is roughly the same as the sampling-rate value in the HISTOGRAM column of INFORMATION_SCHEMA.COLUMN_STATISTICS table.基于采样计数器数据的sampling-rateINFORMATION_SCHEMACOLUMN_STATISTICS表的HISTOGRAM列中的sampling-rate值大致相同。

For information about memory allocations performed for histogram generation, monitor the Performance Schema memory/sql/histograms instrument. See Section 27.12.20.10, “Memory Summary Tables”.有关为生成直方图而执行的内存分配的信息,请监视性能模式memory/sql/histograms仪器。参见第27.12.20.10节,“内存汇总表”

Other Considerations其他考虑

ANALYZE TABLE clears table statistics from the INFORMATION_SCHEMA.INNODB_TABLESTATS table and sets the STATS_INITIALIZED column to Uninitialized. ANALYZE TABLEINFORMATION_SCHEMA.INNODB_TABLESTATS表中清除表统计信息,并将STATS_INITIALIZE列设置为UninitializedStatistics are collected again the next time the table is accessed.下次访问表时会再次收集统计信息。