赞
踩
在一条单表查询语句真正执行之前,MySQL的查询优化器会找出执行该语句所有可能使用的方案,对比之后找出成本最低的方案,这个成本最低的方案就是所谓的执行计划,之后才会调用存储引擎提供的接口真正的执行查询。
employees数据库来自MySQL官方示例数据库employees。
mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> create table t_emp like employees;
Query OK, 0 rows affected (0.03 sec)
mysql> insert into t_emp select * from employees;
Query OK, 300024 rows affected (2.57 sec)
Records: 300024 Duplicates: 0 Warnings: 0
mysql> alter table t_emp add index idx_hire_date(hire_date);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table t_emp add index idx_birth_date(birth_date);
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show index from t_emp;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_emp | 0 | PRIMARY | 1 | emp_no | A | 299645 | NULL | NULL | | BTREE | | |
| t_emp | 1 | idx_hire_date | 1 | hire_date | A | 5590 | NULL | NULL | | BTREE | | |
| t_emp | 1 | idx_birth_date | 1 | birth_date | A | 4770 | NULL | NULL | | BTREE | | |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
待分析的SQL:
mysql> explain select * from t_emp where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | t_emp | NULL | ALL | idx_hire_date,idx_birth_date | NULL | NULL | NULL | 299645 | 25.00 | Using where |
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
为什么走的是全表扫描,而不是索引idx_hire_date或idx_birth_date呢?
根据搜索条件,找出所有可能使用的索引
计算全表扫描的代价
计算使用不同索引执行查询的代价
对比各种执行方案的代价,找出成本最低的那个
全表扫描即将聚簇索引从对应的页面加载到内存,然后检测记录是否满足条件计算。
计算公式:页面数 x 1 + 记录数 x 0.2
页面数和记录数如何获得?查看表的统计信息。
mysql> show table status like 't_emp'\G;
*************************** 1. row ***************************
Name: t_emp
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 299645
Avg_row_length: 50
Data_length: 15220736
Max_data_length: 0
Index_length: 9469952
Data_free: 2097152
Auto_increment: NULL
Create_time: 2021-08-19 07:19:44
Update_time: 2021-08-19 07:19:31
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
从上面可以看到Rows: 299645
,记录数为299645,数据的大小为Data_length=15220736
,页面数=数据大小/16/1024=929。
从下面的统计表的数据中也可以直接看出记录数和页面数:
mysql> select * from mysql.innodb_table_stats where table_name='t_emp'\G;
*************************** 1. row ***************************
database_name: employees
table_name: t_emp
last_update: 2021-08-19 07:19:44
n_rows: 299645
clustered_index_size: 929
sum_of_other_index_sizes: 578
1 row in set (0.00 sec)
因此全表扫描的总成本为:929x1.0+299645x0.2=60858。
来看一下MySQL算的是多少:
mysql> explain format=json select * from t_emp where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "60858.00"
},
"table": {
"table_name": "t_emp",
"access_type": "ALL",
"possible_keys": [
"idx_hire_date",
"idx_birth_date"
],
"rows_examined_per_scan": 299645,
"rows_produced_per_join": 74910,
"filtered": "25.00",
"cost_info": {
"read_cost": "45875.85",
"eval_cost": "14982.15",
"prefix_cost": "60858.00",
"data_read_per_join": "3M"
},
"used_columns": [
"emp_no",
"birth_date",
"first_name",
"last_name",
"gender",
"hire_date"
],
"attached_condition": "((`employees`.`t_emp`.`hire_date` > '1990-11-20') and (`employees`.`t_emp`.`birth_date` > '1840-11-20'))"
}
}
} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
计算二级索引成本的大步骤为:
计算搜索二级索引树中满足条件的记录
回表查询
搜索二级索引树代价计算:
IO成本:范围区间占用页面数量
,查询优化器粗暴的认为读取索引的一个范围区间的I/O成本和读取一个页面是相同的。
CPU成本:需要回表的记录数。
如何计算idx_hire_date在hire_date > '1990-11-20’这个范围区间中包含多少二级索引记录
:
当左边界记录和右边界记录相隔较小时
,直接遍历这些页面的PAGE HEADER中的PAGE_N_RECS字段(记录该页面有多少条数据)可以获得精确值。
当左边界记录和右边界记录相隔较大时
,从左边界所在页面向右读取10个页面,计算平均每个页面中包含多少记录。再乘以左边界和右边界之间的页面数量即可
,左边界和右边界之间的页面数量可以从B+树中的父节点获取(若跨越太多页面则需要递归)。
这里我们可以用count(*)计算一下:
mysql> select count(*) from t_emp where hire_date > '1990-11-20';
+----------+
| count(*) |
+----------+
| 112374 |
+----------+
1 row in set (0.05 sec)
搜索二级索引树的成本:1x1.0+112374x0.2=22475.8
回表查询代价计算
:回表查询聚簇索引需加载的页面数量(用于计算IO成本)
,设计MySQL的大叔评估回表操作的I/O成本依旧很豪放,他们认为每次回表操作都相当于访问一个页面
,回表查询定位页面后,需要定位记录并且判断其他过滤条件(用于计算CPU成本)。
回表的成本:112374x1.0+112374x0.2=134848.8
总成本:22475.8+134848.8=157324.6
再来看一下MySQL算的是多少:
mysql> explain format=json select * from t_emp force index(idx_hire_date) where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "209751.81"
},
"table": {
"table_name": "t_emp",
"access_type": "range",
"possible_keys": [
"idx_hire_date"
],
"key": "idx_hire_date",
"used_key_parts": [
"hire_date"
],
"key_length": "3",
"rows_examined_per_scan": 149822,
"rows_produced_per_join": 49935,
"filtered": "33.33",
"index_condition": "(`employees`.`t_emp`.`hire_date` > '1990-11-20')",
"cost_info": {
"read_cost": "199764.68",
"eval_cost": "9987.13",
"prefix_cost": "209751.81",
"data_read_per_join": "2M"
},
"used_columns": [
"emp_no",
"birth_date",
"first_name",
"last_name",
"gender",
"hire_date"
],
"attached_condition": "(`employees`.`t_emp`.`birth_date` > '1840-11-20')"
}
}
} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
为什么和我们算的不一样?因为MySQL使用的是统计数据,计算出来的记录数为149822。
总成本=1 x 1.0 + 149822 x 0.2 + 149822 x 1.0 + 149822 x 0.2 = 209751.8
二级索引idx_birth_date的成本的计算方式类似。
记录数:
mysql> select count(*) from t_emp where birth_date > '1840-11-20';
+----------+
| count(*) |
+----------+
| 300024 |
+----------+
1 row in set (0.08 sec)
二级索引扫描成本:1 x 1.0 + 300024 x 0.2=60005.8
回表成本:300024 x 1.0 + 300024 x 0.2=360,028.8
总成本:420034.6
通过对比各种执行方案的代价,找出成本最低的那个为全表扫描。
看下MySQL算出来的成本:
mysql> explain format=json select * from t_emp force index(idx_birth_date) where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "209751.81"
},
"table": {
"table_name": "t_emp",
"access_type": "range",
"possible_keys": [
"idx_birth_date"
],
"key": "idx_birth_date",
"used_key_parts": [
"birth_date"
],
"key_length": "3",
"rows_examined_per_scan": 149822,
"rows_produced_per_join": 49935,
"filtered": "33.33",
"index_condition": "(`employees`.`t_emp`.`birth_date` > '1840-11-20')",
"cost_info": {
"read_cost": "199764.68",
"eval_cost": "9987.13",
"prefix_cost": "209751.81",
"data_read_per_join": "2M"
},
"used_columns": [
"emp_no",
"birth_date",
"first_name",
"last_name",
"gender",
"hire_date"
],
"attached_condition": "(`employees`.`t_emp`.`hire_date` > '1990-11-20')"
}
}
} |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
这个索引可以看出MySQL的统计数据很不准确。
上面只能看到MySQL选择的方案的成本,要想看到MySQL对SQL的执行过程可以使用optimizer_trace来查看。
参数介绍:
QUERY:跟踪语句的文本。
TRACE:跟踪,JSON格式。
MISSING_BYTES_BEYOND_MAX_MEM_SIZE:每个记住的跟踪都是一个字符串,随着优化的进行扩展并将其附加数据。该optimizer_trace_max_mem_size 变量设置所有当前记忆的跟踪所使用的内存总量的限制。如果达到此限制,则当前跟踪不会扩展(因此是不完整的),并且该MISSING_BYTES_BEYOND_MAX_MEM_SIZE列显示该跟踪丢失的字节数。
INSUFFICIENT_PRIVILEGES:如果跟踪的查询使用SQL SECURITY值为的视图或存储的例程DEFINER,则可能是拒绝了除定义者之外的其他用户查看查询的跟踪。在这种情况下,跟踪显示为空,INSUFFICIENT_PRIVILEGES值为1。否则值为0。
我们可以对SQL进行optimizer_trace :
-- optimizer_trace默认不开启,开启会影响性能,用完记得关闭
mysql> show variables like '%optimizer_trace%';
+------------------------------+----------------------------------------------------------------------------+
| Variable_name | Value |
+------------------------------+----------------------------------------------------------------------------+
| optimizer_trace | enabled=off,one_line=off |
| optimizer_trace_features | greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on |
| optimizer_trace_limit | 1 |
| optimizer_trace_max_mem_size | 16384 |
| optimizer_trace_offset | -1 |
+------------------------------+----------------------------------------------------------------------------+
5 rows in set (0.01 sec)
-- 设置开启optimizer_trace
mysql> set optimizer_trace="enabled=on";
Query OK, 0 rows affected (0.00 sec)
mysql> explain select * from t_emp where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | t_emp | NULL | ALL | idx_hire_date,idx_birth_date | NULL | NULL | NULL | 299468 | 25.00 | Using where |
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> select * from INFORMATION_SCHEMA.OPTIMIZER_TRACE\G;
*************************** 1. row ***************************
QUERY: explain select * from t_emp where hire_date > '1990-11-20' and birth_date > '1840-11-20'
TRACE: {
"steps": [
{
"join_preparation": {
"select#": 1,
"steps": [
{
"expanded_query": "/* select#1 */ select `t_emp`.`emp_no` AS `emp_no`,`t_emp`.`birth_date` AS `birth_date`,`t_emp`.`first_name` AS `first_name`,`t_emp`.`last_name` AS `last_name`,`t_emp`.`gender` AS `gender`,`t_emp`.`hire_date` AS `hire_date` from `t_emp` where ((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
}
]
}
},
{
"join_optimization": {
"select#": 1,
"steps": [
{
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
},
{
"transformation": "constant_propagation",
"resulting_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
},
{
"transformation": "trivial_condition_removal",
"resulting_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
}
]
}
},
{
"substitute_generated_columns": {
}
},
{
"table_dependencies": [
{
"table": "`t_emp`",
"row_may_be_null": false,
"map_bit": 0,
"depends_on_map_bits": [
]
}
]
},
{
"ref_optimizer_key_uses": [
]
},
{
"rows_estimation": [
{
"table": "`t_emp`",
"range_analysis": {
"table_scan": {
"rows": 299645,
"cost": 60860
},
"potential_range_indexes": [
{
"index": "PRIMARY",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_hire_date",
"usable": true,
"key_parts": [
"hire_date",
"emp_no"
]
},
{
"index": "idx_birth_date",
"usable": true,
"key_parts": [
"birth_date",
"emp_no"
]
}
],
"setup_range_conditions": [
],
"group_index_range": {
"chosen": false,
"cause": "not_group_by_or_distinct"
},
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "idx_hire_date",
"ranges": [
"0x748d0f < hire_date"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 149822,
"cost": 179787,
"chosen": false,
"cause": "cost"
},
{
"index": "idx_birth_date",
"ranges": [
"0x74610e < birth_date"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 149822,
"cost": 179787,
"chosen": false,
"cause": "cost"
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
}
}
}
]
},
{
"considered_execution_plans": [
{
"plan_prefix": [
],
"table": "`t_emp`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 299645,
"access_type": "scan",
"resulting_rows": 74911,
"cost": 60858,
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 74911,
"cost_for_plan": 60858,
"chosen": true
}
]
},
{
"attaching_conditions_to_tables": {
"original_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))",
"attached_conditions_computation": [
],
"attached_conditions_summary": [
{
"table": "`t_emp`",
"attached": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
}
]
}
},
{
"refine_plan": [
{
"table": "`t_emp`"
}
]
}
]
}
},
{
"join_explain": {
"select#": 1,
"steps": [
]
}
}
]
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
INSUFFICIENT_PRIVILEGES: 0
1 row in set (0.00 sec)
细心的你肯定会发现通过optimizer_trace打印的idx_hire_date成本与explain打印的不一致,差别在于回表的CPU成本。
explain:总成本=1 x 1.0 + 149822 x 0.2 + 149822 x 1.0 + 149822 x 0.2 = 209751.8
optimizer_trace:总成本=1 x 1.0 + 149822 x 0.2 + 149822 x 1.0 = 179787.4
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。