赞
踩
PostgreSQL , limit , order by , 优化器 , 选择性 , 相关性 , 数据存储顺序 , 目标数据存储顺序
当我们在执行一个这样的SQL时,假如有这样几个索引(c1,c2) (id),数据库到底该用哪个索引呢?
- explain select * from tbl where c1=200 and c2=200 order by id limit 10;
- 或
- explain select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;
实际上PG会通过计算成本得到应该使用哪个索引。
但是实际上计算公式本身的通用性,使得当C1,C2数据分布倾斜时,可能导致某些C1,C2输入值的执行计划不准确。
走c1,c2索引,当满足c1,c2条件的记录很少时。如果数据比较多,则SORT耗费的成本就较大,导致整个耗时变大。
走id索引,当满足c1,c2条件的记录很多时,(并且c1,c2满足条件的数据分布在id偏小的端),如果分布在大端,则需要扫描更多的记录才能找到满足条件的记录。
如果数据库统计信息中可以评估出来满足条件的c1,c2的记录与ID顺序分布的关系,也可以作为优化器计算成本的一种参考输入(当然要做到这个可能比较复杂,同时也会增加优化器计算成本的开销)。
下面通过实例来说明。
1、建测试表
- postgres=# create table tbl (id int, c1 int, c2 int, c3 int, c4 int);
- CREATE TABLE
2、写入一批随机数据,ID从1到1000万。
- postgres=# insert into tbl select generate_series(1,10000000), random()*100, random()*100, random()*100, random()*100;
- INSERT 0 10000000
3、写入另一批100万条数据,c1,c2 与前面1000万的值不一样。
- postgres=# insert into tbl select generate_series(10000001,11000000), 200,200,200,200;
- INSERT 0 1000000
4、创建两个索引,也就是本文需要重点关注的,到底走哪个索引更划算
- postgres=# create index idx_tbl_1 on tbl(id);
- CREATE INDEX
- postgres=# create index idx_tbl_2 on tbl(c1,c2,c3,c4);
- CREATE INDEX
5、收集统计信息
- postgres=# vacuum analyze tbl;
- VACUUM
6、查看下面SQL的执行计划,走了id索引
- postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 10;
- QUERY PLAN
- --------------------------------------------------------------------------------------
- Limit (cost=0.43..32.59 rows=10 width=20)
- -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323244.26 rows=100533 width=20)
- Filter: ((c1 = 200) AND (c2 = 200))
- (3 rows)
还是走了id索引
- postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 1000;
- QUERY PLAN
- --------------------------------------------------------------------------------------
- Limit (cost=0.43..3215.74 rows=1000 width=20)
- -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323244.26 rows=100533 width=20)
- Filter: ((c1 = 200) AND (c2 = 200))
- (3 rows)
当LIMIT达到50000时,走了c1,c2的索引。为什么呢?
- postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 50000;
- QUERY PLAN
- --------------------------------------------------------------------------------------------
- Limit (cost=70355.06..70480.06 rows=50000 width=20)
- -> Sort (cost=70355.06..70606.39 rows=100533 width=20)
- Sort Key: id
- -> Bitmap Heap Scan on tbl (cost=1457.82..62005.97 rows=100533 width=20)
- Recheck Cond: ((c1 = 200) AND (c2 = 200))
- -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..1432.69 rows=100533 width=0)
- Index Cond: ((c1 = 200) AND (c2 = 200))
- (7 rows)
7、分析以上执行计划的含义
首先,表的记录数(1100万)除以"满足c1=200 and c2=200 条件的记录数"(100533),得到平均需要扫描多少条记录,可以得到一条满足c1=200 and c2=200条件的记录.
- postgres=# select 11000000/100533.0;
- ?column?
- ----------------------
- 109.4168084111684720
- (1 row)
也就是说每扫描109.4条记录,可以得到一条满足条件的记录。(优化器这么算,是认为数据分布是均匀的。)
但是,实际上,数据分布是不均匀的,c1=200 and c2=200的记录在表的末端(1000万条记录后面),也就是说需要扫描1000万条记录后,才能得到1条满足c1=200 and c2=200的记录。
并不是估算的每扫描109.4条记录,可以得到一条满足条件的记录。
问题就出在这里。
8、我们再来分析一下为什么limit 50000时,选择了c1,c2的索引。而不是id的索引
使用ID索引时,需要扫描100533条记录,同时需要排序,直到排序完成,总成约70606.39。然后就是GET HEAP TUPLE的成本。
当使用id的索引扫描时,返回多少条记录能达到70606.39的成本呢?
- 以limit 1000的3215.74成本为例
-
- postgres=# select 70606.39/3215.74;
- ?column?
- ---------------------
- 21.9564983487471002
- (1 row)
-
- postgres=# select 21.956*1000;
- ?column?
- -----------
- 21956.000
- (1 row)
经过以上分析,也就是说,LIMIT 21956时,走ID索引扫描的执行计划,成本可达到70606.39。
所以limit 21956是一个分水岭,大于这个值时,可能使用c1,c2的索引扫描,而小于它,则会使用ID索引扫描.
如下
- postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 22000;
- QUERY PLAN
- --------------------------------------------------------------------------------------------
- Limit (cost=69759.69..69814.69 rows=22000 width=20)
- -> Sort (cost=69759.69..70011.02 rows=100533 width=20)
- Sort Key: id
- -> Bitmap Heap Scan on tbl (cost=1457.82..62005.97 rows=100533 width=20)
- Recheck Cond: ((c1 = 200) AND (c2 = 200))
- -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..1432.69 rows=100533 width=0)
- Index Cond: ((c1 = 200) AND (c2 = 200))
- (7 rows)
- postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 21000;
- QUERY PLAN
- --------------------------------------------------------------------------------------
- Limit (cost=0.43..67521.75 rows=21000 width=20)
- -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323244.26 rows=100533 width=20)
- Filter: ((c1 = 200) AND (c2 = 200))
- (3 rows)
很显然,使用id扫描,一定会慢,因为满足条件的数据都分布在1000万行后面。
1、c1,c2索引扫描,直接命中数据,加排序(100万条),略快。
- postgres=# explain analyze select * from tbl where c1=200 and c2=200 order by id limit 22000;
- QUERY PLAN
- ----------------------------------------------------------------------------------------------------------------------------------------------
- Limit (cost=69759.69..69814.69 rows=22000 width=20) (actual time=293.961..299.054 rows=22000 loops=1)
- -> Sort (cost=69759.69..70011.02 rows=100533 width=20) (actual time=293.960..296.006 rows=22000 loops=1)
- Sort Key: id
- Sort Method: top-N heapsort Memory: 3255kB
- -> Bitmap Heap Scan on tbl (cost=1457.82..62005.97 rows=100533 width=20) (actual time=47.919..175.698 rows=1000000 loops=1)
- Recheck Cond: ((c1 = 200) AND (c2 = 200))
- Heap Blocks: exact=6370
- -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..1432.69 rows=100533 width=0) (actual time=47.160..47.160 rows=1000000 loops=1)
- Index Cond: ((c1 = 200) AND (c2 = 200))
- Planning time: 0.152 ms
- Execution time: 300.664 ms
- (11 rows)
2、id 索引扫描,慢。
- postgres=# explain analyze select * from tbl where c1=200 and c2=200 order by id limit 21000;
- QUERY PLAN
- ------------------------------------------------------------------------------------------------------------------------------------------
- Limit (cost=0.43..67521.75 rows=21000 width=20) (actual time=1404.932..1412.594 rows=21000 loops=1)
- -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323244.26 rows=100533 width=20) (actual time=1404.930..1409.639 rows=21000 loops=1)
- Filter: ((c1 = 200) AND (c2 = 200))
- Rows Removed by Filter: 10000000
- Planning time: 0.139 ms
- Execution time: 1414.142 ms
- (6 rows)
3、limit 10同样,id 索引扫描,慢。
- postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id limit 10;
- QUERY PLAN
- ----------------------------------------------------------------------------------------------------------------------------------------------
- Limit (cost=0.43..32.59 rows=10 width=20) (actual time=1403.861..1403.865 rows=10 loops=1)
- Output: id, c1, c2, c3, c4
- Buffers: shared hit=91020
- -> Index Scan using idx_tbl_1 on public.tbl (cost=0.43..323244.26 rows=100533 width=20) (actual time=1403.859..1403.861 rows=10 loops=1)
- Output: id, c1, c2, c3, c4
- Filter: ((tbl.c1 = 200) AND (tbl.c2 = 200))
- Rows Removed by Filter: 10000000
- Buffers: shared hit=91020
- Planning time: 0.127 ms
- Execution time: 1403.893 ms
- (10 rows)
1、使用HINT或者改SQL,强制不走ID扫描。
并不适用于所有场景,比如数据分布均匀时,那么PG的这周成本计算方法就对口,那么什么时候使用ID,什么时候使用C1,C2索引扫描就合乎常理了。
- postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id+0 limit 10;
- QUERY PLAN
- ----------------------------------------------------------------------------------------------------------------------------------------------
- Limit (cost=64429.79..64429.81 rows=10 width=24) (actual time=409.622..409.626 rows=10 loops=1)
- Output: id, c1, c2, c3, c4, ((id + 0))
- Buffers: shared hit=10205
- -> Sort (cost=64429.79..64681.12 rows=100533 width=24) (actual time=409.620..409.621 rows=10 loops=1)
- Output: id, c1, c2, c3, c4, ((id + 0))
- Sort Key: ((tbl.id + 0))
- Sort Method: top-N heapsort Memory: 25kB
- Buffers: shared hit=10205
- -> Bitmap Heap Scan on public.tbl (cost=1457.82..62257.30 rows=100533 width=24) (actual time=47.347..237.455 rows=1000000 loops=1)
- Output: id, c1, c2, c3, c4, (id + 0)
- Recheck Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))
- Heap Blocks: exact=6370
- Buffers: shared hit=10205
- -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..1432.69 rows=100533 width=0) (actual time=46.577..46.577 rows=1000000 loops=1)
- Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))
- Buffers: shared hit=3835
- Planning time: 0.133 ms
- Execution time: 409.670 ms
- (18 rows)
2、如果c1,c2是等值查询的话,可以用以下索引,那么效率是最高的。
- postgres=# create index idx_tbl_3 on tbl(c1,c2,id);
- CREATE INDEX
-
-
- postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 =200 order by id limit 10;
- QUERY PLAN
- ---------------------------------------------------------------------------------------------------------------------------------------
- Limit (cost=0.56..6.93 rows=10 width=20) (actual time=0.102..0.106 rows=10 loops=1)
- Output: id, c1, c2, c3, c4
- Buffers: shared hit=1 read=4
- I/O Timings: read=0.047
- -> Index Scan using idx_tbl_3 on public.tbl (cost=0.56..64086.79 rows=100533 width=20) (actual time=0.101..0.103 rows=10 loops=1)
- Output: id, c1, c2, c3, c4
- Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))
- Buffers: shared hit=1 read=4
- I/O Timings: read=0.047
- Planning time: 0.142 ms
- Execution time: 0.131 ms
- (11 rows)
3、建议方法:
注意方法2 不适合非等值查询,
- postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;
- QUERY PLAN
- ----------------------------------------------------------------------------------------------------------------------------------------------
- Limit (cost=0.43..35.32 rows=10 width=20) (actual time=1371.094..1371.099 rows=10 loops=1)
- Output: id, c1, c2, c3, c4
- Buffers: shared hit=91020
- -> Index Scan using idx_tbl_1 on public.tbl (cost=0.43..350743.84 rows=100533 width=20) (actual time=1371.092..1371.095 rows=10 loops=1)
- Output: id, c1, c2, c3, c4
- Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300) AND (tbl.c1 = 200))
- Rows Removed by Filter: 10000000
- Buffers: shared hit=91020
- Planning time: 0.278 ms
- Execution time: 1371.128 ms
- (10 rows)
但是不用担心,我们依旧可以使用其他等值查询列,加上排序列组成复合索引,在INDEX SCAN中使用FILTER来加速。
新增如下索引,加速如下
- postgres=# create index idx_tbl_4 on tbl(c1,id);
- CREATE INDEX
-
- postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;
- QUERY PLAN
- ----------------------------------------------------------------------------------------------------------------------------------------
- Limit (cost=0.43..10.47 rows=10 width=20) (actual time=0.105..0.110 rows=10 loops=1)
- Output: id, c1, c2, c3, c4
- Buffers: shared hit=1 read=3
- I/O Timings: read=0.051
- -> Index Scan using idx_tbl_4 on public.tbl (cost=0.43..100877.50 rows=100533 width=20) (actual time=0.104..0.107 rows=10 loops=1)
- Output: id, c1, c2, c3, c4
- Index Cond: (tbl.c1 = 200)
- Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300))
- Buffers: shared hit=1 read=3
- I/O Timings: read=0.051
- Planning time: 0.172 ms
- Execution time: 0.134 ms
- (12 rows)
《Oracle migration to Greenplum - (含 Ora2pg)》
http://www.dba-oracle.com/t_OracleAutotrace.htm
- SQL> create table tbl(id int, c1 int, c2 int, c3 int, c4 int);
-
- Table created.
-
- SQL> insert into tbl select rownum,trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)) from dual connect by level <=10000000;
-
- 10000000 rows created.
-
- SQL> commit;
-
- Commit complete.
-
- SQL> insert into tbl select rownum+10000000, 200,200,200,200 from dual connect by level <=1000000;
-
- 1000000 rows created.
-
- SQL> commit;
-
- Commit complete.
-
- SQL> create index idx_tbl_1 on tbl(id);
-
- Index created.
-
- SQL> create index idx_tbl_2 on tbl(c1,c2,c3,c4);
-
- Index created.
-
- SQL> set linesize 512
- SQL> set pagesize 50000
-
- SQL> set autotrace on;
-
- SQL> exec DBMS_STATS.GATHER_TABLE_STATS('JIUDU','TBL');
-
- PL/SQL procedure successfully completed.
-
- SQL> select * from (select * from tbl where c1=200 and c2=200 order by id) t where rownum<10;
-
- ID C1 C2 C3 C4
- ---------- ---------- ---------- ---------- ----------
- 10000001 200 200 200 200
- 10000002 200 200 200 200
- 10000003 200 200 200 200
- 10000004 200 200 200 200
- 10000005 200 200 200 200
- 10000006 200 200 200 200
- 10000007 200 200 200 200
- 10000008 200 200 200 200
- 10000009 200 200 200 200
-
- 9 rows selected.
-
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 745043579
-
- ----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 9 | 585 | | 10253 (2)| 00:02:04 |
- |* 1 | COUNT STOPKEY | | | | | | |
- | 2 | VIEW | | 84875 | 5387K| | 10253 (2)| 00:02:04 |
- |* 3 | SORT ORDER BY STOPKEY| | 84875 | 1491K| 2672K| 10253 (2)| 00:02:04 |
- |* 4 | TABLE ACCESS FULL | TBL | 84875 | 1491K| | 9767 (2)| 00:01:58 |
- ----------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - filter(ROWNUM<10)
- 3 - filter(ROWNUM<10)
- 4 - filter("C1"=200 AND "C2"=200)
-
-
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 34868 consistent gets
- 0 physical reads
- 0 redo size
- 937 bytes sent via SQL*Net to client
- 500 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 1 sorts (memory)
- 0 sorts (disk)
- 9 rows processed
-
- SQL> select * from (select * from tbl where c1=1 and c2=1 order by id) t where rownum<10;
-
- ID C1 C2 C3 C4
- ---------- ---------- ---------- ---------- ----------
- 9697 1 1 78 39
- 20586 1 1 81 71
- 27820 1 1 33 64
- 44324 1 1 26 27
- 47079 1 1 3 5
- 64669 1 1 13 49
- 73715 1 1 20 74
- 80903 1 1 96 25
- 98368 1 1 59 9
-
- 9 rows selected.
-
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 447312937
-
- --------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 9 | 585 | 641 (1)| 00:00:08 |
- |* 1 | COUNT STOPKEY | | | | | |
- | 2 | VIEW | | 704 | 45760 | 641 (1)| 00:00:08 |
- |* 3 | SORT ORDER BY STOPKEY | | 704 | 12672 | 641 (1)| 00:00:08 |
- | 4 | TABLE ACCESS BY INDEX ROWID| TBL | 704 | 12672 | 640 (0)| 00:00:08 |
- |* 5 | INDEX RANGE SCAN | IDX_TBL_2 | 704 | | 5 (0)| 00:00:01 |
- --------------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - filter(ROWNUM<10)
- 3 - filter(ROWNUM<10)
- 5 - access("C1"=1 AND "C2"=1)
-
-
- Statistics
- ----------------------------------------------------------
- 1 recursive calls
- 0 db block gets
- 1072 consistent gets
- 11 physical reads
- 0 redo size
- 969 bytes sent via SQL*Net to client
- 500 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 1 sorts (memory)
- 0 sorts (disk)
- 9 rows processed
-
-
- SQL> create index idx_tbl_3 on tbl(c1,c2,id);
- Index created.
-
- SQL> select * from (select * from tbl where c1=200 and c2 between 100 and 300 order by id) t where rownum < 10;
-
- ID C1 C2 C3 C4
- ---------- ---------- ---------- ---------- ----------
- 10000001 200 200 200 200
- 10000002 200 200 200 200
- 10000003 200 200 200 200
- 10000004 200 200 200 200
- 10000005 200 200 200 200
- 10000006 200 200 200 200
- 10000007 200 200 200 200
- 10000008 200 200 200 200
- 10000009 200 200 200 200
-
- 9 rows selected.
-
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 745043579
-
- ----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 9 | 585 | | 10253 (2)| 00:02:04 |
- |* 1 | COUNT STOPKEY | | | | | | |
- | 2 | VIEW | | 84875 | 5387K| | 10253 (2)| 00:02:04 |
- |* 3 | SORT ORDER BY STOPKEY| | 84875 | 1491K| 2672K| 10253 (2)| 00:02:04 |
- |* 4 | TABLE ACCESS FULL | TBL | 84875 | 1491K| | 9767 (2)| 00:01:58 |
- ----------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - filter(ROWNUM<10)
- 3 - filter(ROWNUM<10)
- 4 - filter("C1"=200 AND "C2">=100 AND "C2"<=300)
-
-
- Statistics
- ----------------------------------------------------------
- 1 recursive calls
- 0 db block gets
- 34868 consistent gets
- 0 physical reads
- 0 redo size
- 937 bytes sent via SQL*Net to client
- 500 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 1 sorts (memory)
- 0 sorts (disk)
- 9 rows processed
-
- SQL> select * from (select * from tbl where c1=200 and c2 =200 order by id) t where rownum < 10;
-
- ID C1 C2 C3 C4
- ---------- ---------- ---------- ---------- ----------
- 10000001 200 200 200 200
- 10000002 200 200 200 200
- 10000003 200 200 200 200
- 10000004 200 200 200 200
- 10000005 200 200 200 200
- 10000006 200 200 200 200
- 10000007 200 200 200 200
- 10000008 200 200 200 200
- 10000009 200 200 200 200
-
- 9 rows selected.
-
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1825274432
-
- -------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 9 | 585 | 12 (0)| 00:00:01 |
- |* 1 | COUNT STOPKEY | | | | | |
- | 2 | VIEW | | 10 | 650 | 12 (0)| 00:00:01 |
- | 3 | TABLE ACCESS BY INDEX ROWID| TBL | 10 | 180 | 12 (0)| 00:00:01 |
- |* 4 | INDEX RANGE SCAN | IDX_TBL_3 | | | 3 (0)| 00:00:01 |
- -------------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - filter(ROWNUM<10)
- 4 - access("C1"=200 AND "C2"=200)
-
-
- Statistics
- ----------------------------------------------------------
- 1 recursive calls
- 0 db block gets
- 6 consistent gets
- 2 physical reads
- 0 redo size
- 937 bytes sent via SQL*Net to client
- 500 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 9 rows processed
-
同时也发现一个问题,Oracle可能无法使用index filter来优化,例如将index2,index3删除后,留下ID索引,Oracle无法走索引,而PG可以。
- SQL> select * from (select * from tbl where c1=1 and c2 =1 order by id) t where rownum < 10;
-
- ID C1 C2 C3 C4
- ---------- ---------- ---------- ---------- ----------
- 9697 1 1 78 39
- 20586 1 1 81 71
- 27820 1 1 33 64
- 44324 1 1 26 27
- 47079 1 1 3 5
- 64669 1 1 13 49
- 73715 1 1 20 74
- 80903 1 1 96 25
- 98368 1 1 59 9
-
- 9 rows selected.
-
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 745043579
-
- --------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 9 | 585 | 9766 (2)| 00:01:58 |
- |* 1 | COUNT STOPKEY | | | | | |
- | 2 | VIEW | | 704 | 45760 | 9766 (2)| 00:01:58 |
- |* 3 | SORT ORDER BY STOPKEY| | 704 | 12672 | 9766 (2)| 00:01:58 |
- |* 4 | TABLE ACCESS FULL | TBL | 704 | 12672 | 9765 (2)| 00:01:58 |
- --------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - filter(ROWNUM<10)
- 3 - filter(ROWNUM<10)
- 4 - filter("C1"=1 AND "C2"=1)
-
-
- Statistics
- ----------------------------------------------------------
- 186 recursive calls
- 0 db block gets
- 34893 consistent gets
- 0 physical reads
- 0 redo size
- 969 bytes sent via SQL*Net to client
- 500 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 7 sorts (memory)
- 0 sorts (disk)
- 9 rows processed
-
-
- PG
-
-
- postgres=# explain analyze select * from tbl where c1=1 and c2 =1 order by id limit 10;
- QUERY PLAN
- -------------------------------------------------------------------------------------------------------------------------------
- Limit (cost=0.43..3703.11 rows=10 width=20) (actual time=7.199..23.926 rows=10 loops=1)
- -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323243.84 rows=873 width=20) (actual time=7.198..23.921 rows=10 loops=1)
- Filter: ((c1 = 1) AND (c2 = 1))
- Rows Removed by Filter: 142814
- Planning time: 0.119 ms
- Execution time: 23.950 ms
- (6 rows)
当SQL查询中包括排序,以及其他字段的过滤条件,并使用LIMIT快速返回少量数据时,如果满足条件的数据分布在排序键的末端,那么优化器给出的执行计划可能是不好的,导致通过排序索引扫描更多的数据后才能命中需要的记录。
然而,数据库目前使用的评估走排序键时,LIMIT需要扫描多少条记录,使用了数据均匀分布的假设,所以在数据(满足条件的数据与排序键本身的相关性不均匀)分布不均匀时,导致成本估算不准(oracle干脆走全表扫描)。
建议优化方法:
增加索引,创建等值查询条件列(s)加排序列(s)组成的复合索引,降低扫描量。
例子
- select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;
-
- 增加索引
-
- (c1,id) -- 索引扫描, filter c2
-
- 已有
- (c1,c2) -- 索引扫描, sort id
- (id) -- 索引扫描, filter c1,c2
- select * from tbl where c1=200 and c2 =200 order by id limit 10;
-
- 增加索引
-
- (c1,c2,id) -- 索引扫描
-
- 已有
- (c1,c2) -- 索引扫描, sort id
- (id) -- 索引扫描, filter c1,c2
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。