A PostgreSQL Optimizer Case Study - Index Selection for ORDER BY ... LIMIT


Tags

PostgreSQL, limit, order by, optimizer, selectivity, correlation, data storage order, target data storage order


Background

When running SQL like the following, with two candidate indexes available, one on (c1,c2) and one on (id), which index should the database use?

  explain select * from tbl where c1=200 and c2=200 order by id limit 10;
  explain select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;

PostgreSQL decides which index to use by computing the cost of each candidate plan.

However, the cost formulas are generic: when the distribution of c1 and c2 is skewed, the plan chosen for certain c1, c2 input values can be a poor one.

The (c1,c2) index works well when few rows match the c1, c2 predicates; when many rows match, the sort becomes expensive and the total runtime grows.

The (id) index works well when many rows match the c1, c2 predicates and those rows sit at the low end of the id range; if they sit at the high end, far more rows must be scanned before the matching rows are found.

If the statistics could describe how the rows matching the c1, c2 predicates are distributed along the id order, that relationship could serve as another input to the optimizer's cost calculation (doing this would be complex, and it would also add planning overhead).
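For reference, the per-column statistics PostgreSQL keeps today can be inspected as below (using the test table built in the next section). This is only a minimal sketch: pg_stats exposes each column's correlation with the table's physical row order, which is related to, but not the same as, the cross-column relationship described above.

  -- per-column statistics; correlation is between column values and physical row order
  select tablename, attname, n_distinct, correlation
  from pg_stats
  where tablename = 'tbl' and attname in ('id', 'c1', 'c2');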

Let's illustrate with an example.

Example

1. Create a test table.

  postgres=# create table tbl (id int, c1 int, c2 int, c3 int, c4 int);
  CREATE TABLE

2. Insert a batch of random data, with id ranging from 1 to 10 million.

  postgres=# insert into tbl select generate_series(1,10000000), random()*100, random()*100, random()*100, random()*100;
  INSERT 0 10000000

3. Insert another 1 million rows whose c1 and c2 values differ from the first 10 million.

  postgres=# insert into tbl select generate_series(10000001,11000000), 200,200,200,200;
  INSERT 0 1000000
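As a quick sanity check (an extra query, not part of the original walkthrough), the rows with c1=200 and c2=200 all sit at the tail of the id range; it should report min(id)=10000001, max(id)=11000000 and count=1000000:

  -- confirms the skew: every matching row lies beyond the first 10 million ids
  select min(id), max(id), count(*) from tbl where c1 = 200 and c2 = 200;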

4. Create the two indexes this article focuses on; the question is which one is cheaper to use.

  postgres=# create index idx_tbl_1 on tbl(id);
  CREATE INDEX
  postgres=# create index idx_tbl_2 on tbl(c1,c2,c3,c4);
  CREATE INDEX

5. Collect statistics.

  postgres=# vacuum analyze tbl;
  VACUUM
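Optionally, peek at what analyze collected. The estimate of roughly 100,533 matching rows that appears in the plans below comes from the per-column statistics: without extended statistics, the planner multiplies the selectivity of c1=200 by that of c2=200 (each roughly 1/11 of the table), treating the columns as independent. A hedged check:

  -- per-column most-common-values statistics that feed the row estimate
  select attname, most_common_vals, most_common_freqs
  from pg_stats
  where tablename = 'tbl' and attname in ('c1', 'c2');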

6. Check the execution plan of the following SQL; it uses the id index.

  postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 10;
                                      QUERY PLAN
  --------------------------------------------------------------------------------------
  Limit (cost=0.43..32.59 rows=10 width=20)
    -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323244.26 rows=100533 width=20)
         Filter: ((c1 = 200) AND (c2 = 200))
  (3 rows)

With limit 1000 it still uses the id index:

  postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 1000;
                                      QUERY PLAN
  --------------------------------------------------------------------------------------
  Limit (cost=0.43..3215.74 rows=1000 width=20)
    -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323244.26 rows=100533 width=20)
         Filter: ((c1 = 200) AND (c2 = 200))
  (3 rows)

When LIMIT reaches 50000, it switches to the c1,c2 index. Why?

  postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 50000;
                                         QUERY PLAN
  --------------------------------------------------------------------------------------------
  Limit (cost=70355.06..70480.06 rows=50000 width=20)
    -> Sort (cost=70355.06..70606.39 rows=100533 width=20)
         Sort Key: id
         -> Bitmap Heap Scan on tbl (cost=1457.82..62005.97 rows=100533 width=20)
              Recheck Cond: ((c1 = 200) AND (c2 = 200))
              -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..1432.69 rows=100533 width=0)
                   Index Cond: ((c1 = 200) AND (c2 = 200))
  (7 rows)

7. Interpreting the plans above.

First, divide the table's row count (11 million) by the estimated number of rows matching c1=200 and c2=200 (100,533). This gives the average number of rows that must be scanned to find one matching row:

  postgres=# select 11000000/100533.0;
        ?column?
  ----------------------
   109.4168084111684720
  (1 row)

That is, the planner expects one matching row for roughly every 109.4 rows scanned. (It computes this under the assumption that the data is uniformly distributed.)

In reality the distribution is skewed: the rows with c1=200 and c2=200 sit at the end of the table, after the first 10 million rows, so 10 million rows have to be scanned before the first matching row turns up.

That is nowhere near the estimated one match per 109.4 rows scanned.

This is exactly where the problem lies.
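To see how the planner priced the two id-index plans above: the cost of a LIMIT over an index scan is roughly the scan's total cost scaled by the fraction of its estimated rows that the LIMIT needs. This is a back-of-envelope sketch, not the planner's exact formula:

  -- limit 10:   0.43 + (323244.26 - 0.43) * 10   / 100533 ≈ 32.6    (plan shows 32.59)
  -- limit 1000: 0.43 + (323244.26 - 0.43) * 1000 / 100533 ≈ 3215.7  (plan shows 3215.74)
  select 0.43 + (323244.26 - 0.43) * 10 / 100533.0;
  select 0.43 + (323244.26 - 0.43) * 1000 / 100533.0;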

8. Now let's look at why limit 50000 chose the c1,c2 index rather than the id index.

With the c1,c2 index, the bitmap index scan plus heap fetches are expected to return the 100,533 estimated rows (cost up to about 62,005.97), which must then all be sorted; the sort's total cost comes to about 70,606.39.

So when scanning via the id index instead, how many rows does it take for the cost to reach 70,606.39?

  -- using the limit 1000 plan cost of 3215.74 as the reference
  postgres=# select 70606.39/3215.74;
        ?column?
  ---------------------
   21.9564983487471002
  (1 row)

  postgres=# select 21.956*1000;
   ?column?
  -----------
   21956.000
  (1 row)

The tipping point

Following the analysis above, at LIMIT 21956 the id-index-scan plan's cost reaches roughly 70,606.39.

So limit 21956 is the tipping point: above it the planner tends to pick the c1,c2 index scan, below it the id index scan.
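An equivalent cross-check using the same back-of-envelope formula: scale the id-index scan's full cost (323,244.26 for the 100,533 estimated rows) until it reaches the sort plan's 70,606.39:

  -- solve 323244.26 * n / 100533 ≈ 70606.39 for n; gives ≈ 21959, in line with 21956 above
  select 70606.39 / 323244.26 * 100533;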

As shown below:

  postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 22000;
                                         QUERY PLAN
  --------------------------------------------------------------------------------------------
  Limit (cost=69759.69..69814.69 rows=22000 width=20)
    -> Sort (cost=69759.69..70011.02 rows=100533 width=20)
         Sort Key: id
         -> Bitmap Heap Scan on tbl (cost=1457.82..62005.97 rows=100533 width=20)
              Recheck Cond: ((c1 = 200) AND (c2 = 200))
              -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..1432.69 rows=100533 width=0)
                   Index Cond: ((c1 = 200) AND (c2 = 200))
  (7 rows)

  postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 21000;
                                      QUERY PLAN
  --------------------------------------------------------------------------------------
  Limit (cost=0.43..67521.75 rows=21000 width=20)
    -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323244.26 rows=100533 width=20)
         Filter: ((c1 = 200) AND (c2 = 200))
  (3 rows)

Actual execution times

Unsurprisingly, the id-index scan is bound to be slow here, because all matching rows lie beyond the first 10 million rows.

1. c1,c2 index scan: hits the matching rows directly, then sorts the 1 million rows; comparatively fast.

  postgres=# explain analyze select * from tbl where c1=200 and c2=200 order by id limit 22000;
                                                                  QUERY PLAN
  ----------------------------------------------------------------------------------------------------------------------------------------------
  Limit (cost=69759.69..69814.69 rows=22000 width=20) (actual time=293.961..299.054 rows=22000 loops=1)
    -> Sort (cost=69759.69..70011.02 rows=100533 width=20) (actual time=293.960..296.006 rows=22000 loops=1)
         Sort Key: id
         Sort Method: top-N heapsort Memory: 3255kB
         -> Bitmap Heap Scan on tbl (cost=1457.82..62005.97 rows=100533 width=20) (actual time=47.919..175.698 rows=1000000 loops=1)
              Recheck Cond: ((c1 = 200) AND (c2 = 200))
              Heap Blocks: exact=6370
              -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..1432.69 rows=100533 width=0) (actual time=47.160..47.160 rows=1000000 loops=1)
                   Index Cond: ((c1 = 200) AND (c2 = 200))
  Planning time: 0.152 ms
  Execution time: 300.664 ms
  (11 rows)

2. id index scan: slow.

  postgres=# explain analyze select * from tbl where c1=200 and c2=200 order by id limit 21000;
                                                                QUERY PLAN
  ------------------------------------------------------------------------------------------------------------------------------------------
  Limit (cost=0.43..67521.75 rows=21000 width=20) (actual time=1404.932..1412.594 rows=21000 loops=1)
    -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323244.26 rows=100533 width=20) (actual time=1404.930..1409.639 rows=21000 loops=1)
         Filter: ((c1 = 200) AND (c2 = 200))
         Rows Removed by Filter: 10000000
  Planning time: 0.139 ms
  Execution time: 1414.142 ms
  (6 rows)

3. Same story with limit 10: id index scan, slow.

  postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id limit 10;
                                                                  QUERY PLAN
  ----------------------------------------------------------------------------------------------------------------------------------------------
  Limit (cost=0.43..32.59 rows=10 width=20) (actual time=1403.861..1403.865 rows=10 loops=1)
    Output: id, c1, c2, c3, c4
    Buffers: shared hit=91020
    -> Index Scan using idx_tbl_1 on public.tbl (cost=0.43..323244.26 rows=100533 width=20) (actual time=1403.859..1403.861 rows=10 loops=1)
         Output: id, c1, c2, c3, c4
         Filter: ((tbl.c1 = 200) AND (tbl.c2 = 200))
         Rows Removed by Filter: 10000000
         Buffers: shared hit=91020
  Planning time: 0.127 ms
  Execution time: 1403.893 ms
  (10 rows)

Optimization methods

1. Use a hint or rewrite the SQL to keep the planner off the id index.

This is not appropriate in every scenario: when the data is uniformly distributed, PostgreSQL's cost model fits well, and its choice of when to use the id index versus the c1,c2 index is perfectly reasonable. The rewrite variant (order by id+0, which keeps the planner from using the id index for ordering) is shown below; a hint-based sketch follows it.

  postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id+0 limit 10;
                                                                  QUERY PLAN
  ----------------------------------------------------------------------------------------------------------------------------------------------
  Limit (cost=64429.79..64429.81 rows=10 width=24) (actual time=409.622..409.626 rows=10 loops=1)
    Output: id, c1, c2, c3, c4, ((id + 0))
    Buffers: shared hit=10205
    -> Sort (cost=64429.79..64681.12 rows=100533 width=24) (actual time=409.620..409.621 rows=10 loops=1)
         Output: id, c1, c2, c3, c4, ((id + 0))
         Sort Key: ((tbl.id + 0))
         Sort Method: top-N heapsort Memory: 25kB
         Buffers: shared hit=10205
         -> Bitmap Heap Scan on public.tbl (cost=1457.82..62257.30 rows=100533 width=24) (actual time=47.347..237.455 rows=1000000 loops=1)
              Output: id, c1, c2, c3, c4, (id + 0)
              Recheck Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))
              Heap Blocks: exact=6370
              Buffers: shared hit=10205
              -> Bitmap Index Scan on idx_tbl_2 (cost=0.00..1432.69 rows=100533 width=0) (actual time=46.577..46.577 rows=1000000 loops=1)
                   Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))
                   Buffers: shared hit=3835
  Planning time: 0.133 ms
  Execution time: 409.670 ms
  (18 rows)
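For the hint route, core PostgreSQL has no hints; the pg_hint_plan extension is one option. A minimal sketch, assuming the extension is installed and loaded, forcing the bitmap scan on idx_tbl_2:

  /*+ BitmapScan(tbl idx_tbl_2) */
  explain select * from tbl where c1=200 and c2=200 order by id limit 10;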

2. If c1 and c2 are queried with equality predicates, the following index is the most efficient option.

  postgres=# create index idx_tbl_3 on tbl(c1,c2,id);
  CREATE INDEX

  postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 =200 order by id limit 10;
                                                               QUERY PLAN
  ---------------------------------------------------------------------------------------------------------------------------------------
  Limit (cost=0.56..6.93 rows=10 width=20) (actual time=0.102..0.106 rows=10 loops=1)
    Output: id, c1, c2, c3, c4
    Buffers: shared hit=1 read=4
    I/O Timings: read=0.047
    -> Index Scan using idx_tbl_3 on public.tbl (cost=0.56..64086.79 rows=100533 width=20) (actual time=0.101..0.103 rows=10 loops=1)
         Output: id, c1, c2, c3, c4
         Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))
         Buffers: shared hit=1 read=4
         I/O Timings: read=0.047
  Planning time: 0.142 ms
  Execution time: 0.131 ms
  (11 rows)

3. Recommended approach.

Note that method 2 does not fit non-equality (range) predicates:

  postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;
                                                                  QUERY PLAN
  ----------------------------------------------------------------------------------------------------------------------------------------------
  Limit (cost=0.43..35.32 rows=10 width=20) (actual time=1371.094..1371.099 rows=10 loops=1)
    Output: id, c1, c2, c3, c4
    Buffers: shared hit=91020
    -> Index Scan using idx_tbl_1 on public.tbl (cost=0.43..350743.84 rows=100533 width=20) (actual time=1371.092..1371.095 rows=10 loops=1)
         Output: id, c1, c2, c3, c4
         Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300) AND (tbl.c1 = 200))
         Rows Removed by Filter: 10000000
         Buffers: shared hit=91020
  Planning time: 0.278 ms
  Execution time: 1371.128 ms
  (10 rows)

No need to worry, though: we can still build a composite index from the remaining equality column(s) plus the sort column, and let the index scan filter out rows that fail the range predicate.

Adding the following index speeds things up as shown:

  postgres=# create index idx_tbl_4 on tbl(c1,id);
  CREATE INDEX

  postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;
                                                                QUERY PLAN
  ----------------------------------------------------------------------------------------------------------------------------------------
  Limit (cost=0.43..10.47 rows=10 width=20) (actual time=0.105..0.110 rows=10 loops=1)
    Output: id, c1, c2, c3, c4
    Buffers: shared hit=1 read=3
    I/O Timings: read=0.051
    -> Index Scan using idx_tbl_4 on public.tbl (cost=0.43..100877.50 rows=100533 width=20) (actual time=0.104..0.107 rows=10 loops=1)
         Output: id, c1, c2, c3, c4
         Index Cond: (tbl.c1 = 200)
         Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300))
         Buffers: shared hit=1 read=3
         I/O Timings: read=0.051
  Planning time: 0.172 ms
  Execution time: 0.134 ms
  (12 rows)

Appendix: Oracle

《Oracle migration to Greenplum (with Ora2pg)》

http://www.dba-oracle.com/t_OracleAutotrace.htm

  SQL> create table tbl(id int, c1 int, c2 int, c3 int, c4 int);
  Table created.
  SQL> insert into tbl select rownum,trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)) from dual connect by level <=10000000;
  10000000 rows created.
  SQL> commit;
  Commit complete.
  SQL> insert into tbl select rownum+10000000, 200,200,200,200 from dual connect by level <=1000000;
  1000000 rows created.
  SQL> commit;
  Commit complete.
  SQL> create index idx_tbl_1 on tbl(id);
  Index created.
  SQL> create index idx_tbl_2 on tbl(c1,c2,c3,c4);
  Index created.
  SQL> set linesize 512
  SQL> set pagesize 50000
  SQL> set autotrace on;
  SQL> exec DBMS_STATS.GATHER_TABLE_STATS('JIUDU','TBL');
  PL/SQL procedure successfully completed.

  SQL> select * from (select * from tbl where c1=200 and c2=200 order by id) t where rownum<10;
  ID C1 C2 C3 C4
  ---------- ---------- ---------- ---------- ----------
  10000001 200 200 200 200
  10000002 200 200 200 200
  10000003 200 200 200 200
  10000004 200 200 200 200
  10000005 200 200 200 200
  10000006 200 200 200 200
  10000007 200 200 200 200
  10000008 200 200 200 200
  10000009 200 200 200 200
  9 rows selected.

  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 745043579
  ----------------------------------------------------------------------------------------
  | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
  ----------------------------------------------------------------------------------------
  | 0 | SELECT STATEMENT | | 9 | 585 | | 10253 (2)| 00:02:04 |
  |* 1 | COUNT STOPKEY | | | | | | |
  | 2 | VIEW | | 84875 | 5387K| | 10253 (2)| 00:02:04 |
  |* 3 | SORT ORDER BY STOPKEY| | 84875 | 1491K| 2672K| 10253 (2)| 00:02:04 |
  |* 4 | TABLE ACCESS FULL | TBL | 84875 | 1491K| | 9767 (2)| 00:01:58 |
  ----------------------------------------------------------------------------------------
  Predicate Information (identified by operation id):
  ---------------------------------------------------
  1 - filter(ROWNUM<10)
  3 - filter(ROWNUM<10)
  4 - filter("C1"=200 AND "C2"=200)

  Statistics
  ----------------------------------------------------------
  0 recursive calls
  0 db block gets
  34868 consistent gets
  0 physical reads
  0 redo size
  937 bytes sent via SQL*Net to client
  500 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  1 sorts (memory)
  0 sorts (disk)
  9 rows processed

  SQL> select * from (select * from tbl where c1=1 and c2=1 order by id) t where rownum<10;
  ID C1 C2 C3 C4
  ---------- ---------- ---------- ---------- ----------
  9697 1 1 78 39
  20586 1 1 81 71
  27820 1 1 33 64
  44324 1 1 26 27
  47079 1 1 3 5
  64669 1 1 13 49
  73715 1 1 20 74
  80903 1 1 96 25
  98368 1 1 59 9
  9 rows selected.

  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 447312937
  --------------------------------------------------------------------------------------------
  | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  --------------------------------------------------------------------------------------------
  | 0 | SELECT STATEMENT | | 9 | 585 | 641 (1)| 00:00:08 |
  |* 1 | COUNT STOPKEY | | | | | |
  | 2 | VIEW | | 704 | 45760 | 641 (1)| 00:00:08 |
  |* 3 | SORT ORDER BY STOPKEY | | 704 | 12672 | 641 (1)| 00:00:08 |
  | 4 | TABLE ACCESS BY INDEX ROWID| TBL | 704 | 12672 | 640 (0)| 00:00:08 |
  |* 5 | INDEX RANGE SCAN | IDX_TBL_2 | 704 | | 5 (0)| 00:00:01 |
  --------------------------------------------------------------------------------------------
  Predicate Information (identified by operation id):
  ---------------------------------------------------
  1 - filter(ROWNUM<10)
  3 - filter(ROWNUM<10)
  5 - access("C1"=1 AND "C2"=1)

  Statistics
  ----------------------------------------------------------
  1 recursive calls
  0 db block gets
  1072 consistent gets
  11 physical reads
  0 redo size
  969 bytes sent via SQL*Net to client
  500 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  1 sorts (memory)
  0 sorts (disk)
  9 rows processed

  SQL> create index idx_tbl_3 on tbl(c1,c2,id);
  Index created.

  SQL> select * from (select * from tbl where c1=200 and c2 between 100 and 300 order by id) t where rownum < 10;
  ID C1 C2 C3 C4
  ---------- ---------- ---------- ---------- ----------
  10000001 200 200 200 200
  10000002 200 200 200 200
  10000003 200 200 200 200
  10000004 200 200 200 200
  10000005 200 200 200 200
  10000006 200 200 200 200
  10000007 200 200 200 200
  10000008 200 200 200 200
  10000009 200 200 200 200
  9 rows selected.

  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 745043579
  ----------------------------------------------------------------------------------------
  | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
  ----------------------------------------------------------------------------------------
  | 0 | SELECT STATEMENT | | 9 | 585 | | 10253 (2)| 00:02:04 |
  |* 1 | COUNT STOPKEY | | | | | | |
  | 2 | VIEW | | 84875 | 5387K| | 10253 (2)| 00:02:04 |
  |* 3 | SORT ORDER BY STOPKEY| | 84875 | 1491K| 2672K| 10253 (2)| 00:02:04 |
  |* 4 | TABLE ACCESS FULL | TBL | 84875 | 1491K| | 9767 (2)| 00:01:58 |
  ----------------------------------------------------------------------------------------
  Predicate Information (identified by operation id):
  ---------------------------------------------------
  1 - filter(ROWNUM<10)
  3 - filter(ROWNUM<10)
  4 - filter("C1"=200 AND "C2">=100 AND "C2"<=300)

  Statistics
  ----------------------------------------------------------
  1 recursive calls
  0 db block gets
  34868 consistent gets
  0 physical reads
  0 redo size
  937 bytes sent via SQL*Net to client
  500 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  1 sorts (memory)
  0 sorts (disk)
  9 rows processed

  SQL> select * from (select * from tbl where c1=200 and c2 =200 order by id) t where rownum < 10;
  ID C1 C2 C3 C4
  ---------- ---------- ---------- ---------- ----------
  10000001 200 200 200 200
  10000002 200 200 200 200
  10000003 200 200 200 200
  10000004 200 200 200 200
  10000005 200 200 200 200
  10000006 200 200 200 200
  10000007 200 200 200 200
  10000008 200 200 200 200
  10000009 200 200 200 200
  9 rows selected.

  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 1825274432
  -------------------------------------------------------------------------------------------
  | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  -------------------------------------------------------------------------------------------
  | 0 | SELECT STATEMENT | | 9 | 585 | 12 (0)| 00:00:01 |
  |* 1 | COUNT STOPKEY | | | | | |
  | 2 | VIEW | | 10 | 650 | 12 (0)| 00:00:01 |
  | 3 | TABLE ACCESS BY INDEX ROWID| TBL | 10 | 180 | 12 (0)| 00:00:01 |
  |* 4 | INDEX RANGE SCAN | IDX_TBL_3 | | | 3 (0)| 00:00:01 |
  -------------------------------------------------------------------------------------------
  Predicate Information (identified by operation id):
  ---------------------------------------------------
  1 - filter(ROWNUM<10)
  4 - access("C1"=200 AND "C2"=200)

  Statistics
  ----------------------------------------------------------
  1 recursive calls
  0 db block gets
  6 consistent gets
  2 physical reads
  0 redo size
  937 bytes sent via SQL*Net to client
  500 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  9 rows processed

One more observation: Oracle does not seem to use an index scan plus filter to optimize this case. For example, after dropping idx_tbl_2 and idx_tbl_3 and leaving only the id index, Oracle falls back to a full table scan, whereas PostgreSQL can still use the id index with a filter.

  SQL> select * from (select * from tbl where c1=1 and c2 =1 order by id) t where rownum < 10;
  ID C1 C2 C3 C4
  ---------- ---------- ---------- ---------- ----------
  9697 1 1 78 39
  20586 1 1 81 71
  27820 1 1 33 64
  44324 1 1 26 27
  47079 1 1 3 5
  64669 1 1 13 49
  73715 1 1 20 74
  80903 1 1 96 25
  98368 1 1 59 9
  9 rows selected.

  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 745043579
  --------------------------------------------------------------------------------
  | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  --------------------------------------------------------------------------------
  | 0 | SELECT STATEMENT | | 9 | 585 | 9766 (2)| 00:01:58 |
  |* 1 | COUNT STOPKEY | | | | | |
  | 2 | VIEW | | 704 | 45760 | 9766 (2)| 00:01:58 |
  |* 3 | SORT ORDER BY STOPKEY| | 704 | 12672 | 9766 (2)| 00:01:58 |
  |* 4 | TABLE ACCESS FULL | TBL | 704 | 12672 | 9765 (2)| 00:01:58 |
  --------------------------------------------------------------------------------
  Predicate Information (identified by operation id):
  ---------------------------------------------------
  1 - filter(ROWNUM<10)
  3 - filter(ROWNUM<10)
  4 - filter("C1"=1 AND "C2"=1)

  Statistics
  ----------------------------------------------------------
  186 recursive calls
  0 db block gets
  34893 consistent gets
  0 physical reads
  0 redo size
  969 bytes sent via SQL*Net to client
  500 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  7 sorts (memory)
  0 sorts (disk)
  9 rows processed

  PG:
  postgres=# explain analyze select * from tbl where c1=1 and c2 =1 order by id limit 10;
                                                             QUERY PLAN
  -------------------------------------------------------------------------------------------------------------------------------
  Limit (cost=0.43..3703.11 rows=10 width=20) (actual time=7.199..23.926 rows=10 loops=1)
    -> Index Scan using idx_tbl_1 on tbl (cost=0.43..323243.84 rows=873 width=20) (actual time=7.198..23.921 rows=10 loops=1)
         Filter: ((c1 = 1) AND (c2 = 1))
         Rows Removed by Filter: 142814
  Planning time: 0.119 ms
  Execution time: 23.950 ms
  (6 rows)

Summary

When a query combines an ORDER BY, filter predicates on other columns, and a LIMIT intended to return a few rows quickly, the optimizer may produce a poor plan if the matching rows sit at the far end of the sort key: the scan along the sort-key index then has to wade through far more data before it hits the required rows.

The database's current estimate of how many rows a LIMIT must scan along the sort key assumes a uniform distribution, so when the matching rows are unevenly distributed with respect to the sort key, the cost estimate is off (Oracle simply falls back to a full table scan).
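As a side note, the extended statistics introduced in PostgreSQL 10 (see the reference below) can tighten the row-count estimate for correlated filter columns such as c1 and c2, though they still say nothing about how the matching rows are laid out along the sort key. A minimal sketch, with a hypothetical statistics name:

  -- requires PostgreSQL 10 or later
  create statistics stat_tbl_c1_c2 (dependencies) on c1, c2 from tbl;
  analyze tbl;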

Suggested optimization:

Add an index: a composite index made of the equality-predicate column(s) plus the sort column(s), to cut down the amount of data scanned.

Examples

  select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;
  -- index to add:
  (c1,id)     -- index scan, filter on c2
  -- already present:
  (c1,c2)     -- index scan, then sort by id
  (id)        -- index scan, filter on c1,c2

  select * from tbl where c1=200 and c2 =200 order by id limit 10;
  -- index to add:
  (c1,c2,id)  -- index scan only
  -- already present:
  (c1,c2)     -- index scan, then sort by id
  (id)        -- index scan, filter on c1,c2

References

《PostgreSQL 10 黑科技 - 自定义统计信息》
