当前位置:   article > 正文

PostgreSQL in 语法 的优化器处理以及如何优化

postgresql 优化 in

PostgreSQL in 的优化器处理以及如何优化

在使用数据库的过程中,经常会遇到需要匹配多个值的情况。
通常的写法包括:

  1. -- select * from table where id = any(array);
  2. -- select * from table where id in (values);
  3. -- select * from table where id=x or id=x or ....;
  4. -- select * from table where id in (query);
  5. -- select * from table where id in ( values query );
  6. -- select * from table t1 join (query or values query) t2 on t1.id=t2.id;

每种写法会产生多种执行计划的可能,如下:

  1. -- select * from table where id = any(array);
  2. 优化器可以使用index scan, bitmap scan, seq scan.
  3. -- select * from table where id in (values);
  4. 优化器可以使用index scan, bitmap scan, seq scan.
  5. -- select * from table where id=x or id=x or ....;
  6. 优化器可以使用bitmap scan + BitmapOr, seq scan.
  7. -- select * from table where id in (query);
  8. 优化器可以使用join (merge,hash,nest).
  9. -- select * from table where id in ( values query );
  10. 优化器可以使用join (merge,hash,nest).
  11. -- select * from table t1 join (query or values query) t2 on t1.id=t2.id;
  12. 优化器可以使用join (merge,hash,nest).

SQL优化策略是尽量减少CPU的运算以及page的扫描数量。

下面针对每种SQL,看看对应的可能的执行计划有什么差别,(使用开关来控制执行计划的选择, 如set enable_indexscan=off)
支持的开关如下:

  1. enable_bitmapscan enable_hashjoin enable_indexscan enable_mergejoin enable_seqscan enable_tidscan
  2. enable_hashagg enable_indexonlyscan enable_material enable_nestloop enable_sort

开始测试,使用auto_explain输出执行计划:

  1. load 'auto_explain';
  2. set auto_explain.log_analyze =true;
  3. set auto_explain.log_buffers =true;
  4. set auto_explain.log_nested_statements=true;
  5. set auto_explain.log_timing=true;
  6. set auto_explain.log_triggers=true;
  7. set auto_explain.log_verbose=true;
  8. set auto_explain.log_min_duration=0;
  9. set client_min_messages ='log';
  10. set work_mem='8GB';

测试SQL写法1:

  1. -- select * from table where id = any(array);
  2. do language plpgsql
  3. $$
  4. declare
  5. v_id int[];
  6. begin
  7. select array_agg(trunc(random()*100000)) into v_id from generate_series(1,200) t(id);
  8. perform * from t_in_test where id = any (v_id);
  9. end;
  10. $$
  11. ;

优化器选择1 (index scan):
离散扫描,适合小的扫描集。

  1. LOG: duration: 2.312 ms plan:
  2. Query Text: SELECT * from t_in_test where id = any (v_id)
  3. Index Scan using t_in_test_pkey on public.t_in_test (cost=0.43..895.50 rows=200 width=37) (actual time=0.039..2.266 rows=200 loops=1)
  4. Output: id, info
  5. Index Cond: (t_in_test.id = ANY ('{50836,73414,41071,45604,...省略部分...,76236}'::integer[]))
  6. Buffers: shared hit=776
  7. CONTEXT: SQL statement "SELECT * from t_in_test where id = any (v_id)"
  8. PL/pgSQL function inline_code_block line 6 at PERFORM

优化器选择2 (bitmap scan):
比index scan多了Recheck的开销,以及按照ctid排序的开销。
适合大的扫描集,排序的目的是减少离散扫描,还可以用到块设备的prefetch。

  1. LOG: duration: 1.602 ms plan:
  2. Query Text: SELECT * from t_in_test where id = any (v_id)
  3. Bitmap Heap Scan on public.t_in_test (cost=888.55..1711.16 rows=200 width=37) (actual time=0.880..1.563 rows=200 loops=1)
  4. Output: id, info
  5. Recheck Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))
  6. Heap Blocks: exact=184
  7. Buffers: shared hit=784
  8. -> Bitmap Index Scan on t_in_test_pkey (cost=0.00..888.50 rows=200 width=0) (actual time=0.846..0.846 rows=200 loops=1)
  9. Index Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))
  10. Buffers: shared hit=600
  11. CONTEXT: SQL statement "SELECT * from t_in_test where id = any (v_id)"
  12. PL/pgSQL function inline_code_block line 6 at PERFORM

优化器选择3 (seq scan):
适合非常庞大的扫描集。

  1. LOG: duration: 19940.394 ms plan:
  2. Query Text: SELECT * from t_in_test where id = any (v_id)
  3. Seq Scan on public.t_in_test (cost=0.00..2683354.80 rows=200 width=37) (actual time=4.237..19940.330 rows=199 loops=1)
  4. Output: id, info
  5. Filter: (t_in_test.id = ANY ('{45867,72450,95153,86233,63073,11016,56010,47158,...省略部分...,90444}'::integer[]))
  6. Rows Removed by Filter: 9999801
  7. Buffers: shared hit=83334
  8. CONTEXT: SQL statement "SELECT * from t_in_test where id = any (v_id)"
  9. PL/pgSQL function inline_code_block line 6 at PERFORM

测试SQL写法2:

  1. -- select * from table where id in (values);
  2. do language plpgsql
  3. $$
  4. declare
  5. v_where text;
  6. begin
  7. select string_agg(id::text,',') into v_where from (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t;
  8. execute 'select * from t_in_test where id in ('||v_where||')';
  9. end;
  10. $$
  11. ;

优化器选择1 (index scan):

  1. LOG: duration: 0.919 ms plan:
  2. Query Text: select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,...省略部分...,48126,44868)
  3. Index Scan using t_in_test_pkey on public.t_in_test (cost=0.43..895.50 rows=200 width=37) (actual time=0.017..0.894 rows=200 loops=1)
  4. Output: id, info
  5. Index Cond: (t_in_test.id = ANY ('{8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,48126,44868}'::integer[]))
  6. Buffers: shared hit=779
  7. CONTEXT: SQL statement "select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,73366,48126,44868)"
  8. PL/pgSQL function inline_code_block line 6 at EXECUTE

优化器选择2 (bitmap scan):

  1. LOG: duration: 1.012 ms plan:
  2. Query Text: select * from t_in_test where id in (17424,80517,35148,38245,93
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/Gausst松鼠会/article/detail/597542
推荐阅读
相关标签
  

闽ICP备14008679号