赞
踩
PostgreSQL in 的优化器处理以及如何优化
在使用数据库的过程中,经常会遇到需要匹配多个值的情况。
通常的写法包括:
- -- select * from table where id = any(array);
- -- select * from table where id in (values);
- -- select * from table where id=x or id=x or ....;
- -- select * from table where id in (query);
- -- select * from table where id in ( values query );
- -- select * from table t1 join (query or values query) t2 on t1.id=t2.id;
每种写法会产生多种执行计划的可能,如下:
- -- select * from table where id = any(array);
- 优化器可以使用index scan, bitmap scan, seq scan.
-
- -- select * from table where id in (values);
- 优化器可以使用index scan, bitmap scan, seq scan.
-
- -- select * from table where id=x or id=x or ....;
- 优化器可以使用bitmap scan + BitmapOr, seq scan.
-
- -- select * from table where id in (query);
- 优化器可以使用join (merge,hash,nest).
-
- -- select * from table where id in ( values query );
- 优化器可以使用join (merge,hash,nest).
-
- -- select * from table t1 join (query or values query) t2 on t1.id=t2.id;
- 优化器可以使用join (merge,hash,nest).
SQL优化策略是尽量减少CPU的运算以及page的扫描数量。
下面针对每种SQL,看看对应的可能的执行计划有什么差别,(使用开关来控制执行计划的选择, 如set enable_indexscan=off)
支持的开关如下:
- enable_bitmapscan enable_hashjoin enable_indexscan enable_mergejoin enable_seqscan enable_tidscan
- enable_hashagg enable_indexonlyscan enable_material enable_nestloop enable_sort
开始测试,使用auto_explain输出执行计划:
- load 'auto_explain';
- set auto_explain.log_analyze =true;
- set auto_explain.log_buffers =true;
- set auto_explain.log_nested_statements=true;
- set auto_explain.log_timing=true;
- set auto_explain.log_triggers=true;
- set auto_explain.log_verbose=true;
- set auto_explain.log_min_duration=0;
- set client_min_messages ='log';
- set work_mem='8GB';
测试SQL写法1:
- -- select * from table where id = any(array);
-
- do language plpgsql
- $$
-
- declare
- v_id int[];
- begin
- select array_agg(trunc(random()*100000)) into v_id from generate_series(1,200) t(id);
- perform * from t_in_test where id = any (v_id);
- end;
-
- $$
- ;
优化器选择1 (index scan):
离散扫描,适合小的扫描集。
- LOG: duration: 2.312 ms plan:
- Query Text: SELECT * from t_in_test where id = any (v_id)
- Index Scan using t_in_test_pkey on public.t_in_test (cost=0.43..895.50 rows=200 width=37) (actual time=0.039..2.266 rows=200 loops=1)
- Output: id, info
- Index Cond: (t_in_test.id = ANY ('{50836,73414,41071,45604,...省略部分...,76236}'::integer[]))
- Buffers: shared hit=776
- CONTEXT: SQL statement "SELECT * from t_in_test where id = any (v_id)"
- PL/pgSQL function inline_code_block line 6 at PERFORM
优化器选择2 (bitmap scan):
比index scan多了Recheck的开销,以及按照ctid排序的开销。
适合大的扫描集,排序的目的是减少离散扫描,还可以用到块设备的prefetch。
- LOG: duration: 1.602 ms plan:
- Query Text: SELECT * from t_in_test where id = any (v_id)
- Bitmap Heap Scan on public.t_in_test (cost=888.55..1711.16 rows=200 width=37) (actual time=0.880..1.563 rows=200 loops=1)
- Output: id, info
- Recheck Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))
- Heap Blocks: exact=184
- Buffers: shared hit=784
- -> Bitmap Index Scan on t_in_test_pkey (cost=0.00..888.50 rows=200 width=0) (actual time=0.846..0.846 rows=200 loops=1)
- Index Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))
- Buffers: shared hit=600
- CONTEXT: SQL statement "SELECT * from t_in_test where id = any (v_id)"
- PL/pgSQL function inline_code_block line 6 at PERFORM
优化器选择3 (seq scan):
适合非常庞大的扫描集。
- LOG: duration: 19940.394 ms plan:
- Query Text: SELECT * from t_in_test where id = any (v_id)
- Seq Scan on public.t_in_test (cost=0.00..2683354.80 rows=200 width=37) (actual time=4.237..19940.330 rows=199 loops=1)
- Output: id, info
- Filter: (t_in_test.id = ANY ('{45867,72450,95153,86233,63073,11016,56010,47158,...省略部分...,90444}'::integer[]))
- Rows Removed by Filter: 9999801
- Buffers: shared hit=83334
- CONTEXT: SQL statement "SELECT * from t_in_test where id = any (v_id)"
- PL/pgSQL function inline_code_block line 6 at PERFORM
测试SQL写法2:
- -- select * from table where id in (values);
-
- do language plpgsql
- $$
-
- declare
- v_where text;
- begin
- select string_agg(id::text,',') into v_where from (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t;
- execute 'select * from t_in_test where id in ('||v_where||')';
- end;
-
- $$
- ;
优化器选择1 (index scan):
- LOG: duration: 0.919 ms plan:
- Query Text: select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,...省略部分...,48126,44868)
- Index Scan using t_in_test_pkey on public.t_in_test (cost=0.43..895.50 rows=200 width=37) (actual time=0.017..0.894 rows=200 loops=1)
- Output: id, info
- Index Cond: (t_in_test.id = ANY ('{8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,48126,44868}'::integer[]))
- Buffers: shared hit=779
- CONTEXT: SQL statement "select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,73366,48126,44868)"
- PL/pgSQL function inline_code_block line 6 at EXECUTE
优化器选择2 (bitmap scan):
- LOG: duration: 1.012 ms plan:
- Query Text: select * from t_in_test where id in (17424,80517,35148,38245,93
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。