赞
踩
PostgreSQL , 操作符 , 计算 , deform , LLVM , JIT
PostgreSQL 11 JIT,目前支持tuple deform(将磁盘上的tuple转换为内存中TUPLE格式),以及表达式(select, where, 等语义中的表达式,操作符运算,UDF等)的动态编译。
对海量数据的计算(并且表达式或TUPLE DEFORM已成为瓶颈时)有加速效果。
特别适合OLAP场景的大量数据的复杂计算。
《分析加速引擎黑科技 - LLVM、列存、多核并行、算子复用 大联姻 - 一起来开启PostgreSQL的百宝箱》
参考
《PostgreSQL 10 + PostGIS + Sharding(pg_pathman) + MySQL(fdw外部表) on ECS 部署指南(适合新用户)》
以CentOS 7.x x64为例
1、install epel
- wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
-
- rpm -ivh epel-release-latest-7.noarch.rpm
2、
- yum install -y cmake3
-
- yum install -y python
3、LLVM
- wget http://releases.llvm.org/6.0.0/llvm-6.0.0.src.tar.xz
-
- tar -xvf llvm-6.0.0.src.tar.xz
4、CLANG
- wget http://releases.llvm.org/6.0.0/cfe-6.0.0.src.tar.xz
-
- tar -xvf cfe-6.0.0.src.tar.xz
- mv cfe-6.0.0.src clang
- mv clang llvm-6.0.0.src/tools
5、安装支持JIT的编译器llvm和clang
- cd llvm-6.0.0.src
- mkdir mybuilddir
- cd mybuilddir
- alias cmake=cmake3
- export CXXFLAGS=-O2
- cmake ..
- cmake --build .
- cmake -DCMAKE_INSTALL_PREFIX=/usr/local/llvm6.0.0 -P cmake_install.cmake
6、编译PG with llvm
- wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
-
- tar -jxvf postgresql-snapshot.tar.bz2
-
- cd postgresql-11devel
- CLANG=/usr/local/llvm6.0.0/bin/clang LLVM_CONFIG=/usr/local/llvm6.0.0/bin/llvm-config ./configure --prefix=/home/digoal/pgsql11_llvm --with-llvm --with-blocksize=32 --with-segsize=2
- make world -j 128
- make install-world
7、pg with llvm的目录结构
bit code (二进制码)
- digoal@iZbp13nu0s9j3x3op4zpd4Z-> pwd
- /home/digoal/pgsql11_llvm/lib
- digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll llvmjit*
- -rwxr-xr-x 1 root root 709M Apr 4 06:58 llvmjit.so
- -rw-r--r-- 1 root root 7.8K Apr 4 06:58 llvmjit_types.bc
-
- digoal@iZbp13nu0s9j3x3op4zpd4Z-> pwd
- /home/digoal/pgsql11_llvm/lib/bitcode
- digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll
- total 1.8M
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 adminpack
- -rw-r--r-- 1 root root 1.4K Apr 4 06:58 adminpack.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 amcheck
- -rw-r--r-- 1 root root 2.3K Apr 4 06:58 amcheck.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 auth_delay
- -rw-r--r-- 1 root root 376 Apr 4 06:58 auth_delay.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 auto_explain
- -rw-r--r-- 1 root root 1.7K Apr 4 06:58 auto_explain.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 autoinc
- -rw-r--r-- 1 root root 528 Apr 4 06:58 autoinc.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 bloom
- -rw-r--r-- 1 root root 1.8K Apr 4 06:58 bloom.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 btree_gin
- -rw-r--r-- 1 root root 5.8K Apr 4 06:58 btree_gin.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 btree_gist
- -rw-r--r-- 1 root root 20K Apr 4 06:58 btree_gist.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 citext
- -rw-r--r-- 1 root root 1.3K Apr 4 06:58 citext.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 cube
- -rw-r--r-- 1 root root 6.4K Apr 4 06:58 cube.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 dblink
- -rw-r--r-- 1 root root 5.6K Apr 4 06:58 dblink.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 dict_int
- -rw-r--r-- 1 root root 480 Apr 4 06:58 dict_int.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 dict_xsyn
- -rw-r--r-- 1 root root 656 Apr 4 06:58 dict_xsyn.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 earthdistance
- -rw-r--r-- 1 root root 280 Apr 4 06:58 earthdistance.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 file_fdw
- -rw-r--r-- 1 root root 1.8K Apr 4 06:58 file_fdw.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 fuzzystrmatch
- -rw-r--r-- 1 root root 5.7K Apr 4 06:58 fuzzystrmatch.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 hstore
- -rw-r--r-- 1 root root 6.9K Apr 4 06:58 hstore.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 insert_username
- -rw-r--r-- 1 root root 544 Apr 4 06:58 insert_username.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 _int
- -rw-r--r-- 1 root root 6.9K Apr 4 06:58 _int.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 isn
- -rw-r--r-- 1 root root 64K Apr 4 06:58 isn.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 lo
- -rw-r--r-- 1 root root 432 Apr 4 06:58 lo.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 ltree
- -rw-r--r-- 1 root root 7.4K Apr 4 06:58 ltree.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 moddatetime
- -rw-r--r-- 1 root root 592 Apr 4 06:58 moddatetime.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pageinspect
- -rw-r--r-- 1 root root 6.3K Apr 4 06:58 pageinspect.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 passwordcheck
- -rw-r--r-- 1 root root 436 Apr 4 06:58 passwordcheck.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_buffercache
- -rw-r--r-- 1 root root 660 Apr 4 06:58 pg_buffercache.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pgcrypto
- -rw-r--r-- 1 root root 23K Apr 4 06:58 pgcrypto.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_freespacemap
- -rw-r--r-- 1 root root 372 Apr 4 06:58 pg_freespacemap.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_prewarm
- -rw-r--r-- 1 root root 2.6K Apr 4 06:58 pg_prewarm.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pgrowlocks
- -rw-r--r-- 1 root root 968 Apr 4 06:58 pgrowlocks.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_stat_statements
- -rw-r--r-- 1 root root 3.1K Apr 4 06:58 pg_stat_statements.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pgstattuple
- -rw-r--r-- 1 root root 3.3K Apr 4 06:58 pgstattuple.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_trgm
- -rw-r--r-- 1 root root 4.1K Apr 4 06:58 pg_trgm.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_visibility
- -rw-r--r-- 1 root root 1.1K Apr 4 06:58 pg_visibility.index.bc
- drwxr-xr-x 25 root root 4.0K Apr 4 06:58 postgres
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 postgres_fdw
- -rw-r--r-- 1 root root 12K Apr 4 06:58 postgres_fdw.index.bc
- -rw-r--r-- 1 root root 1.3M Apr 4 06:58 postgres.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 refint
- -rw-r--r-- 1 root root 1.9K Apr 4 06:58 refint.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 seg
- -rw-r--r-- 1 root root 5.2K Apr 4 06:58 seg.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tablefunc
- -rw-r--r-- 1 root root 2.1K Apr 4 06:58 tablefunc.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tcn
- -rw-r--r-- 1 root root 584 Apr 4 06:58 tcn.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 test_decoding
- -rw-r--r-- 1 root root 1.5K Apr 4 06:58 test_decoding.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 timetravel
- -rw-r--r-- 1 root root 1.2K Apr 4 06:58 timetravel.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tsm_system_rows
- -rw-r--r-- 1 root root 524 Apr 4 06:58 tsm_system_rows.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tsm_system_time
- -rw-r--r-- 1 root root 524 Apr 4 06:58 tsm_system_time.index.bc
- drwxr-xr-x 2 root root 4.0K Apr 4 06:58 unaccent
- -rw-r--r-- 1 root root 844 Apr 4 06:58 unaccent.index.bc
-
- digoal@iZbp13nu0s9j3x3op4zpd4Z-> cd unaccent
- digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll
- total 16K
- -rw-r--r-- 1 root root 15K Apr 4 06:58 unaccent.bc
7.1、使用llvm-bcanalyzer观察bc的内容
- cd /home/digoal/pgsql11_llvm/lib/bitcode/postgres/access/heap
-
- /usr/local/llvm6.0.0/bin/llvm-bcanalyzer ./heapam.bc
-
- Summary of ./heapam.bc:
- Total size: 1459456b/182432.00B/45608W
- Stream type: LLVM IR
- # Toplevel Blocks: 4
-
- Per-block Summary:
- Block ID #0 (BLOCKINFO_BLOCK):
- Num Instances: 1
- Total Size: 672b/84.00B/21W
- Percent of file: 0.0460%
- Num SubBlocks: 0
- Num Abbrevs: 16
- Num Records: 3
- Percent Abbrevs: 0.0000%
-
- Record Histogram:
- Count # Bits b/Rec % Abv Record Kind
- 3 60 20.0 SETBID
-
- Block ID #8 (MODULE_BLOCK):
- Num Instances: 1
- Total Size: 44789b/5598.62B/1399W
- Percent of file: 3.0689%
- Num SubBlocks: 79
- Num Abbrevs: 3
- Num Records: 317
- Percent Abbrevs: 1.5773%
-
- Record Histogram:
- Count # Bits b/Rec % Abv Record Kind
- 222 31488 141.8 FUNCTION
- 89 11128 125.0 3.37 GLOBALVAR
- 1 207 HASH
- 1 57 100.00 SOURCE_FILENAME
- 1 35 100.00 VSTOFFSET
- 1 465 DATALAYOUT
- 1 303 TRIPLE
- 1 21 VERSION
- ......
8、设置环境变量
- su - digoal
- vi env_pg11_llvm.sh
-
- export PS1="$USER@`/bin/hostname -s`-> "
- export PGPORT=4000
- export PGDATA=/data01/pg/pg_root$PGPORT
- export LANG=en_US.utf8
- export PGHOME=/home/digoal/pgsql11_llvm
- export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
- export DATE=`date +"%Y%m%d%H%M"`
- export PATH=$PGHOME/bin:$PATH:.
- export MANPATH=$PGHOME/share/man:$MANPATH
- export PGHOST=$PGDATA
- export PGUSER=postgres
- export PGDATABASE=postgres
- alias rm='rm -i'
- alias ll='ls -lh'
- unalias vi
-
- . ./env_pg11_llvm.sh
9、初始化数据库
initdb -D $PGDATA -U postgres -E UTF8 --locale=en_US.UTF8 -X /data02/pg/pg_wal_$PGPORT
10、配置postgresql.conf
- listen_addresses = '0.0.0.0'
- port = 4000
- max_connections = 3000
- unix_socket_directories = '/tmp,.'
- unix_socket_permissions = 0700
- tcp_keepalives_idle = 60
- tcp_keepalives_interval = 10
- tcp_keepalives_count = 10
- shared_buffers = 128GB
- huge_pages = on
- work_mem = 32MB
- maintenance_work_mem = 1GB
- dynamic_shared_memory_type = posix
- vacuum_cost_delay = 0
- bgwriter_delay = 10ms
- bgwriter_lru_maxpages = 500
- bgwriter_lru_multiplier = 5.0
- effective_io_concurrency = 0
- max_worker_processes = 128
- max_parallel_maintenance_workers = 2
- max_parallel_workers_per_gather = 32
- parallel_leader_participation = on
- max_parallel_workers = 128
- wal_level = minimal
- synchronous_commit = off
- wal_buffers = 128MB
- wal_writer_delay = 10ms
- wal_writer_flush_after = 8MB
- checkpoint_timeout = 35min
- max_wal_size = 256GB
- min_wal_size = 64GB
- max_wal_senders = 0
- enable_bitmapscan = on
- enable_hashagg = on
- enable_hashjoin = on
- enable_indexscan = on
- enable_indexonlyscan = on
- enable_material = on
- enable_mergejoin = on
- enable_nestloop = on
- enable_parallel_append = on
- enable_seqscan = on
- enable_sort = on
- enable_tidscan = on
- enable_partitionwise_join = on
- enable_partitionwise_aggregate = on
- enable_parallel_hash = on
- random_page_cost = 1.1
- effective_cache_size = 400GB
- log_destination = 'csvlog'
- logging_collector = on
- log_truncate_on_rotation = on
- log_checkpoints = on
- log_connections = on
- log_disconnections = on
- log_error_verbosity = verbose
- log_timezone = 'PRC'
- log_autovacuum_min_duration = 0
- autovacuum_vacuum_cost_delay = 0
- datestyle = 'iso, mdy'
- timezone = 'PRC'
- lc_messages = 'en_US.UTF8'
- lc_monetary = 'en_US.UTF8'
- lc_numeric = 'en_US.UTF8'
- lc_time = 'en_US.UTF8'
- default_text_search_config = 'pg_catalog.english'
-
- # JIT相关参数
- jit_above_cost = 100000
- jit_optimize_above_cost = 500000
- jit_inline_above_cost = 500000
- dynamic_library_path = '$libdir'
- jit = on
- jit_provider = 'llvmjit'
目前已支持表达式、TUPLE DEFORM的JIT。
Currently PostgreSQL's JIT implementation has support for accelerating expression evaluation and tuple deforming. Several other operations could be accelerated in the future.
Expression evaluation is used to evaluate WHERE clauses, target lists, aggregates and projections. It can be accelerated by generating code specific to each case.
Tuple deforming is the process of transforming an on-disk tuple (see Section 68.6) into its in-memory representation. It can be accelerated by creating a function specific to the table layout and the number of columns to be extracted.
1、需要处理的数据量庞大
2、每条记录需要大量的操作符参与计算
3、需要查询的字段靠后,需要先deform tuple中前面的字段
- 249 Currently expression evaluation and tuple deforming are JITed. Those
- 250 were chosen because they commonly are major CPU bottlenecks in
- 251 analytics queries, but are by no means the only potentially beneficial cases.
- 252
- 253 For JITing to be beneficial a piece of code first and foremost has to
- 254 be a CPU bottleneck. But also importantly, JITing can only be
- 255 beneficial if overhead can be removed by doing so. E.g. in the tuple
- 256 deforming case the knowledge about the number of columns and their
- 257 types can remove a significant number of branches, and in the
- 258 expression evaluation case a lot of indirect jumps/calls can be
- 259 removed. If neither of these is the case, JITing is a waste of
- 260 resources.
- 261
- 262 Future avenues for JITing are tuple sorting, COPY parsing/output
- 263 generation, and later compiling larger parts of queries.
1、jit (boolean)
Determines whether JIT may be used by PostgreSQL, if available (see Chapter 32). The default is on.
默认ON,表示开启JIT。
2、jit_above_cost (floating point)
Sets the planner's cutoff above which JIT compilation is used as part of query execution (see Chapter 32). Performing JIT costs time but can accelerate query execution. The default is 100000.
默认100000,当planner发现COST大于这个值时,优化器会启用JIT动态编译。
3、jit_optimize_above_cost (floating point)
Sets the planner's cutoff above which JIT compiled programs (see jit_above_cost) are optimized. Optimization initially takes time, but can improve execution speed. It is not meaningful to set this to a lower value than jit_above_cost. The default is 500000.
默认500000,当planner发现COST大于这个值时,优化器会启用JIT动态编译优化。
4、jit_inline_above_cost (floating point)
Sets the planner's cutoff above which JIT compiled programs (see jit_above_cost) attempt to inline functions and operators. Inlining initially takes time, but can improve execution speed. It is unlikely to be beneficial to set jit_inline_above_cost below jit_optimize_above_cost. The default is 500000.
默认500000,当planner发现COST大于这个值时,优化器会对用户自定义函数、操作符(目前仅支持C, internal类型的函数)启用JIT优化。
https://www.postgresql.org/docs/devel/static/jit-extensibility.html#JIT-PLUGGABLE
5、jit_provider (string)
Determines which JIT provider (see JIT Extensibility) is used. The built-in default is llvmjit.
If set to a non-existent library JIT will not be available, but no error will be raised. This allows JIT support to be installed separately from the main PostgreSQL package. This parameter can only be set at server start.
为了让JIT支持更多的编译器,PG设计时对编译器的支持也是模块化的,通过jit_provider可以指定使用哪个编译器,当然这个需要实现对应的provider接口才行。
https://www.postgresql.org/docs/devel/static/jit-extensibility.html#JIT-PLUGGABLE
目前PG默认选择的是LLVM编译器,原因是LLVM友好的许可协议与PG的开源许可协议无冲突。第二方面是LLVM后面有很强大的公司在支撑,比如苹果。
- 52 PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
- 53 because it is developed by several large corporations and therefore
- 54 unlikely to be discontinued, because it has a license compatible with
- 55 PostgreSQL, and because its IR can be generated from C using the Clang
- 56 compiler.
src/backend/jit/jit.c
- /* GUCs */
- bool jit_enabled = true;
- char *jit_provider = "llvmjit";
- bool jit_debugging_support = false;
- bool jit_dump_bitcode = false;
- bool jit_expressions = true;
- bool jit_profiling_support = false;
- bool jit_tuple_deforming = true;
- double jit_above_cost = 100000;
- double jit_inline_above_cost = 500000;
- double jit_optimize_above_cost = 500000;
src/include/jit/jit.h
- /* Flags determining what kind of JIT operations to perform */
- #define PGJIT_NONE 0
- #define PGJIT_PERFORM 1 << 0
- #define PGJIT_OPT3 1 << 1
- #define PGJIT_INLINE 1 << 2
- #define PGJIT_EXPR 1 << 3
- #define PGJIT_DEFORM 1 << 4
src/backend/optimizer/plan/planner.c
- result->jitFlags = PGJIT_NONE;
- if (jit_enabled && jit_above_cost >= 0 &&
- top_plan->total_cost > jit_above_cost)
- {
- result->jitFlags |= PGJIT_PERFORM;
-
- /*
- * Decide how much effort should be put into generating better code.
- */
- if (jit_optimize_above_cost >= 0 &&
- top_plan->total_cost > jit_optimize_above_cost)
- result->jitFlags |= PGJIT_OPT3;
- if (jit_inline_above_cost >= 0 &&
- top_plan->total_cost > jit_inline_above_cost)
- result->jitFlags |= PGJIT_INLINE;
-
- /*
- * Decide which operations should be JITed.
- */
- if (jit_expressions)
- result->jitFlags |= PGJIT_EXPR;
- if (jit_tuple_deforming)
- result->jitFlags |= PGJIT_DEFORM;
- }
1、jit_debugging_support (boolean)
If LLVM has the required functionality, register generated functions with GDB. This makes debugging easier. The default setting is off, and can only be set at server start.
如果要使用GDB调试JIT动态产生的code,那么需要打开jit_debugging_support
2、jit_dump_bitcode (boolean)
Writes the generated LLVM IR out to the filesystem, inside data_directory. This is only useful for working on the internals of the JIT implementation. The default setting is off, and it can only be changed by a superuser.
是否导出JIT产生的中间代码。会写入$PGDATA目录中。
3、jit_expressions (boolean)
Determines whether expressions are JIT compiled, subject to costing decisions (see Section 32.2). The default is on.
是否对表达式实施jit。默认打开,关闭的话不会对表达式实施JIT.
4、jit_profiling_support (boolean)
If LLVM has the required functionality, emit required data to allow perf to profile functions generated by JIT. This writes out files to $HOME/.debug/jit/; the user is responsible for performing cleanup when desired. The default setting is off, and can only be set at server start.
如果要使用perf来分析JIT代码的profiling,那么需要打开jit_profiling_support这个参数。结果保留在$HOME/.debug/jit/
,注意自己清理这个目录。
5、jit_tuple_deforming (boolean)
Determines whether tuple deforming is JIT compiled, subject to costing decisions (see Section 32.2). The default is on.
是否对tuple deforming式实施jit。默认打开,关闭的话不会对tuple deforming(即将磁盘中的TUPLE格式转换为内存中的TUPLE格式)实施JIT.
暂时没有
- 225 Currently it is not yet possible to cache generated functions, even
- 226 though that'd be desirable from a performance point of view. The
- 227 problem is that the generated functions commonly contain pointers into
- 228 per-execution memory. The expression evaluation machinery needs to
- 229 be redesigned a bit to avoid that. Basically all per-execution memory
- 230 needs to be referenced as an offset to one block of memory stored in
- 231 an ExprState, rather than absolute pointers into memory.
- 232
- 233 Once that is addressed, adding an LRU cache that's keyed by the
- 234 generated LLVM IR will allow to use optimized functions even for
- 235 faster queries.
- 236
- 237 A longer term project is to move expression compilation to the planner
- 238 stage, allowing e.g. to tie compiled expressions to prepared
- 239 statements.
- 240
- 241 An even more advanced approach would be to use JIT with few
- 242 optimizations initially, and build an optimized version in the
- 243 background. But that's even further off.
- 244
目前支持LLVM
- 52 PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
- 53 because it is developed by several large corporations and therefore
- 54 unlikely to be discontinued, because it has a license compatible with
- 55 PostgreSQL, and because its IR can be generated from C using the Clang
- 56 compiler.
-
- 199 It obviously is undesirable to maintain a second implementation of
- 200 commonly used functions, just for inlining purposes. Instead we take
- 201 advantage of the fact that the Clang compiler can emit LLVM IR.
通过实现jit provider可以扩展更多的编译器支持。
- 59 Shared Library Separation
- 60 -------------------------
- 61
- 62 To avoid the main PostgreSQL binary directly depending on LLVM, which
- 63 would prevent LLVM support being independently installed by OS package
- 64 managers, the LLVM dependent code is located in a shared library that
- 65 is loaded on-demand.
- 66
- 67 An additional benefit of doing so is that it is relatively easy to
- 68 evaluate JIT compilation that does not use LLVM, by changing out the
- 69 shared library used to provide JIT compilation.
- 70
- 71 To achieve this, code intending to perform JIT (e.g. expression evaluation)
- 72 calls an LLVM independent wrapper located in jit.c to do so. If the
- 73 shared library providing JIT support can be loaded (i.e. PostgreSQL was
- 74 compiled with LLVM support and the shared library is installed), the task
- 75 of JIT compiling an expression gets handed off to the shared library. This
- 76 obviously requires that the function in jit.c is allowed to fail in case
- 77 no JIT provider can be loaded.
- 78
- 79 Which shared library is loaded is determined by the jit_provider GUC,
- 80 defaulting to "llvmjit".
- 81
- 82 Cloistering code performing JIT into a shared library unfortunately
- 83 also means that code doing JIT compilation for various parts of code
- 84 has to be located separately from the code doing so without
- 85 JIT. E.g. the JIT version of execExprInterp.c is located in jit/llvm/
- 86 rather than executor/.
- 177 Instead there is one small file (llvmjit_types.c) which references each of
- 178 the types required for JITing. That file is translated to bitcode at
- 179 compile time, and loaded when LLVM is initialized in a backend.
- $pkglibdir/bitcode/postgres/
-
-
- 211 Similarly extensions can install code into
- 212 $pkglibdir/bitcode/[extension]/
- 213 accompanied by
- 214 $pkglibdir/bitcode/[extension].index.bc
1、建表(65个字段,33个INT8,32个TEXT)
- do language plpgsql $$
- declare
- sql text;
- begin
- sql := 'create table test(id int8,';
- -- 32 个text字段
- for i in 1..32 loop
- sql := sql||'c'||i||' text default md5(random()::text),';
- end loop;
- -- 32 个int8字段
- for i in 33..64 loop
- sql := sql||'c'||i||' int8 default random()*10,';
- end loop;
- sql := rtrim(sql, ',');
- sql := sql||')';
- execute sql;
- end;
- $$;
2、写入1亿测试数据
for((i=1;i<=100;i++)) do nohup psql -c "insert into test select generate_series(1,1000000);" >/dev/null 2>&1 & done
空间占用127 GB
- postgres=# \dt+
- List of relations
- Schema | Name | Type | Owner | Size | Description
- --------+------+-------+----------+--------+-------------
- public | test | table | postgres | 127 GB |
- (1 row)
并行度24强制开关
- set max_parallel_workers_per_gather =24;
- set parallel_setup_cost =0;
- set parallel_tuple_cost =0;
- set min_parallel_table_scan_size =0;
- set min_parallel_index_scan_size =0;
- alter table test set (parallel_workers =24);
3、多表达式计算测试
- explain select c33,c64,count(*),avg(c33),min(c34),max(c35),stddev(c36),sum(c37),stddev(greatest(c37,c38)),
- sum(
- case c38
- when 0 then c39
- when 1 then c40
- when 2 then c41
- when 3 then c42
- when 4 then c43
- when 5 then c44
- when 6 then c45
- when 7 then c46
- when 8 then c47
- when 9 then c48
- when 10 then c49
- when 11 then c50
- when 12 then c51
- when 13 then c52
- when 14 then c53
- when 15 then c54
- when 16 then c55
- else c56
- end
- ) from test
- where
- c64^2>0
- and sqrt(c64)>0
- and hashtext(c1)>hashtext(c2)
- and
- pow(sqrt(abs(hashint8(case c38
- when 0 then c39
- when 1 then c40
- when 2 then c41
- when 3 then c42
- when 4 then c43
- when 5 then c44
- when 6 then c45
- when 7 then c46
- when 8 then c47
- when 9 then c48
- when 10 then c49
- when 11 then c50
- when 12 then c51
- when 13 then c52
- when 14 then c53
- when 15 then c54
- when 16 then c55
- else c56
- end))),2) >0
- group by c33,c64;
-
- QUERY PLAN
- -------------------------------------------------------------------------------------------
- Finalize GroupAggregate (cost=4545694.04..4549612.47 rows=121 width=200)
- Group Key: c33, c64
- -> Gather Merge (cost=4545694.04..4549500.85 rows=2904 width=200)
- Workers Planned: 24
- -> Partial GroupAggregate (cost=4545693.46..4549425.58 rows=121 width=200)
- Group Key: c33, c64
- -> Sort (cost=4545693.46..4545822.06 rows=51440 width=200)
- Sort Key: c33, c64
- -> Parallel Seq Scan on test (cost=0.00..4541668.12 rows=51440 width=200)
- Filter: ((((c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((c64)::double precision) > '0'::double precision) AND (hashtext(c1) > hashtext(c2)) AND (pow(sqrt((abs(hashint8(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)))::double precision), '2'::double precision) > '0'::double precision))
- JIT:
- Functions: 18
- Inlining: true
- Optimization: true
- (14 rows)
3.1、开启JIT
set jit=on;
并行
- set max_parallel_workers_per_gather =24;
-
- Finalize GroupAggregate (cost=4545694.04..4549612.47 rows=121 width=200) (actual time=16228.136..18083.962 rows=110 loops=1)
- Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
- Group Key: test.c33, test.c64
- Buffers: shared hit=204165, temp read=14929 written=14950
- -> Gather Merge (cost=4545694.04..4549500.85 rows=2904 width=200) (actual time=16219.163..18075.339 rows=2750 loops=1)
- Output: c33, c64, (PARTIAL count(*)), (PARTIAL avg(c33)), (PARTIAL min(c34)), (PARTIAL max(c35)), (PARTIAL stddev(c36)), (PARTIAL sum(c37)), (PARTIAL stddev(GREATEST(c37, c38))), (PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END))
- Workers Planned: 24
- Workers Launched: 24
- Buffers: shared hit=204165, temp read=14929 written=14950
- -> Partial GroupAggregate (cost=4545693.46..4549425.58 rows=121 width=200) (actual time=16055.979..17551.930 rows=110 loops=25)
- Output: c33, c64, PARTIAL count(*), PARTIAL avg(c33), PARTIAL min(c34), PARTIAL max(c35), PARTIAL stddev(c36), PARTIAL sum(c37), PARTIAL stddev(GREATEST(c37, c38)), PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
- Group Key: test.c33, test.c64
- Buffers: shared hit=4167196, temp read=304952 written=305369
- Worker 0: actual time=16063.931..17380.570 rows=110 loops=1
- Buffers: shared hit=146725, temp read=10743 written=10758
- Worker 1: actual time=16157.211..17453.030 rows=110 loops=1
- Buffers: shared hit=144252, temp read=10570 written=10584
- Worker 2: actual time=15937.390..17766.085 rows=110 loops=1
- Buffers: shared hit=203275, temp read=14873 written=14894
- Worker 3: actual time=16087.670..17463.100 rows=110 loops=1
- Buffers: shared hit=153917, temp read=11273 written=11288
- Worker 4: actual time=16055.948..17369.731 rows=110 loops=1
- Buffers: shared hit=146569, temp read=10723 written=10738
- Worker 5: actual time=16142.715..17518.040 rows=110 loops=1
- Buffers: shared hit=148380, temp read=10854 written=10869
- Worker 6: actual time=16191.173..17582.148 rows=110 loops=1
- Buffers: shared hit=154115, temp read=11273 written=11288
- Worker 7: actual time=16125.974..17411.918 rows=110 loops=1
- Buffers: shared hit=144344, temp read=10570 written=10584
- Worker 8: actual time=16162.576..17452.647 rows=110 loops=1
- Buffers: shared hit=144356, temp read=10570 written=10584
- Worker 9: actual time=16028.487..17333.188 rows=110 loops=1
- Buffers: shared hit=146326, temp read=10703 written=10718
- Worker 10: actual time=15909.637..17710.282 rows=110 loops=1
- Buffers: shared hit=202502, temp read=14819 written=14840
- Worker 11: actual time=15991.109..17201.509 rows=110 loops=1
- Buffers: shared hit=136312, temp read=9973 written=9987
- Worker 12: actual time=15948.994..17758.986 rows=110 loops=1
- Buffers: shared hit=202082, temp read=14788 written=14808
- Worker 13: actual time=15923.985..17734.805 rows=110 loops=1
- Buffers: shared hit=202136, temp read=14788 written=14808
- Worker 14: actual time=16185.559..17477.490 rows=110 loops=1
- Buffers: shared hit=144407, temp read=10570 written=10584
- Worker 15: actual time=15957.127..17879.999 rows=110 loops=1
- Buffers: shared hit=214616, temp read=15702 written=15724
- Worker 16: actual time=16150.016..17429.737 rows=110 loops=1
- Buffers: shared hit=138941, temp read=10168 written=10182
- Worker 17: actual time=16036.894..17849.093 rows=110 loops=1
- Buffers: shared hit=202016, temp read=14788 written=14808
- Worker 18: actual time=16129.668..17369.734 rows=110 loops=1
- Buffers: shared hit=138300, temp read=10126 written=10140
- Worker 19: actual time=15948.382..17832.470 rows=110 loops=1
- Buffers: shared hit=211842, temp read=15491 written=15512
- Worker 20: actual time=16153.969..17352.508 rows=110 loops=1
- Buffers: shared hit=134834, temp read=9867 written=9880
- Worker 21: actual time=15916.013..17716.694 rows=110 loops=1
- Buffers: shared hit=202104, temp read=14788 written=14808
- Worker 22: actual time=16164.524..17462.568 rows=110 loops=1
- Buffers: shared hit=144423, temp read=10570 written=10584
- Worker 23: actual time=16046.586..17455.273 rows=110 loops=1
- Buffers: shared hit=156257, temp read=11433 written=11449
- -> Sort (cost=4545693.46..4545822.06 rows=51440 width=200) (actual time=16048.787..16592.827 rows=1900091 loops=25)
- Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
- Sort Key: test.c33, test.c64
- Sort Method: external merge Disk: 477728kB
- Worker 0: Sort Method: external merge Disk: 343776kB
- Worker 1: Sort Method: external merge Disk: 338240kB
- Worker 2: Sort Method: external merge Disk: 475936kB
- Worker 3: Sort Method: external merge Disk: 360736kB
- Worker 4: Sort Method: external merge Disk: 343136kB
- Worker 5: Sort Method: external merge Disk: 347328kB
- Worker 6: Sort Method: external merge Disk: 360736kB
- Worker 7: Sort Method: external merge Disk: 338240kB
- Worker 8: Sort Method: external merge Disk: 338240kB
- Worker 9: Sort Method: external merge Disk: 342496kB
- Worker 10: Sort Method: external merge Disk: 474208kB
- Worker 11: Sort Method: external merge Disk: 319136kB
- Worker 12: Sort Method: external merge Disk: 473216kB
- Worker 13: Sort Method: external merge Disk: 473216kB
- Worker 14: Sort Method: external merge Disk: 338240kB
- Worker 15: Sort Method: external merge Disk: 502464kB
- Worker 16: Sort Method: external merge Disk: 325376kB
- Worker 17: Sort Method: external merge Disk: 473216kB
- Worker 18: Sort Method: external merge Disk: 324032kB
- Worker 19: Sort Method: external merge Disk: 495712kB
- Worker 20: Sort Method: external merge Disk: 315744kB
- Worker 21: Sort Method: external merge Disk: 473216kB
- Worker 22: Sort Method: external merge Disk: 338240kB
- Worker 23: Sort Method: external merge Disk: 365856kB
- Buffers: shared hit=4167196, temp read=304952 written=305369
- Worker 0: actual time=16057.644..16539.626 rows=1673456 loops=1
- Buffers: shared hit=146725, temp read=10743 written=10758
- Worker 1: actual time=16150.861..16625.428 rows=1646431 loops=1
- Buffers: shared hit=144252, temp read=10570 written=10584
- Worker 2: actual time=15928.651..16592.993 rows=2316826 loops=1
- Buffers: shared hit=203275, temp read=14873 written=14894
- Worker 3: actual time=16081.087..16581.918 rows=1755939 loops=1
- Buffers: shared hit=153917, temp read=11273 written=11288
- Worker 4: actual time=16049.644..16528.522 rows=1670397 loops=1
- Buffers: shared hit=146569, temp read=10723 written=10738
- Worker 5: actual time=16135.668..16623.247 rows=1690829 loops=1
- Buffers: shared hit=148380, temp read=10854 written=10869
- Worker 6: actual time=16184.539..16693.836 rows=1755942 loops=1
- Buffers: shared hit=154115, temp read=11273 written=11288
- Worker 7: actual time=16119.729..16583.823 rows=1646433 loops=1
- Buffers: shared hit=144344, temp read=10570 written=10584
- Worker 8: actual time=16156.343..16626.634 rows=1646432 loops=1
- Buffers: shared hit=144356, temp read=10570 written=10584
- Worker 9: actual time=16022.254..16496.534 rows=1667285 loops=1
- Buffers: shared hit=146326, temp read=10703 written=10718
- Worker 10: actual time=15901.059..16553.594 rows=2308407 loops=1
- Buffers: shared hit=202502, temp read=14819 written=14840
- Worker 11: actual time=15985.290..16423.172 rows=1553527 loops=1
- Buffers: shared hit=136312, temp read=9973 written=9987
- Worker 12: actual time=15940.471..16603.025 rows=2303469 loops=1
- Buffers: shared hit=202082, temp read=14788 written=14808
- Worker 13: actual time=15915.230..16578.124 rows=2303471 loops=1
- Buffers: shared hit=202136, temp read=14788 written=14808
- Worker 14: actual time=16179.341..16650.442 rows=1646430 loops=1
- Buffers: shared hit=144407, temp read=10570 written=10584
- Worker 15: actual time=15948.014..16649.356 rows=2445950 loops=1
- Buffers: shared hit=214616, temp read=15702 written=15724
- Worker 16: actual time=16144.032..16604.106 rows=1583954 loops=1
- Buffers: shared hit=138941, temp read=10168 written=10182
- Worker 17: actual time=16028.217..16692.633 rows=2303463 loops=1
- Buffers: shared hit=202016, temp read=14788 written=14808
- Worker 18: actual time=16123.611..16575.587 rows=1577397 loops=1
- Buffers: shared hit=138300, temp read=10126 written=10140
- Worker 19: actual time=15939.379..16625.915 rows=2412964 loops=1
- Buffers: shared hit=211842, temp read=15491 written=15512
- Worker 20: actual time=16148.053..16579.552 rows=1536937 loops=1
- Buffers: shared hit=134834, temp read=9867 written=9880
- Worker 21: actual time=15907.527..16565.235 rows=2303464 loops=1
- Buffers: shared hit=202104, temp read=14788 written=14808
- Worker 22: actual time=16158.176..16633.146 rows=1646438 loops=1
- Buffers: shared hit=144423, temp read=10570 written=10584
- Worker 23: actual time=16039.773..16559.599 rows=1780966 loops=1
- Buffers: shared hit=156257, temp read=11433 written=11449
- -> Parallel Seq Scan on public.test (cost=0.00..4541668.12 rows=51440 width=200) (actual time=1032.131..5845.529 rows=1900091 loops=25)
- Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
- Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
- Rows Removed by Filter: 2099909
- Buffers: shared hit=4166668
- Worker 0: actual time=1020.341..5255.168 rows=1673456 loops=1
- Buffers: shared hit=146703
- Worker 1: actual time=1027.407..5221.878 rows=1646431 loops=1
- Buffers: shared hit=144230
- Worker 2: actual time=1153.318..6915.975 rows=2316826 loops=1
- Buffers: shared hit=203253
- Worker 3: actual time=1020.220..5472.580 rows=1755939 loops=1
- Buffers: shared hit=153895
- Worker 4: actual time=1032.901..5292.200 rows=1670397 loops=1
- Buffers: shared hit=146547
- Worker 5: actual time=1017.394..5301.776 rows=1690829 loops=1
- Buffers: shared hit=148358
- Worker 6: actual time=1030.128..5511.666 rows=1755942 loops=1
- Buffers: shared hit=154093
- Worker 7: actual time=1030.999..5254.166 rows=1646433 loops=1
- Buffers: shared hit=144322
- Worker 8: actual time=1029.738..5242.343 rows=1646432 loops=1
- Buffers: shared hit=144334
- Worker 9: actual time=1043.564..5337.423 rows=1667285 loops=1
- Buffers: shared hit=146304
- Worker 10: actual time=1003.084..6826.056 rows=2308407 loops=1
- Buffers: shared hit=202480
- Worker 11: actual time=1041.330..5138.905 rows=1553527 loops=1
- Buffers: shared hit=136290
- Worker 12: actual time=1006.781..6753.862 rows=2303469 loops=1
- Buffers: shared hit=202060
- Worker 13: actual time=1004.776..6749.535 rows=2303471 loops=1
- Buffers: shared hit=202114
- Worker 14: actual time=1030.832..5281.034 rows=1646430 loops=1
- Buffers: shared hit=144385
- Worker 15: actual time=1003.335..7089.916 rows=2445950 loops=1
- Buffers: shared hit=214594
- Worker 16: actual time=1015.769..5129.373 rows=1583954 loops=1
- Buffers: shared hit=138919
- Worker 17: actual time=1003.077..6892.321 rows=2303463 loops=1
- Buffers: shared hit=201994
- Worker 18: actual time=1016.105..5052.138 rows=1577397 loops=1
- Buffers: shared hit=138278
- Worker 19: actual time=1006.413..6976.141 rows=2412964 loops=1
- Buffers: shared hit=211820
- Worker 20: actual time=1013.958..5004.057 rows=1536937 loops=1
- Buffers: shared hit=134812
- Worker 21: actual time=1009.717..6748.666 rows=2303464 loops=1
- Buffers: shared hit=202082
- Worker 22: actual time=1013.944..5248.687 rows=1646438 loops=1
- Buffers: shared hit=144401
- Worker 23: actual time=1009.779..5471.091 rows=1780966 loops=1
- Buffers: shared hit=156235
- Planning Time: 0.233 ms
- JIT:
- Functions: 18
- Generation Time: 7.829 ms
- Inlining: true
- Inlining Time: 19.924 ms
- Optimization: true
- Optimization Time: 778.549 ms
- Emission Time: 404.543 ms
- Execution Time: 18161.029 ms
- (201 rows)
非并行
- set max_parallel_workers_per_gather =0;
-
- HashAggregate (cost=13253090.64..13253093.36 rows=121 width=200) (actual time=142124.876..142125.769 rows=110 loops=1)
- Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
- Group Key: test.c33, test.c64
- Buffers: shared hit=4166668
- -> Seq Scan on public.test (cost=0.00..13166670.88 rows=1234568 width=200) (actual time=1127.028..114279.915 rows=47502283 loops=1)
- Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
- Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
- Rows Removed by Filter: 52497717
- Buffers: shared hit=4166668
- Planning Time: 0.668 ms
- JIT:
- Functions: 9
- Generation Time: 6.997 ms
- Inlining: true
- Inlining Time: 16.558 ms
- Optimization: true
- Optimization Time: 739.139 ms
- Emission Time: 358.289 ms
- Execution Time: 142132.967 ms
- (19 rows)
3.2、关闭JIT
set jit=off;
并行
- set max_parallel_workers_per_gather =24;
-
- Finalize GroupAggregate (cost=4545694.04..4549612.47 rows=121 width=200) (actual time=12229.706..14319.958 rows=110 loops=1)
- Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
- Group Key: test.c33, test.c64
- Buffers: shared hit=193072, temp read=14121 written=14141
- -> Gather Merge (cost=4545694.04..4549500.85 rows=2904 width=200) (actual time=12218.740..14311.155 rows=2750 loops=1)
- Output: c33, c64, (PARTIAL count(*)), (PARTIAL avg(c33)), (PARTIAL min(c34)), (PARTIAL max(c35)), (PARTIAL stddev(c36)), (PARTIAL sum(c37)), (PARTIAL stddev(GREATEST(c37, c38))), (PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END))
- Workers Planned: 24
- Workers Launched: 24
- Buffers: shared hit=193072, temp read=14121 written=14141
- -> Partial GroupAggregate (cost=4545693.46..4549425.58 rows=121 width=200) (actual time=12085.108..13888.218 rows=110 loops=25)
- Output: c33, c64, PARTIAL count(*), PARTIAL avg(c33), PARTIAL min(c34), PARTIAL max(c35), PARTIAL stddev(c36), PARTIAL sum(c37), PARTIAL stddev(GREATEST(c37, c38)), PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
- Group Key: test.c33, test.c64
- Buffers: shared hit=4167196, temp read=304952 written=305367
- Worker 0: actual time=12000.792..13585.035 rows=110 loops=1
- Buffers: shared hit=147042, temp read=10761 written=10776
- Worker 1: actual time=12031.831..13687.539 rows=110 loops=1
- Buffers: shared hit=153988, temp read=11273 written=11288
- Worker 2: actual time=12146.820..13812.121 rows=110 loops=1
- Buffers: shared hit=154021, temp read=11273 written=11288
- Worker 3: actual time=12145.395..13804.585 rows=110 loops=1
- Buffers: shared hit=154017, temp read=11273 written=11288
- Worker 4: actual time=12091.839..14217.464 rows=110 loops=1
- Buffers: shared hit=197518, temp read=14457 written=14477
- Worker 5: actual time=12169.045..13787.539 rows=110 loops=1
- Buffers: shared hit=150711, temp read=11032 written=11047
- Worker 6: actual time=12141.908..13807.836 rows=110 loops=1
- Buffers: shared hit=154079, temp read=11273 written=11288
- Worker 7: actual time=12110.195..13777.285 rows=110 loops=1
- Buffers: shared hit=153838, temp read=11273 written=11288
- Worker 8: actual time=12137.576..13814.893 rows=110 loops=1
- Buffers: shared hit=154004, temp read=11273 written=11288
- Worker 9: actual time=12196.750..13862.298 rows=110 loops=1
- Buffers: shared hit=154129, temp read=11273 written=11288
- Worker 10: actual time=12092.448..14117.280 rows=110 loops=1
- Buffers: shared hit=188201, temp read=13770 written=13789
- Worker 11: actual time=12068.763..14188.286 rows=110 loops=1
- Buffers: shared hit=197254, temp read=14421 written=14441
- Worker 12: actual time=12190.847..13923.117 rows=110 loops=1
- Buffers: shared hit=153959, temp read=11273 written=11288
- Worker 13: actual time=12093.689..14117.677 rows=110 loops=1
- Buffers: shared hit=188177, temp read=13771 written=13790
- Worker 14: actual time=12141.815..13701.354 rows=110 loops=1
- Buffers: shared hit=144501, temp read=10570 written=10584
- Worker 15: actual time=12012.075..14027.824 rows=110 loops=1
- Buffers: shared hit=186377, temp read=13640 written=13659
- Worker 16: actual time=12092.582..14131.998 rows=110 loops=1
- Buffers: shared hit=188052, temp read=13759 written=13778
- Worker 17: actual time=11906.057..13579.607 rows=110 loops=1
- Buffers: shared hit=154068, temp read=11273 written=11288
- Worker 18: actual time=12179.217..13853.580 rows=110 loops=1
- Buffers: shared hit=154191, temp read=11273 written=11288
- Worker 19: actual time=11944.620..14048.474 rows=110 loops=1
- Buffers: shared hit=194987, temp read=14285 written=14305
- Worker 20: actual time=12152.533..13818.214 rows=110 loops=1
- Buffers: shared hit=153932, temp read=11273 written=11288
- Worker 21: actual time=12196.138..13864.195 rows=110 loops=1
- Buffers: shared hit=154158, temp read=11273 written=11288
- Worker 22: actual time=12091.831..14229.393 rows=110 loops=1
- Buffers: shared hit=197454, temp read=14452 written=14472
- Worker 23: actual time=11910.497..13476.474 rows=110 loops=1
- Buffers: shared hit=145466, temp read=10637 written=10652
- -> Sort (cost=4545693.46..4545822.06 rows=51440 width=200) (actual time=12076.303..12606.046 rows=1900091 loops=25)
- Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
- Sort Key: test.c33, test.c64
- Sort Method: external merge Disk: 451872kB
- Worker 0: Sort Method: external merge Disk: 344352kB
- Worker 1: Sort Method: external merge Disk: 360736kB
- Worker 2: Sort Method: external merge Disk: 360736kB
- Worker 3: Sort Method: external merge Disk: 360736kB
- Worker 4: Sort Method: external merge Disk: 462624kB
- Worker 5: Sort Method: external merge Disk: 353024kB
- Worker 6: Sort Method: external merge Disk: 360736kB
- Worker 7: Sort Method: external merge Disk: 360736kB
- Worker 8: Sort Method: external merge Disk: 360736kB
- Worker 9: Sort Method: external merge Disk: 360736kB
- Worker 10: Sort Method: external merge Disk: 440640kB
- Worker 11: Sort Method: external merge Disk: 461472kB
- Worker 12: Sort Method: external merge Disk: 360736kB
- Worker 13: Sort Method: external merge Disk: 440672kB
- Worker 14: Sort Method: external merge Disk: 338240kB
- Worker 15: Sort Method: external merge Disk: 436480kB
- Worker 16: Sort Method: external merge Disk: 440288kB
- Worker 17: Sort Method: external merge Disk: 360736kB
- Worker 18: Sort Method: external merge Disk: 360736kB
- Worker 19: Sort Method: external merge Disk: 457120kB
- Worker 20: Sort Method: external merge Disk: 360736kB
- Worker 21: Sort Method: external merge Disk: 360736kB
- Worker 22: Sort Method: external merge Disk: 462464kB
- Worker 23: Sort Method: external merge Disk: 340384kB
- Buffers: shared hit=4167196, temp read=304952 written=305367
- Worker 0: actual time=11993.075..12463.819 rows=1676370 loops=1
- Buffers: shared hit=147042, temp read=10761 written=10776
- Worker 1: actual time=12023.856..12510.021 rows=1755946 loops=1
- Buffers: shared hit=153988, temp read=11273 written=11288
- Worker 2: actual time=12138.628..12631.082 rows=1755945 loops=1
- Buffers: shared hit=154021, temp read=11273 written=11288
- Worker 3: actual time=12137.214..12626.779 rows=1755935 loops=1
- Buffers: shared hit=154017, temp read=11273 written=11288
- Worker 4: actual time=12081.480..12703.075 rows=2251992 loops=1
- Buffers: shared hit=197518, temp read=14457 written=14477
- Worker 5: actual time=12161.027..12638.282 rows=1718516 loops=1
- Buffers: shared hit=150711, temp read=11032 written=11047
- Worker 6: actual time=12133.786..12626.911 rows=1755948 loops=1
- Buffers: shared hit=154079, temp read=11273 written=11288
- Worker 7: actual time=12102.069..12591.735 rows=1755947 loops=1
- Buffers: shared hit=153838, temp read=11273 written=11288
- Worker 8: actual time=12129.413..12622.455 rows=1755936 loops=1
- Buffers: shared hit=154004, temp read=11273 written=11288
- Worker 9: actual time=12188.648..12679.617 rows=1755941 loops=1
- Buffers: shared hit=154129, temp read=11273 written=11288
- Worker 10: actual time=12082.492..12676.479 rows=2145118 loops=1
- Buffers: shared hit=188201, temp read=13770 written=13789
- Worker 11: actual time=12058.276..12683.336 rows=2246436 loops=1
- Buffers: shared hit=197254, temp read=14421 written=14441
- Worker 12: actual time=12182.379..12683.611 rows=1755944 loops=1
- Buffers: shared hit=153959, temp read=11273 written=11288
- Worker 13: actual time=12083.860..12673.566 rows=2145124 loops=1
- Buffers: shared hit=188177, temp read=13771 written=13790
- Worker 14: actual time=12134.030..12593.859 rows=1646436 loops=1
- Buffers: shared hit=144501, temp read=10570 written=10584
- Worker 15: actual time=12002.363..12594.854 rows=2124740 loops=1
- Buffers: shared hit=186377, temp read=13640 written=13659
- Worker 16: actual time=12082.700..12682.126 rows=2143293 loops=1
- Buffers: shared hit=188052, temp read=13759 written=13778
- Worker 17: actual time=11897.939..12391.175 rows=1755948 loops=1
- Buffers: shared hit=154068, temp read=11273 written=11288
- Worker 18: actual time=12171.070..12661.251 rows=1755944 loops=1
- Buffers: shared hit=154191, temp read=11273 written=11288
- Worker 19: actual time=11934.165..12549.351 rows=2225151 loops=1
- Buffers: shared hit=194987, temp read=14285 written=14305
- Worker 20: actual time=12144.452..12632.096 rows=1755935 loops=1
- Buffers: shared hit=153932, temp read=11273 written=11288
- Worker 21: actual time=12188.108..12679.023 rows=1755934 loops=1
- Buffers: shared hit=154158, temp read=11273 written=11288
- Worker 22: actual time=12081.380..12710.648 rows=2251317 loops=1
- Buffers: shared hit=197454, temp read=14452 written=14472
- Worker 23: actual time=11902.854..12362.911 rows=1656889 loops=1
- Buffers: shared hit=145466, temp read=10637 written=10652
- -> Parallel Seq Scan on public.test (cost=0.00..4541668.12 rows=51440 width=200) (actual time=0.026..4627.345 rows=1900091 loops=25)
- Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
- Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
- Rows Removed by Filter: 2099909
- Buffers: shared hit=4166668
- Worker 0: actual time=0.024..4041.154 rows=1676370 loops=1
- Buffers: shared hit=147020
- Worker 1: actual time=0.026..4232.052 rows=1755946 loops=1
- Buffers: shared hit=153966
- Worker 2: actual time=0.025..4275.863 rows=1755945 loops=1
- Buffers: shared hit=153999
- Worker 3: actual time=0.024..4268.778 rows=1755935 loops=1
- Buffers: shared hit=153995
- Worker 4: actual time=0.025..5517.396 rows=2251992 loops=1
- Buffers: shared hit=197496
- Worker 5: actual time=0.024..4173.081 rows=1718516 loops=1
- Buffers: shared hit=150689
- Worker 6: actual time=0.025..4315.820 rows=1755948 loops=1
- Buffers: shared hit=154057
- Worker 7: actual time=0.028..4271.339 rows=1755947 loops=1
- Buffers: shared hit=153816
- Worker 8: actual time=0.025..4270.376 rows=1755936 loops=1
- Buffers: shared hit=153982
- Worker 9: actual time=0.032..4301.026 rows=1755941 loops=1
- Buffers: shared hit=154107
- Worker 10: actual time=0.025..5193.043 rows=2145118 loops=1
- Buffers: shared hit=188179
- Worker 11: actual time=0.024..5393.193 rows=2246436 loops=1
- Buffers: shared hit=197232
- Worker 12: actual time=0.026..4288.686 rows=1755944 loops=1
- Buffers: shared hit=153937
- Worker 13: actual time=0.026..5177.502 rows=2145124 loops=1
- Buffers: shared hit=188155
- Worker 14: actual time=0.031..4098.000 rows=1646436 loops=1
- Buffers: shared hit=144479
- Worker 15: actual time=0.023..5155.199 rows=2124740 loops=1
- Buffers: shared hit=186355
- Worker 16: actual time=0.024..5205.816 rows=2143293 loops=1
- Buffers: shared hit=188030
- Worker 17: actual time=0.024..4334.653 rows=1755948 loops=1
- Buffers: shared hit=154046
- Worker 18: actual time=0.030..4290.928 rows=1755944 loops=1
- Buffers: shared hit=154169
- Worker 19: actual time=0.025..5447.905 rows=2225151 loops=1
- Buffers: shared hit=194965
- Worker 20: actual time=0.034..4296.168 rows=1755935 loops=1
- Buffers: shared hit=153910
- Worker 21: actual time=0.025..4273.822 rows=1755934 loops=1
- Buffers: shared hit=154136
- Worker 22: actual time=0.026..5437.385 rows=2251317 loops=1
- Buffers: shared hit=197432
- Worker 23: actual time=0.027..4039.679 rows=1656889 loops=1
- Buffers: shared hit=145444
- Planning Time: 0.230 ms
- Execution Time: 14383.790 ms
- (193 rows)
非并行
- set max_parallel_workers_per_gather =0;
-
- HashAggregate (cost=13253090.64..13253093.36 rows=121 width=200) (actual time=135995.544..135996.426 rows=110 loops=1)
- Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
- Group Key: test.c33, test.c64
- Buffers: shared hit=4166668
- -> Seq Scan on public.test (cost=0.00..13166670.88 rows=1234568 width=200) (actual time=0.026..101059.780 rows=47502283 loops=1)
- Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
- Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
- Rows Removed by Filter: 52497717
- Buffers: shared hit=4166668
- Planning Time: 0.202 ms
- Execution Time: 135996.540 ms
- (11 rows)
4、少量表达式性能对比
- explain select count(*) from test;
-
- QUERY PLAN
- -------------------------------------------------------------------------------------------
- Finalize Aggregate (cost=4218751.42..4218751.43 rows=1 width=8)
- -> Gather (cost=4218751.35..4218751.36 rows=24 width=8)
- Workers Planned: 24
- -> Partial Aggregate (cost=4218751.35..4218751.36 rows=1 width=8)
- -> Parallel Seq Scan on test (cost=0.00..4208334.68 rows=4166668 width=0)
- JIT:
- Functions: 5
- Inlining: true
- Optimization: true
- (9 rows)
4.1、开启JIT
set jit=on;
并行
- set max_parallel_workers_per_gather =24;
-
- Finalize Aggregate (cost=4218751.42..4218751.43 rows=1 width=8) (actual time=1195.630..1195.630 rows=1 loops=1)
- Output: count(*)
- Buffers: shared hit=171976
- -> Gather (cost=4218751.35..4218751.36 rows=24 width=8) (actual time=1195.599..1195.616 rows=25 loops=1)
- Output: (PARTIAL count(*))
- Workers Planned: 24
- Workers Launched: 24
- Buffers: shared hit=171976
- -> Partial Aggregate (cost=4218751.35..4218751.36 rows=1 width=8) (actual time=1172.856..1172.856 rows=1 loops=25)
- Output: PARTIAL count(*)
- Buffers: shared hit=4166668
- Worker 0: actual time=1171.634..1171.634 rows=1 loops=1
- Buffers: shared hit=166655
- Worker 1: actual time=1171.631..1171.631 rows=1 loops=1
- Buffers: shared hit=166878
- Worker 2: actual time=1171.533..1171.533 rows=1 loops=1
- Buffers: shared hit=166588
- Worker 3: actual time=1171.634..1171.634 rows=1 loops=1
- Buffers: shared hit=166659
- Worker 4: actual time=1172.296..1172.296 rows=1 loops=1
- Buffers: shared hit=165127
- Worker 5: actual time=1171.637..1171.637 rows=1 loops=1
- Buffers: shared hit=166191
- Worker 6: actual time=1172.351..1172.351 rows=1 loops=1
- Buffers: shared hit=166251
- Worker 7: actual time=1171.634..1171.634 rows=1 loops=1
- Buffers: shared hit=166945
- Worker 8: actual time=1172.317..1172.317 rows=1 loops=1
- Buffers: shared hit=166515
- Worker 9: actual time=1171.632..1171.632 rows=1 loops=1
- Buffers: shared hit=166900
- Worker 10: actual time=1172.379..1172.379 rows=1 loops=1
- Buffers: shared hit=167330
- Worker 11: actual time=1171.555..1171.555 rows=1 loops=1
- Buffers: shared hit=165973
- Worker 12: actual time=1172.369..1172.369 rows=1 loops=1
- Buffers: shared hit=167326
- Worker 13: actual time=1172.361..1172.361 rows=1 loops=1
- Buffers: shared hit=167240
- Worker 14: actual time=1171.603..1171.603 rows=1 loops=1
- Buffers: shared hit=165948
- Worker 15: actual time=1172.308..1172.308 rows=1 loops=1
- Buffers: shared hit=167271
- Worker 16: actual time=1171.627..1171.627 rows=1 loops=1
- Buffers: shared hit=166118
- Worker 17: actual time=1172.363..1172.363 rows=1 loops=1
- Buffers: shared hit=166540
- Worker 18: actual time=1171.627..1171.628 rows=1 loops=1
- Buffers: shared hit=161783
- Worker 19: actual time=1172.367..1172.367 rows=1 loops=1
- Buffers: shared hit=167241
- Worker 20: actual time=1171.622..1171.622 rows=1 loops=1
- Buffers: shared hit=166895
- Worker 21: actual time=1172.310..1172.311 rows=1 loops=1
- Buffers: shared hit=166291
- Worker 22: actual time=1171.634..1171.634 rows=1 loops=1
- Buffers: shared hit=166817
- Worker 23: actual time=1172.357..1172.357 rows=1 loops=1
- Buffers: shared hit=167210
- -> Parallel Seq Scan on public.test (cost=0.00..4208334.68 rows=4166668 width=0) (actual time=0.005..756.021 rows=4000000 loops=25)
- Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
- Buffers: shared hit=4166668
- Worker 0: actual time=0.005..752.685 rows=3999720 loops=1
- Buffers: shared hit=166655
- Worker 1: actual time=0.005..753.934 rows=4005072 loops=1
- Buffers: shared hit=166878
- Worker 2: actual time=0.006..752.288 rows=3998112 loops=1
- Buffers: shared hit=166588
- Worker 3: actual time=0.005..749.831 rows=3999816 loops=1
- Buffers: shared hit=166659
- Worker 4: actual time=0.005..754.798 rows=3963048 loops=1
- Buffers: shared hit=165127
- Worker 5: actual time=0.006..753.153 rows=3988584 loops=1
- Buffers: shared hit=166191
- Worker 6: actual time=0.005..755.385 rows=3990024 loops=1
- Buffers: shared hit=166251
- Worker 7: actual time=0.005..754.636 rows=4006667 loops=1
- Buffers: shared hit=166945
- Worker 8: actual time=0.005..757.278 rows=3996360 loops=1
- Buffers: shared hit=166515
- Worker 9: actual time=0.005..752.936 rows=4005600 loops=1
- Buffers: shared hit=166900
- Worker 10: actual time=0.006..755.266 rows=4015920 loops=1
- Buffers: shared hit=167330
- Worker 11: actual time=0.005..755.747 rows=3983352 loops=1
- Buffers: shared hit=165973
- Worker 12: actual time=0.005..755.743 rows=4015824 loops=1
- Buffers: shared hit=167326
- Worker 13: actual time=0.005..756.052 rows=4013742 loops=1
- Buffers: shared hit=167240
- Worker 14: actual time=0.006..755.470 rows=3982752 loops=1
- Buffers: shared hit=165948
- Worker 15: actual time=0.005..755.769 rows=4014504 loops=1
- Buffers: shared hit=167271
- Worker 16: actual time=0.006..755.859 rows=3986832 loops=1
- Buffers: shared hit=166118
- Worker 17: actual time=0.005..757.579 rows=3996959 loops=1
- Buffers: shared hit=166540
- Worker 18: actual time=0.005..756.139 rows=3882792 loops=1
- Buffers: shared hit=161783
- Worker 19: actual time=0.005..755.706 rows=4013784 loops=1
- Buffers: shared hit=167241
- Worker 20: actual time=0.005..753.655 rows=4005480 loops=1
- Buffers: shared hit=166895
- Worker 21: actual time=0.005..757.511 rows=3990984 loops=1
- Buffers: shared hit=166291
- Worker 22: actual time=0.005..753.577 rows=4003608 loops=1
- Buffers: shared hit=166817
- Worker 23: actual time=0.005..755.443 rows=4013040 loops=1
- Buffers: shared hit=167210
- Planning Time: 0.192 ms
- JIT:
- Functions: 5
- Generation Time: 1.082 ms
- Inlining: true
- Inlining Time: 2.907 ms
- Optimization: true
- Optimization Time: 23.873 ms
- Emission Time: 12.577 ms
- Execution Time: 1211.337 ms
- (120 rows)
非并行
- set max_parallel_workers_per_gather =0;
-
- Aggregate (cost=5416668.40..5416668.41 rows=1 width=8) (actual time=26222.597..26222.598 rows=1 loops=1)
- Output: count(*)
- Buffers: shared hit=4166668
- -> Seq Scan on public.test (cost=0.00..5166668.32 rows=100000032 width=0) (actual time=0.016..17338.548 rows=100000000 loops=1)
- Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
- Buffers: shared hit=4166668
- Planning Time: 0.056 ms
- JIT:
- Functions: 2
- Generation Time: 0.457 ms
- Inlining: true
- Inlining Time: 1.355 ms
- Optimization: true
- Optimization Time: 8.269 ms
- Emission Time: 5.354 ms
- Execution Time: 26223.178 ms
- (16 rows)
4.2、关闭JIT
set jit=off;
并行
- set max_parallel_workers_per_gather =24;
-
- Finalize Aggregate (cost=4218751.42..4218751.43 rows=1 width=8) (actual time=1166.951..1166.951 rows=1 loops=1)
- Output: count(*)
- Buffers: shared hit=170684
- -> Gather (cost=4218751.35..4218751.36 rows=24 width=8) (actual time=1166.923..1166.939 rows=25 loops=1)
- Output: (PARTIAL count(*))
- Workers Planned: 24
- Workers Launched: 24
- Buffers: shared hit=170684
- -> Partial Aggregate (cost=4218751.35..4218751.36 rows=1 width=8) (actual time=1146.387..1146.388 rows=1 loops=25)
- Output: PARTIAL count(*)
- Buffers: shared hit=4166668
- Worker 0: actual time=1145.584..1145.584 rows=1 loops=1
- Buffers: shared hit=167586
- Worker 1: actual time=1145.605..1145.605 rows=1 loops=1
- Buffers: shared hit=167235
- Worker 2: actual time=1145.517..1145.517 rows=1 loops=1
- Buffers: shared hit=164055
- Worker 3: actual time=1145.537..1145.537 rows=1 loops=1
- Buffers: shared hit=166048
- Worker 4: actual time=1145.625..1145.626 rows=1 loops=1
- Buffers: shared hit=165348
- Worker 5: actual time=1145.621..1145.621 rows=1 loops=1
- Buffers: shared hit=167426
- Worker 6: actual time=1145.746..1145.746 rows=1 loops=1
- Buffers: shared hit=166945
- Worker 7: actual time=1145.707..1145.707 rows=1 loops=1
- Buffers: shared hit=166628
- Worker 8: actual time=1143.795..1143.796 rows=1 loops=1
- Buffers: shared hit=163445
- Worker 9: actual time=1145.555..1145.556 rows=1 loops=1
- Buffers: shared hit=166960
- Worker 10: actual time=1145.605..1145.605 rows=1 loops=1
- Buffers: shared hit=166142
- Worker 11: actual time=1145.633..1145.633 rows=1 loops=1
- Buffers: shared hit=166090
- Worker 12: actual time=1145.553..1145.553 rows=1 loops=1
- Buffers: shared hit=168115
- Worker 13: actual time=1145.819..1145.819 rows=1 loops=1
- Buffers: shared hit=167973
- Worker 14: actual time=1145.615..1145.615 rows=1 loops=1
- Buffers: shared hit=166419
- Worker 15: actual time=1145.558..1145.558 rows=1 loops=1
- Buffers: shared hit=168061
- Worker 16: actual time=1145.527..1145.527 rows=1 loops=1
- Buffers: shared hit=166438
- Worker 17: actual time=1146.062..1146.062 rows=1 loops=1
- Buffers: shared hit=166993
- Worker 18: actual time=1145.629..1145.629 rows=1 loops=1
- Buffers: shared hit=162480
- Worker 19: actual time=1146.053..1146.054 rows=1 loops=1
- Buffers: shared hit=167989
- Worker 20: actual time=1144.864..1144.864 rows=1 loops=1
- Buffers: shared hit=164403
- Worker 21: actual time=1145.754..1145.754 rows=1 loops=1
- Buffers: shared hit=167957
- Worker 22: actual time=1145.625..1145.625 rows=1 loops=1
- Buffers: shared hit=167457
- Worker 23: actual time=1146.064..1146.064 rows=1 loops=1
- Buffers: shared hit=167791
- -> Parallel Seq Scan on public.test (cost=0.00..4208334.68 rows=4166668 width=0) (actual time=0.005..756.396 rows=4000000 loops=25)
- Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
- Buffers: shared hit=4166668
- Worker 0: actual time=0.005..756.202 rows=4022064 loops=1
- Buffers: shared hit=167586
- Worker 1: actual time=0.005..755.320 rows=4013627 loops=1
- Buffers: shared hit=167235
- Worker 2: actual time=0.005..744.106 rows=3937320 loops=1
- Buffers: shared hit=164055
- Worker 3: actual time=0.005..751.125 rows=3985152 loops=1
- Buffers: shared hit=166048
- Worker 4: actual time=0.005..753.618 rows=3968352 loops=1
- Buffers: shared hit=165348
- Worker 5: actual time=0.005..756.612 rows=4018224 loops=1
- Buffers: shared hit=167426
- Worker 6: actual time=0.004..758.326 rows=4006680 loops=1
- Buffers: shared hit=166945
- Worker 7: actual time=0.005..759.780 rows=3999071 loops=1
- Buffers: shared hit=166628
- Worker 8: actual time=0.006..756.676 rows=3922680 loops=1
- Buffers: shared hit=163445
- Worker 9: actual time=0.005..751.844 rows=4007040 loops=1
- Buffers: shared hit=166960
- Worker 10: actual time=0.005..749.207 rows=3987408 loops=1
- Buffers: shared hit=166142
- Worker 11: actual time=0.006..757.084 rows=3986160 loops=1
- Buffers: shared hit=166090
- Worker 12: actual time=0.004..756.443 rows=4034760 loops=1
- Buffers: shared hit=168115
- Worker 13: actual time=0.005..756.966 rows=4031352 loops=1
- Buffers: shared hit=167973
- Worker 14: actual time=0.005..759.423 rows=3994056 loops=1
- Buffers: shared hit=166419
- Worker 15: actual time=0.004..756.996 rows=4033464 loops=1
- Buffers: shared hit=168061
- Worker 16: actual time=0.006..759.134 rows=3994512 loops=1
- Buffers: shared hit=166438
- Worker 17: actual time=0.006..758.454 rows=4007814 loops=1
- Buffers: shared hit=166993
- Worker 18: actual time=0.005..758.647 rows=3899520 loops=1
- Buffers: shared hit=162480
- Worker 19: actual time=0.005..756.724 rows=4031736 loops=1
- Buffers: shared hit=167989
- Worker 20: actual time=0.006..755.985 rows=3945672 loops=1
- Buffers: shared hit=164403
- Worker 21: actual time=0.004..755.998 rows=4030968 loops=1
- Buffers: shared hit=167957
- Worker 22: actual time=0.006..757.390 rows=4018968 loops=1
- Buffers: shared hit=167457
- Worker 23: actual time=0.006..756.996 rows=4026984 loops=1
- Buffers: shared hit=167791
- Planning Time: 0.070 ms
- Execution Time: 1172.569 ms
- (112 rows)
非并行
- set max_parallel_workers_per_gather =0;
-
- Aggregate (cost=5416668.40..5416668.41 rows=1 width=8) (actual time=26976.475..26976.476 rows=1 loops=1)
- Output: count(*)
- Buffers: shared hit=4166668
- -> Seq Scan on public.test (cost=0.00..5166668.32 rows=100000032 width=0) (actual time=0.011..17349.593 rows=100000000 loops=1)
- Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
- Buffers: shared hit=4166668
- Planning Time: 0.053 ms
- Execution Time: 26976.515 ms
- (8 rows)
目前观察到bitcode的代码是每个worker都会产生一份,这个在实现bitcode cache后应该也是可以共用一份的。
- 开启bc代码的DUMP
-
- set jit_dump_bitcode =on;
-
- 注意清理,bc文件不会自动清理。
- cd $PGDATA
-
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34026.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34021.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34019.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34024.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34023.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34017.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34015.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34013.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34020.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34016.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34033.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34031.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34018.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34014.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34036.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34035.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34030.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34028.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34032.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34029.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34022.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34025.0.optimized.bc
- -rw------- 1 digoal digoal 126K Apr 4 12:59 34027.0.optimized.bc
使用llvm-bcanalyzer可以查看bitcode的内容
https://www.llvm.org/docs/CommandGuide/llvm-bcanalyzer.html
- /usr/local/llvm6.0.0/bin/llvm-bcanalyzer ./40818.7.bc
-
- Summary of ./40818.7.bc:
- Total size: 1069440b/133680.00B/33420W
- Stream type: LLVM IR
- # Toplevel Blocks: 4
-
- Per-block Summary:
- Block ID #0 (BLOCKINFO_BLOCK):
- Num Instances: 1
- Total Size: 672b/84.00B/21W
- Percent of file: 0.0628%
- Num SubBlocks: 0
- Num Abbrevs: 16
- Num Records: 3
- Percent Abbrevs: 0.0000%
-
- Record Histogram:
- Count # Bits b/Rec % Abv Record Kind
- 3 60 20.0 SETBID
-
- Block ID #8 (MODULE_BLOCK):
- Num Instances: 1
- Total Size: 18530b/2316.25B/579W
- Percent of file: 1.7327%
- Num SubBlocks: 67
- Num Abbrevs: 3
- Num Records: 129
- Percent Abbrevs: 2.3256%
-
- Record Histogram:
- Count # Bits b/Rec % Abv Record Kind
- 86 11904 138.4 FUNCTION
- 38 4813 126.7 2.63 GLOBALVAR
- 1 21 100.00 SOURCE_FILENAME
- 1 35 100.00 VSTOFFSET
- 1 465 DATALAYOUT
- 1 303 TRIPLE
- 1 21 VERSION
- ........
http://blog.jobbole.com/113684/
https://www.postgresql.org/docs/devel/static/jit-extensibility.html
https://www.postgresql.org/docs/devel/static/runtime-config-client.html#GUC-JIT-PROVIDER
https://www.postgresql.org/docs/devel/static/runtime-config-query.html#GUC-JIT-ABOVE-COST
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。