
PostgreSQL 11 preview - deploying and trying out JIT support (with_llvm)

llvmjit.so

Tags

PostgreSQL, operators, computation, deform, LLVM, JIT


Background

PostgreSQL 11's JIT currently supports just-in-time compilation of tuple deforming (converting the on-disk tuple format into the in-memory tuple format) and of expression evaluation (expressions in SELECT and WHERE clauses and elsewhere, operator evaluation, UDFs, and so on).

It speeds up computation over very large data sets, provided expression evaluation or tuple deforming has actually become the bottleneck.

It is particularly well suited to OLAP workloads with heavy computation over large volumes of data.

《Analytics acceleration black magic - LLVM, column store, multi-core parallelism, and operator reuse joined together - opening PostgreSQL's treasure chest》

Deploying PostgreSQL 11 with LLVM

Reference

《PostgreSQL 10 + PostGIS + Sharding (pg_pathman) + MySQL (fdw foreign tables) on ECS deployment guide (for new users)》

Using CentOS 7.x x64 as the example.

1、install epel

  1. wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
  2. rpm -ivh epel-release-latest-7.noarch.rpm

2、

  1. yum install -y cmake3
  2. yum install -y python

3、Download the LLVM source

wget http://releases.llvm.org/6.0.0/llvm-6.0.0.src.tar.xz
tar -xvf llvm-6.0.0.src.tar.xz

4、Download the Clang source and move it into the LLVM tree

wget http://releases.llvm.org/6.0.0/cfe-6.0.0.src.tar.xz
tar -xvf cfe-6.0.0.src.tar.xz
mv cfe-6.0.0.src clang
mv clang llvm-6.0.0.src/tools

5、Build and install llvm and clang, the compilers that provide JIT support

cd llvm-6.0.0.src
mkdir mybuilddir
cd mybuilddir
alias cmake=cmake3
export CXXFLAGS=-O2
cmake ..
cmake --build .
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/llvm6.0.0 -P cmake_install.cmake

6、Build PostgreSQL with LLVM

wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar -jxvf postgresql-snapshot.tar.bz2
cd postgresql-11devel
CLANG=/usr/local/llvm6.0.0/bin/clang LLVM_CONFIG=/usr/local/llvm6.0.0/bin/llvm-config ./configure --prefix=/home/digoal/pgsql11_llvm --with-llvm --with-blocksize=32 --with-segsize=2
make world -j 128
make install-world

7、Directory layout of PG built with LLVM

bitcode (LLVM IR in binary form)

digoal@iZbp13nu0s9j3x3op4zpd4Z-> pwd
/home/digoal/pgsql11_llvm/lib
digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll llvmjit*
-rwxr-xr-x 1 root root 709M Apr 4 06:58 llvmjit.so
-rw-r--r-- 1 root root 7.8K Apr 4 06:58 llvmjit_types.bc
digoal@iZbp13nu0s9j3x3op4zpd4Z-> pwd
/home/digoal/pgsql11_llvm/lib/bitcode
digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll
total 1.8M
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 adminpack
-rw-r--r-- 1 root root 1.4K Apr 4 06:58 adminpack.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 amcheck
-rw-r--r-- 1 root root 2.3K Apr 4 06:58 amcheck.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 auth_delay
-rw-r--r-- 1 root root 376 Apr 4 06:58 auth_delay.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 auto_explain
-rw-r--r-- 1 root root 1.7K Apr 4 06:58 auto_explain.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 autoinc
-rw-r--r-- 1 root root 528 Apr 4 06:58 autoinc.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 bloom
-rw-r--r-- 1 root root 1.8K Apr 4 06:58 bloom.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 btree_gin
-rw-r--r-- 1 root root 5.8K Apr 4 06:58 btree_gin.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 btree_gist
-rw-r--r-- 1 root root 20K Apr 4 06:58 btree_gist.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 citext
-rw-r--r-- 1 root root 1.3K Apr 4 06:58 citext.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 cube
-rw-r--r-- 1 root root 6.4K Apr 4 06:58 cube.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 dblink
-rw-r--r-- 1 root root 5.6K Apr 4 06:58 dblink.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 dict_int
-rw-r--r-- 1 root root 480 Apr 4 06:58 dict_int.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 dict_xsyn
-rw-r--r-- 1 root root 656 Apr 4 06:58 dict_xsyn.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 earthdistance
-rw-r--r-- 1 root root 280 Apr 4 06:58 earthdistance.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 file_fdw
-rw-r--r-- 1 root root 1.8K Apr 4 06:58 file_fdw.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 fuzzystrmatch
-rw-r--r-- 1 root root 5.7K Apr 4 06:58 fuzzystrmatch.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 hstore
-rw-r--r-- 1 root root 6.9K Apr 4 06:58 hstore.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 insert_username
-rw-r--r-- 1 root root 544 Apr 4 06:58 insert_username.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 _int
-rw-r--r-- 1 root root 6.9K Apr 4 06:58 _int.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 isn
-rw-r--r-- 1 root root 64K Apr 4 06:58 isn.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 lo
-rw-r--r-- 1 root root 432 Apr 4 06:58 lo.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 ltree
-rw-r--r-- 1 root root 7.4K Apr 4 06:58 ltree.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 moddatetime
-rw-r--r-- 1 root root 592 Apr 4 06:58 moddatetime.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pageinspect
-rw-r--r-- 1 root root 6.3K Apr 4 06:58 pageinspect.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 passwordcheck
-rw-r--r-- 1 root root 436 Apr 4 06:58 passwordcheck.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_buffercache
-rw-r--r-- 1 root root 660 Apr 4 06:58 pg_buffercache.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pgcrypto
-rw-r--r-- 1 root root 23K Apr 4 06:58 pgcrypto.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_freespacemap
-rw-r--r-- 1 root root 372 Apr 4 06:58 pg_freespacemap.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_prewarm
-rw-r--r-- 1 root root 2.6K Apr 4 06:58 pg_prewarm.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pgrowlocks
-rw-r--r-- 1 root root 968 Apr 4 06:58 pgrowlocks.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_stat_statements
-rw-r--r-- 1 root root 3.1K Apr 4 06:58 pg_stat_statements.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pgstattuple
-rw-r--r-- 1 root root 3.3K Apr 4 06:58 pgstattuple.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_trgm
-rw-r--r-- 1 root root 4.1K Apr 4 06:58 pg_trgm.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_visibility
-rw-r--r-- 1 root root 1.1K Apr 4 06:58 pg_visibility.index.bc
drwxr-xr-x 25 root root 4.0K Apr 4 06:58 postgres
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 postgres_fdw
-rw-r--r-- 1 root root 12K Apr 4 06:58 postgres_fdw.index.bc
-rw-r--r-- 1 root root 1.3M Apr 4 06:58 postgres.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 refint
-rw-r--r-- 1 root root 1.9K Apr 4 06:58 refint.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 seg
-rw-r--r-- 1 root root 5.2K Apr 4 06:58 seg.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tablefunc
-rw-r--r-- 1 root root 2.1K Apr 4 06:58 tablefunc.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tcn
-rw-r--r-- 1 root root 584 Apr 4 06:58 tcn.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 test_decoding
-rw-r--r-- 1 root root 1.5K Apr 4 06:58 test_decoding.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 timetravel
-rw-r--r-- 1 root root 1.2K Apr 4 06:58 timetravel.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tsm_system_rows
-rw-r--r-- 1 root root 524 Apr 4 06:58 tsm_system_rows.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tsm_system_time
-rw-r--r-- 1 root root 524 Apr 4 06:58 tsm_system_time.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 unaccent
-rw-r--r-- 1 root root 844 Apr 4 06:58 unaccent.index.bc
digoal@iZbp13nu0s9j3x3op4zpd4Z-> cd unaccent
digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll
total 16K
-rw-r--r-- 1 root root 15K Apr 4 06:58 unaccent.bc

7.1、Inspecting bitcode contents with llvm-bcanalyzer

cd /home/digoal/pgsql11_llvm/lib/bitcode/postgres/access/heap
/usr/local/llvm6.0.0/bin/llvm-bcanalyzer ./heapam.bc

Summary of ./heapam.bc:
Total size: 1459456b/182432.00B/45608W
Stream type: LLVM IR
# Toplevel Blocks: 4

Per-block Summary:
Block ID #0 (BLOCKINFO_BLOCK):
Num Instances: 1
Total Size: 672b/84.00B/21W
Percent of file: 0.0460%
Num SubBlocks: 0
Num Abbrevs: 16
Num Records: 3
Percent Abbrevs: 0.0000%

Record Histogram:
Count # Bits b/Rec % Abv Record Kind
3 60 20.0 SETBID

Block ID #8 (MODULE_BLOCK):
Num Instances: 1
Total Size: 44789b/5598.62B/1399W
Percent of file: 3.0689%
Num SubBlocks: 79
Num Abbrevs: 3
Num Records: 317
Percent Abbrevs: 1.5773%

Record Histogram:
Count # Bits b/Rec % Abv Record Kind
222 31488 141.8 FUNCTION
89 11128 125.0 3.37 GLOBALVAR
1 207 HASH
1 57 100.00 SOURCE_FILENAME
1 35 100.00 VSTOFFSET
1 465 DATALAYOUT
1 303 TRIPLE
1 21 VERSION
......

8、Set environment variables

su - digoal
vi env_pg11_llvm.sh

export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=4000
export PGDATA=/data01/pg/pg_root$PGPORT
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql11_llvm
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi

. ./env_pg11_llvm.sh

9、Initialize the database cluster

initdb -D $PGDATA -U postgres -E UTF8 --locale=en_US.UTF8 -X /data02/pg/pg_wal_$PGPORT  

10、Configure postgresql.conf

listen_addresses = '0.0.0.0'
port = 4000
max_connections = 3000
unix_socket_directories = '/tmp,.'
unix_socket_permissions = 0700
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 128GB
huge_pages = on
work_mem = 32MB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 500
bgwriter_lru_multiplier = 5.0
effective_io_concurrency = 0
max_worker_processes = 128
max_parallel_maintenance_workers = 2
max_parallel_workers_per_gather = 32
parallel_leader_participation = on
max_parallel_workers = 128
wal_level = minimal
synchronous_commit = off
wal_buffers = 128MB
wal_writer_delay = 10ms
wal_writer_flush_after = 8MB
checkpoint_timeout = 35min
max_wal_size = 256GB
min_wal_size = 64GB
max_wal_senders = 0
enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_indexonlyscan = on
enable_material = on
enable_mergejoin = on
enable_nestloop = on
enable_parallel_append = on
enable_seqscan = on
enable_sort = on
enable_tidscan = on
enable_partitionwise_join = on
enable_partitionwise_aggregate = on
enable_parallel_hash = on
random_page_cost = 1.1
effective_cache_size = 400GB
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
log_autovacuum_min_duration = 0
autovacuum_vacuum_cost_delay = 0
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'en_US.UTF8'
lc_monetary = 'en_US.UTF8'
lc_numeric = 'en_US.UTF8'
lc_time = 'en_US.UTF8'
default_text_search_config = 'pg_catalog.english'
# JIT-related parameters
jit_above_cost = 100000
jit_optimize_above_cost = 500000
jit_inline_above_cost = 500000
dynamic_library_path = '$libdir'
jit = on
jit_provider = 'llvmjit'

How far does JIT support go?

Expression evaluation and tuple deforming are currently JIT-compiled.

Currently PostgreSQL's JIT implementation has support for accelerating expression evaluation and tuple deforming. Several other operations could be accelerated in the future.

Expression evaluation is used to evaluate WHERE clauses, target lists, aggregates and projections. It can be accelerated by generating code specific to each case.

Tuple deforming is the process of transforming an on-disk tuple (see Section 68.6) into its in-memory representation. It can be accelerated by creating a function specific to the table layout and the number of columns to be extracted.
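To build intuition for why a deformer specialized to one table layout is faster, here is a minimal, hypothetical Python sketch (not PostgreSQL code, and the layout is invented for illustration): the generic deformer branches on every column's type for every tuple, while the "compiled" deformer for a known schema has those branches removed.

```python
import struct

def deform_generic(buf, schema):
    """Interpreter-style deforming: branch on each column type per tuple."""
    out, off = [], 0
    for typ in schema:
        if typ == "int8":
            out.append(struct.unpack_from("<q", buf, off)[0]); off += 8
        elif typ == "text":
            n = struct.unpack_from("<i", buf, off)[0]; off += 4
            out.append(buf[off:off + n].decode()); off += n
    return out

def compile_deformer(schema):
    """'JIT'-style deforming: emit a function specialized to one schema,
    so the per-column type branches disappear from the hot path."""
    if schema == ("int8", "int8", "text"):
        def deform(buf):
            a, b, n = struct.unpack_from("<qqi", buf, 0)  # fixed layout
            return [a, b, buf[20:20 + n].decode()]
        return deform
    raise NotImplementedError(schema)

schema = ("int8", "int8", "text")
row = struct.pack("<qqi", 7, 42, 5) + b"hello"
assert deform_generic(row, schema) == compile_deformer(schema)(row)  # [7, 42, 'hello']
```

The real tuple deformer additionally handles NULL bitmaps, alignment, and variable-length headers, but the principle is the same: baking the column count and types into generated code removes branches from the inner loop.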

What kinds of SQL benefit from JIT

1、A large volume of data to process

2、Many operators applied to each row

3、The queried columns sit late in the tuple, so the columns before them must be deformed first

Currently expression evaluation and tuple deforming are JITed. Those
were chosen because they commonly are major CPU bottlenecks in
analytics queries, but are by no means the only potentially beneficial cases.

For JITing to be beneficial a piece of code first and foremost has to
be a CPU bottleneck. But also importantly, JITing can only be
beneficial if overhead can be removed by doing so. E.g. in the tuple
deforming case the knowledge about the number of columns and their
types can remove a significant number of branches, and in the
expression evaluation case a lot of indirect jumps/calls can be
removed. If neither of these is the case, JITing is a waste of
resources.

Future avenues for JITing are tuple sorting, COPY parsing/output
generation, and later compiling larger parts of queries.

How the PG planner decides whether to use JIT

Control parameters

1、jit (boolean)

Determines whether JIT may be used by PostgreSQL, if available (see Chapter 32). The default is on.

Defaults to on, meaning JIT is enabled.

2、jit_above_cost (floating point)

Sets the planner's cutoff above which JIT compilation is used as part of query execution (see Chapter 32). Performing JIT costs time but can accelerate query execution. The default is 100000.

Defaults to 100000: when the planner finds the plan's total cost exceeds this value, it enables JIT compilation.

3、jit_optimize_above_cost (floating point)

Sets the planner's cutoff above which JIT compiled programs (see jit_above_cost) are optimized. Optimization initially takes time, but can improve execution speed. It is not meaningful to set this to a lower value than jit_above_cost. The default is 500000.

Defaults to 500000: when the plan's total cost exceeds this value, the planner additionally optimizes the JIT-compiled code.

4、jit_inline_above_cost (floating point)

Sets the planner's cutoff above which JIT compiled programs (see jit_above_cost) attempt to inline functions and operators. Inlining initially takes time, but can improve execution speed. It is unlikely to be beneficial to set jit_inline_above_cost below jit_optimize_above_cost. The default is 500000.

Defaults to 500000: when the plan's total cost exceeds this value, the planner attempts to inline functions and operators (currently only functions of language C and internal) into the JIT-compiled code.


5、jit_provider (string)

Determines which JIT provider (see JIT Extensibility) is used. The built-in default is llvmjit.

If set to a non-existent library JIT will not be available, but no error will be raised. This allows JIT support to be installed separately from the main PostgreSQL package. This parameter can only be set at server start.

To let JIT support more compilers, PG's JIT support is designed to be modular: jit_provider selects which provider (and thus which compiler) is used, provided a library implementing the provider interface exists.

https://www.postgresql.org/docs/devel/static/jit-extensibility.html#JIT-PLUGGABLE

PG currently defaults to the LLVM compiler, first because LLVM's permissive license is compatible with PostgreSQL's open-source license, and second because LLVM is backed by strong corporate sponsors such as Apple.

PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
because it is developed by several large corporations and therefore
unlikely to be discontinued, because it has a license compatible with
PostgreSQL, and because its IR can be generated from C using the Clang
compiler.

Relevant source code

src/backend/jit/jit.c

/* GUCs */
bool jit_enabled = true;
char *jit_provider = "llvmjit";
bool jit_debugging_support = false;
bool jit_dump_bitcode = false;
bool jit_expressions = true;
bool jit_profiling_support = false;
bool jit_tuple_deforming = true;
double jit_above_cost = 100000;
double jit_inline_above_cost = 500000;
double jit_optimize_above_cost = 500000;

src/include/jit/jit.h

/* Flags determining what kind of JIT operations to perform */
#define PGJIT_NONE 0
#define PGJIT_PERFORM 1 << 0
#define PGJIT_OPT3 1 << 1
#define PGJIT_INLINE 1 << 2
#define PGJIT_EXPR 1 << 3
#define PGJIT_DEFORM 1 << 4

src/backend/optimizer/plan/planner.c

result->jitFlags = PGJIT_NONE;
if (jit_enabled && jit_above_cost >= 0 &&
    top_plan->total_cost > jit_above_cost)
{
    result->jitFlags |= PGJIT_PERFORM;

    /*
     * Decide how much effort should be put into generating better code.
     */
    if (jit_optimize_above_cost >= 0 &&
        top_plan->total_cost > jit_optimize_above_cost)
        result->jitFlags |= PGJIT_OPT3;
    if (jit_inline_above_cost >= 0 &&
        top_plan->total_cost > jit_inline_above_cost)
        result->jitFlags |= PGJIT_INLINE;

    /*
     * Decide which operations should be JITed.
     */
    if (jit_expressions)
        result->jitFlags |= PGJIT_EXPR;
    if (jit_tuple_deforming)
        result->jitFlags |= PGJIT_DEFORM;
}
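The planner's decision logic from planner.c can be mirrored in a small Python sketch (the thresholds default to the GUC values quoted from jit.c above; the function itself is hypothetical, for illustration only):

```python
# Bit flags matching the PGJIT_* defines in src/include/jit/jit.h.
PGJIT_NONE, PGJIT_PERFORM, PGJIT_OPT3, PGJIT_INLINE, PGJIT_EXPR, PGJIT_DEFORM = (
    0, 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4)

def jit_flags(total_cost, jit_enabled=True,
              jit_above_cost=100000, jit_optimize_above_cost=500000,
              jit_inline_above_cost=500000,
              jit_expressions=True, jit_tuple_deforming=True):
    """Mirror planner.c: decide which JIT operations apply to a plan."""
    flags = PGJIT_NONE
    if jit_enabled and jit_above_cost >= 0 and total_cost > jit_above_cost:
        flags |= PGJIT_PERFORM
        # Decide how much effort goes into generating better code.
        if jit_optimize_above_cost >= 0 and total_cost > jit_optimize_above_cost:
            flags |= PGJIT_OPT3
        if jit_inline_above_cost >= 0 and total_cost > jit_inline_above_cost:
            flags |= PGJIT_INLINE
        # Decide which operations should be JITed.
        if jit_expressions:
            flags |= PGJIT_EXPR
        if jit_tuple_deforming:
            flags |= PGJIT_DEFORM
    return flags

assert jit_flags(50000) == PGJIT_NONE                       # cheap plan: no JIT
assert jit_flags(200000) == PGJIT_PERFORM | PGJIT_EXPR | PGJIT_DEFORM
assert jit_flags(600000) == (PGJIT_PERFORM | PGJIT_OPT3 | PGJIT_INLINE
                             | PGJIT_EXPR | PGJIT_DEFORM)   # everything on
```

Note the middle case: a plan costed between jit_above_cost and jit_optimize_above_cost is JIT-compiled but neither optimized nor inlined.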

Development parameters

1、jit_debugging_support (boolean)

If LLVM has the required functionality, register generated functions with GDB. This makes debugging easier. The default setting is off, and can only be set at server start.

To debug JIT-generated code with GDB, enable jit_debugging_support.

2、jit_dump_bitcode (boolean)

Writes the generated LLVM IR out to the filesystem, inside data_directory. This is only useful for working on the internals of the JIT implementation. The default setting is off, and it can only be changed by a superuser.

Whether to dump the intermediate code (LLVM IR) generated by JIT. The files are written into the $PGDATA directory.

3、jit_expressions (boolean)

Determines whether expressions are JIT compiled, subject to costing decisions (see Section 32.2). The default is on.

Whether to JIT-compile expressions. On by default; when off, expressions are not JIT-compiled.

4、jit_profiling_support (boolean)

If LLVM has the required functionality, emit required data to allow perf to profile functions generated by JIT. This writes out files to $HOME/.debug/jit/; the user is responsible for performing cleanup when desired. The default setting is off, and can only be set at server start.

To profile JIT-generated code with perf, enable jit_profiling_support. The output is kept in $HOME/.debug/jit/; you are responsible for cleaning that directory up.

5、jit_tuple_deforming (boolean)

Determines whether tuple deforming is JIT compiled, subject to costing decisions (see Section 32.2). The default is on.

Whether to JIT-compile tuple deforming (converting the on-disk tuple format into the in-memory tuple format). On by default; when off, tuple deforming is not JIT-compiled.

Is JIT-generated code cached?

Not yet.

Currently it is not yet possible to cache generated functions, even
though that'd be desirable from a performance point of view. The
problem is that the generated functions commonly contain pointers into
per-execution memory. The expression evaluation machinery needs to
be redesigned a bit to avoid that. Basically all per-execution memory
needs to be referenced as an offset to one block of memory stored in
an ExprState, rather than absolute pointers into memory.

Once that is addressed, adding an LRU cache that's keyed by the
generated LLVM IR will allow to use optimized functions even for
faster queries.

A longer term project is to move expression compilation to the planner
stage, allowing e.g. to tie compiled expressions to prepared
statements.

An even more advanced approach would be to use JIT with few
optimizations initially, and build an optimized version in the
background. But that's even further off.
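The README's point about absolute pointers can be illustrated with a small hypothetical Python analogy: a "generated" function that closes over one execution's state object cannot be reused, while one that only bakes in offsets into a caller-supplied state block can be compiled once and cached across executions.

```python
def compile_with_absolute_state(state):
    # The generated function captures THIS state object (an "absolute
    # pointer" into per-execution memory), so it is valid for one
    # execution only and cannot be put in a shared cache.
    return lambda: state[0] + state[1]

def compile_with_offsets(off_a, off_b):
    # Only offsets are baked into the generated function; any execution's
    # state block can be passed in, so the function itself is cacheable.
    return lambda block: block[off_a] + block[off_b]

cached = compile_with_offsets(0, 1)   # compile once, reuse everywhere
assert cached([3, 4]) == 7            # execution 1's state block
assert cached([10, 20]) == 30         # execution 2 reuses the same code
```

This is exactly the refactor the README describes: referencing per-execution memory as offsets into one block stored in an ExprState, instead of absolute pointers.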

Which compilers are currently supported?

Currently LLVM.

PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
because it is developed by several large corporations and therefore
unlikely to be discontinued, because it has a license compatible with
PostgreSQL, and because its IR can be generated from C using the Clang
compiler.

It obviously is undesirable to maintain a second implementation of
commonly used functions, just for inlining purposes. Instead we take
advantage of the fact that the Clang compiler can emit LLVM IR.

Support for more compilers can be added by implementing a JIT provider.

Shared Library Separation
-------------------------

To avoid the main PostgreSQL binary directly depending on LLVM, which
would prevent LLVM support being independently installed by OS package
managers, the LLVM dependent code is located in a shared library that
is loaded on-demand.

An additional benefit of doing so is that it is relatively easy to
evaluate JIT compilation that does not use LLVM, by changing out the
shared library used to provide JIT compilation.

To achieve this, code intending to perform JIT (e.g. expression evaluation)
calls an LLVM independent wrapper located in jit.c to do so. If the
shared library providing JIT support can be loaded (i.e. PostgreSQL was
compiled with LLVM support and the shared library is installed), the task
of JIT compiling an expression gets handed off to the shared library. This
obviously requires that the function in jit.c is allowed to fail in case
no JIT provider can be loaded.

Which shared library is loaded is determined by the jit_provider GUC,
defaulting to "llvmjit".

Cloistering code performing JIT into a shared library unfortunately
also means that code doing JIT compilation for various parts of code
has to be located separately from the code doing so without
JIT. E.g. the JIT version of execExprInterp.c is located in jit/llvm/
rather than executor/.

Where are the built-in type definitions?

Instead there is one small file (llvmjit_types.c) which references each of
the types required for JITing. That file is translated to bitcode at
compile time, and loaded when LLVM is initialized in a backend.

Where is the prebuilt JIT bitcode installed?

$pkglibdir/bitcode/postgres/

Similarly extensions can install code into
$pkglibdir/bitcode/[extension]/
accompanied by
$pkglibdir/bitcode/[extension].index.bc

Performance comparison

1、Create the test table (65 columns: 33 INT8 and 32 TEXT)

do language plpgsql $$
declare
  sql text;
begin
  sql := 'create table test(id int8,';
  -- 32 text columns
  for i in 1..32 loop
    sql := sql||'c'||i||' text default md5(random()::text),';
  end loop;
  -- 32 int8 columns
  for i in 33..64 loop
    sql := sql||'c'||i||' int8 default random()*10,';
  end loop;
  sql := rtrim(sql, ',');
  sql := sql||')';
  execute sql;
end;
$$;
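For reference, the DDL string the DO block assembles can be previewed with an equivalent sketch (hypothetical Python, same column layout as the plpgsql loop above):

```python
# Rebuild the DDL the plpgsql block generates:
# 1 id column, c1..c32 text, c33..c64 int8 -> 65 columns total.
sql = "create table test(id int8,"
for i in range(1, 33):                     # 32 text columns
    sql += f"c{i} text default md5(random()::text),"
for i in range(33, 65):                    # 32 int8 columns
    sql += f"c{i} int8 default random()*10,"
sql = sql.rstrip(",") + ")"                # mirrors rtrim(sql, ',')

assert sql.startswith("create table test(id int8,c1 text default md5")
assert "c64 int8 default random()*10" in sql
assert sql.count(",") == 64                # 65 columns, 64 separators
```

Putting the text columns first is deliberate: the int8 columns c33..c64 used in the test query then sit at the end of the tuple, which maximizes the tuple-deforming work that JIT can accelerate.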

2、Load 100 million rows of test data

for((i=1;i<=100;i++)) do nohup psql -c "insert into test select generate_series(1,1000000);" >/dev/null 2>&1 & done  

Space used: 127 GB

postgres=# \dt+
                    List of relations
 Schema | Name | Type  |  Owner   |  Size  | Description
--------+------+-------+----------+--------+-------------
 public | test | table | postgres | 127 GB |
(1 row)

Force a parallel degree of 24

set max_parallel_workers_per_gather =24;
set parallel_setup_cost =0;
set parallel_tuple_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
alter table test set (parallel_workers =24);

3、多表达式计算测试

  1. explain select c33,c64,count(*),avg(c33),min(c34),max(c35),stddev(c36),sum(c37),stddev(greatest(c37,c38)),
  2. sum(
  3. case c38
  4. when 0 then c39
  5. when 1 then c40
  6. when 2 then c41
  7. when 3 then c42
  8. when 4 then c43
  9. when 5 then c44
  10. when 6 then c45
  11. when 7 then c46
  12. when 8 then c47
  13. when 9 then c48
  14. when 10 then c49
  15. when 11 then c50
  16. when 12 then c51
  17. when 13 then c52
  18. when 14 then c53
  19. when 15 then c54
  20. when 16 then c55
  21. else c56
  22. end
  23. ) from test
  24. where
  25. c64^2>0
  26. and sqrt(c64)>0
  27. and hashtext(c1)>hashtext(c2)
  28. and
  29. pow(sqrt(abs(hashint8(case c38
  30. when 0 then c39
  31. when 1 then c40
  32. when 2 then c41
  33. when 3 then c42
  34. when 4 then c43
  35. when 5 then c44
  36. when 6 then c45
  37. when 7 then c46
  38. when 8 then c47
  39. when 9 then c48
  40. when 10 then c49
  41. when 11 then c50
  42. when 12 then c51
  43. when 13 then c52
  44. when 14 then c53
  45. when 15 then c54
  46. when 16 then c55
  47. else c56
  48. end))),2) >0
  49. group by c33,c64;
  50. QUERY PLAN
  51. -------------------------------------------------------------------------------------------
  52. Finalize GroupAggregate (cost=4545694.04..4549612.47 rows=121 width=200)
  53. Group Key: c33, c64
  54. -> Gather Merge (cost=4545694.04..4549500.85 rows=2904 width=200)
  55. Workers Planned: 24
  56. -> Partial GroupAggregate (cost=4545693.46..4549425.58 rows=121 width=200)
  57. Group Key: c33, c64
  58. -> Sort (cost=4545693.46..4545822.06 rows=51440 width=200)
  59. Sort Key: c33, c64
  60. -> Parallel Seq Scan on test (cost=0.00..4541668.12 rows=51440 width=200)
  61. Filter: ((((c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((c64)::double precision) > '0'::double precision) AND (hashtext(c1) > hashtext(c2)) AND (pow(sqrt((abs(hashint8(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)))::double precision), '2'::double precision) > '0'::double precision))
  62. JIT:
  63. Functions: 18
  64. Inlining: true
  65. Optimization: true
  66. (14 rows)

3.1、With JIT enabled

set jit=on;

Parallel execution

  1. set max_parallel_workers_per_gather =24;
  2. Finalize GroupAggregate (cost=4545694.04..4549612.47 rows=121 width=200) (actual time=16228.136..18083.962 rows=110 loops=1)
  3. Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
  4. Group Key: test.c33, test.c64
  5. Buffers: shared hit=204165, temp read=14929 written=14950
  6. -> Gather Merge (cost=4545694.04..4549500.85 rows=2904 width=200) (actual time=16219.163..18075.339 rows=2750 loops=1)
  7. Output: c33, c64, (PARTIAL count(*)), (PARTIAL avg(c33)), (PARTIAL min(c34)), (PARTIAL max(c35)), (PARTIAL stddev(c36)), (PARTIAL sum(c37)), (PARTIAL stddev(GREATEST(c37, c38))), (PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END))
  8. Workers Planned: 24
  9. Workers Launched: 24
  10. Buffers: shared hit=204165, temp read=14929 written=14950
  11. -> Partial GroupAggregate (cost=4545693.46..4549425.58 rows=121 width=200) (actual time=16055.979..17551.930 rows=110 loops=25)
  12. Output: c33, c64, PARTIAL count(*), PARTIAL avg(c33), PARTIAL min(c34), PARTIAL max(c35), PARTIAL stddev(c36), PARTIAL sum(c37), PARTIAL stddev(GREATEST(c37, c38)), PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
  13. Group Key: test.c33, test.c64
  14. Buffers: shared hit=4167196, temp read=304952 written=305369
  15. Worker 0: actual time=16063.931..17380.570 rows=110 loops=1
  16. Buffers: shared hit=146725, temp read=10743 written=10758
  17. Worker 1: actual time=16157.211..17453.030 rows=110 loops=1
  18. Buffers: shared hit=144252, temp read=10570 written=10584
  19. Worker 2: actual time=15937.390..17766.085 rows=110 loops=1
  20. Buffers: shared hit=203275, temp read=14873 written=14894
  21. Worker 3: actual time=16087.670..17463.100 rows=110 loops=1
  22. Buffers: shared hit=153917, temp read=11273 written=11288
  23. Worker 4: actual time=16055.948..17369.731 rows=110 loops=1
  24. Buffers: shared hit=146569, temp read=10723 written=10738
  25. Worker 5: actual time=16142.715..17518.040 rows=110 loops=1
  26. Buffers: shared hit=148380, temp read=10854 written=10869
  27. Worker 6: actual time=16191.173..17582.148 rows=110 loops=1
  28. Buffers: shared hit=154115, temp read=11273 written=11288
  29. Worker 7: actual time=16125.974..17411.918 rows=110 loops=1
  30. Buffers: shared hit=144344, temp read=10570 written=10584
  31. Worker 8: actual time=16162.576..17452.647 rows=110 loops=1
  32. Buffers: shared hit=144356, temp read=10570 written=10584
  33. Worker 9: actual time=16028.487..17333.188 rows=110 loops=1
  34. Buffers: shared hit=146326, temp read=10703 written=10718
  35. Worker 10: actual time=15909.637..17710.282 rows=110 loops=1
  36. Buffers: shared hit=202502, temp read=14819 written=14840
  37. Worker 11: actual time=15991.109..17201.509 rows=110 loops=1
  38. Buffers: shared hit=136312, temp read=9973 written=9987
  39. Worker 12: actual time=15948.994..17758.986 rows=110 loops=1
  40. Buffers: shared hit=202082, temp read=14788 written=14808
  41. Worker 13: actual time=15923.985..17734.805 rows=110 loops=1
  42. Buffers: shared hit=202136, temp read=14788 written=14808
  43. Worker 14: actual time=16185.559..17477.490 rows=110 loops=1
  44. Buffers: shared hit=144407, temp read=10570 written=10584
  45. Worker 15: actual time=15957.127..17879.999 rows=110 loops=1
  46. Buffers: shared hit=214616, temp read=15702 written=15724
  47. Worker 16: actual time=16150.016..17429.737 rows=110 loops=1
  48. Buffers: shared hit=138941, temp read=10168 written=10182
  49. Worker 17: actual time=16036.894..17849.093 rows=110 loops=1
  50. Buffers: shared hit=202016, temp read=14788 written=14808
  51. Worker 18: actual time=16129.668..17369.734 rows=110 loops=1
  52. Buffers: shared hit=138300, temp read=10126 written=10140
  53. Worker 19: actual time=15948.382..17832.470 rows=110 loops=1
  54. Buffers: shared hit=211842, temp read=15491 written=15512
  55. Worker 20: actual time=16153.969..17352.508 rows=110 loops=1
  56. Buffers: shared hit=134834, temp read=9867 written=9880
  57. Worker 21: actual time=15916.013..17716.694 rows=110 loops=1
  58. Buffers: shared hit=202104, temp read=14788 written=14808
  59. Worker 22: actual time=16164.524..17462.568 rows=110 loops=1
  60. Buffers: shared hit=144423, temp read=10570 written=10584
  61. Worker 23: actual time=16046.586..17455.273 rows=110 loops=1
  62. Buffers: shared hit=156257, temp read=11433 written=11449
  63. -> Sort (cost=4545693.46..4545822.06 rows=51440 width=200) (actual time=16048.787..16592.827 rows=1900091 loops=25)
  64. Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
  65. Sort Key: test.c33, test.c64
  66. Sort Method: external merge Disk: 477728kB
  67. Worker 0: Sort Method: external merge Disk: 343776kB
  68. Worker 1: Sort Method: external merge Disk: 338240kB
  69. Worker 2: Sort Method: external merge Disk: 475936kB
  70. Worker 3: Sort Method: external merge Disk: 360736kB
  71. Worker 4: Sort Method: external merge Disk: 343136kB
  72. Worker 5: Sort Method: external merge Disk: 347328kB
  73. Worker 6: Sort Method: external merge Disk: 360736kB
  74. Worker 7: Sort Method: external merge Disk: 338240kB
  75. Worker 8: Sort Method: external merge Disk: 338240kB
  76. Worker 9: Sort Method: external merge Disk: 342496kB
  77. Worker 10: Sort Method: external merge Disk: 474208kB
  78. Worker 11: Sort Method: external merge Disk: 319136kB
  79. Worker 12: Sort Method: external merge Disk: 473216kB
  80. Worker 13: Sort Method: external merge Disk: 473216kB
  81. Worker 14: Sort Method: external merge Disk: 338240kB
  82. Worker 15: Sort Method: external merge Disk: 502464kB
  83. Worker 16: Sort Method: external merge Disk: 325376kB
  84. Worker 17: Sort Method: external merge Disk: 473216kB
  85. Worker 18: Sort Method: external merge Disk: 324032kB
  86. Worker 19: Sort Method: external merge Disk: 495712kB
  87. Worker 20: Sort Method: external merge Disk: 315744kB
  88. Worker 21: Sort Method: external merge Disk: 473216kB
  89. Worker 22: Sort Method: external merge Disk: 338240kB
  90. Worker 23: Sort Method: external merge Disk: 365856kB
  91. Buffers: shared hit=4167196, temp read=304952 written=305369
  92. Worker 0: actual time=16057.644..16539.626 rows=1673456 loops=1
  93. Buffers: shared hit=146725, temp read=10743 written=10758
  94. Worker 1: actual time=16150.861..16625.428 rows=1646431 loops=1
  95. Buffers: shared hit=144252, temp read=10570 written=10584
  96. Worker 2: actual time=15928.651..16592.993 rows=2316826 loops=1
  97. Buffers: shared hit=203275, temp read=14873 written=14894
  98. Worker 3: actual time=16081.087..16581.918 rows=1755939 loops=1
  99. Buffers: shared hit=153917, temp read=11273 written=11288
  100. Worker 4: actual time=16049.644..16528.522 rows=1670397 loops=1
  101. Buffers: shared hit=146569, temp read=10723 written=10738
  102. Worker 5: actual time=16135.668..16623.247 rows=1690829 loops=1
  103. Buffers: shared hit=148380, temp read=10854 written=10869
  104. Worker 6: actual time=16184.539..16693.836 rows=1755942 loops=1
  105. Buffers: shared hit=154115, temp read=11273 written=11288
  106. Worker 7: actual time=16119.729..16583.823 rows=1646433 loops=1
  107. Buffers: shared hit=144344, temp read=10570 written=10584
  108. Worker 8: actual time=16156.343..16626.634 rows=1646432 loops=1
  109. Buffers: shared hit=144356, temp read=10570 written=10584
  110. Worker 9: actual time=16022.254..16496.534 rows=1667285 loops=1
  111. Buffers: shared hit=146326, temp read=10703 written=10718
  112. Worker 10: actual time=15901.059..16553.594 rows=2308407 loops=1
  113. Buffers: shared hit=202502, temp read=14819 written=14840
  114. Worker 11: actual time=15985.290..16423.172 rows=1553527 loops=1
  115. Buffers: shared hit=136312, temp read=9973 written=9987
  116. Worker 12: actual time=15940.471..16603.025 rows=2303469 loops=1
  117. Buffers: shared hit=202082, temp read=14788 written=14808
  118. Worker 13: actual time=15915.230..16578.124 rows=2303471 loops=1
  119. Buffers: shared hit=202136, temp read=14788 written=14808
  120. Worker 14: actual time=16179.341..16650.442 rows=1646430 loops=1
  121. Buffers: shared hit=144407, temp read=10570 written=10584
  122. Worker 15: actual time=15948.014..16649.356 rows=2445950 loops=1
  123. Buffers: shared hit=214616, temp read=15702 written=15724
  124. Worker 16: actual time=16144.032..16604.106 rows=1583954 loops=1
  125. Buffers: shared hit=138941, temp read=10168 written=10182
  126. Worker 17: actual time=16028.217..16692.633 rows=2303463 loops=1
  127. Buffers: shared hit=202016, temp read=14788 written=14808
  128. Worker 18: actual time=16123.611..16575.587 rows=1577397 loops=1
  129. Buffers: shared hit=138300, temp read=10126 written=10140
  130. Worker 19: actual time=15939.379..16625.915 rows=2412964 loops=1
  131. Buffers: shared hit=211842, temp read=15491 written=15512
  132. Worker 20: actual time=16148.053..16579.552 rows=1536937 loops=1
  133. Buffers: shared hit=134834, temp read=9867 written=9880
  134. Worker 21: actual time=15907.527..16565.235 rows=2303464 loops=1
  135. Buffers: shared hit=202104, temp read=14788 written=14808
  136. Worker 22: actual time=16158.176..16633.146 rows=1646438 loops=1
  137. Buffers: shared hit=144423, temp read=10570 written=10584
  138. Worker 23: actual time=16039.773..16559.599 rows=1780966 loops=1
  139. Buffers: shared hit=156257, temp read=11433 written=11449
  140. -> Parallel Seq Scan on public.test (cost=0.00..4541668.12 rows=51440 width=200) (actual time=1032.131..5845.529 rows=1900091 loops=25)
  141. Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
  142. Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
  143. Rows Removed by Filter: 2099909
  144. Buffers: shared hit=4166668
  145. Worker 0: actual time=1020.341..5255.168 rows=1673456 loops=1
  146. Buffers: shared hit=146703
  147. Worker 1: actual time=1027.407..5221.878 rows=1646431 loops=1
  148. Buffers: shared hit=144230
  149. Worker 2: actual time=1153.318..6915.975 rows=2316826 loops=1
  150. Buffers: shared hit=203253
  151. Worker 3: actual time=1020.220..5472.580 rows=1755939 loops=1
  152. Buffers: shared hit=153895
  153. Worker 4: actual time=1032.901..5292.200 rows=1670397 loops=1
  154. Buffers: shared hit=146547
  155. Worker 5: actual time=1017.394..5301.776 rows=1690829 loops=1
  156. Buffers: shared hit=148358
  157. Worker 6: actual time=1030.128..5511.666 rows=1755942 loops=1
  158. Buffers: shared hit=154093
  159. Worker 7: actual time=1030.999..5254.166 rows=1646433 loops=1
  160. Buffers: shared hit=144322
  161. Worker 8: actual time=1029.738..5242.343 rows=1646432 loops=1
  162. Buffers: shared hit=144334
  163. Worker 9: actual time=1043.564..5337.423 rows=1667285 loops=1
  164. Buffers: shared hit=146304
  165. Worker 10: actual time=1003.084..6826.056 rows=2308407 loops=1
  166. Buffers: shared hit=202480
  167. Worker 11: actual time=1041.330..5138.905 rows=1553527 loops=1
  168. Buffers: shared hit=136290
  169. Worker 12: actual time=1006.781..6753.862 rows=2303469 loops=1
  170. Buffers: shared hit=202060
  171. Worker 13: actual time=1004.776..6749.535 rows=2303471 loops=1
  172. Buffers: shared hit=202114
  173. Worker 14: actual time=1030.832..5281.034 rows=1646430 loops=1
  174. Buffers: shared hit=144385
  175. Worker 15: actual time=1003.335..7089.916 rows=2445950 loops=1
  176. Buffers: shared hit=214594
  177. Worker 16: actual time=1015.769..5129.373 rows=1583954 loops=1
  178. Buffers: shared hit=138919
  179. Worker 17: actual time=1003.077..6892.321 rows=2303463 loops=1
  180. Buffers: shared hit=201994
  181. Worker 18: actual time=1016.105..5052.138 rows=1577397 loops=1
  182. Buffers: shared hit=138278
  183. Worker 19: actual time=1006.413..6976.141 rows=2412964 loops=1
  184. Buffers: shared hit=211820
  185. Worker 20: actual time=1013.958..5004.057 rows=1536937 loops=1
  186. Buffers: shared hit=134812
  187. Worker 21: actual time=1009.717..6748.666 rows=2303464 loops=1
  188. Buffers: shared hit=202082
  189. Worker 22: actual time=1013.944..5248.687 rows=1646438 loops=1
  190. Buffers: shared hit=144401
  191. Worker 23: actual time=1009.779..5471.091 rows=1780966 loops=1
  192. Buffers: shared hit=156235
  193. Planning Time: 0.233 ms
  194. JIT:
  195. Functions: 18
  196. Generation Time: 7.829 ms
  197. Inlining: true
  198. Inlining Time: 19.924 ms
  199. Optimization: true
  200. Optimization Time: 778.549 ms
  201. Emission Time: 404.543 ms
  202. Execution Time: 18161.029 ms
  203. (201 rows)

Non-parallel

  1. set max_parallel_workers_per_gather =0;
  2. HashAggregate (cost=13253090.64..13253093.36 rows=121 width=200) (actual time=142124.876..142125.769 rows=110 loops=1)
  3. Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
  4. Group Key: test.c33, test.c64
  5. Buffers: shared hit=4166668
  6. -> Seq Scan on public.test (cost=0.00..13166670.88 rows=1234568 width=200) (actual time=1127.028..114279.915 rows=47502283 loops=1)
  7. Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
  8. Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
  9. Rows Removed by Filter: 52497717
  10. Buffers: shared hit=4166668
  11. Planning Time: 0.668 ms
  12. JIT:
  13. Functions: 9
  14. Generation Time: 6.997 ms
  15. Inlining: true
  16. Inlining Time: 16.558 ms
  17. Optimization: true
  18. Optimization Time: 739.139 ms
  19. Emission Time: 358.289 ms
  20. Execution Time: 142132.967 ms
  21. (19 rows)
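The JIT footer in the plan above breaks the compile overhead into generation, inlining, optimization, and emission. Summing the four figures (copied from the serial plan) shows the one-off compile cost is only about 1.1 s against a roughly 142 s runtime:

```sql
-- JIT overhead reported in the serial plan above, in ms
SELECT 6.997 + 16.558 + 739.139 + 358.289 AS jit_overhead_ms;
-- ≈ 1121 ms, well under 1% of the 142 s execution time
```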

3.2 Disabling JIT

set jit=off;  
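With `jit=off` the JIT footer disappears from the EXPLAIN output. The session's JIT state can also be inspected directly; note that `pg_jit_available()` (available since PostgreSQL 11) reflects both the build (`--with-llvm`) and the current `jit` setting, so after `SET jit=off` it returns false:

```sql
-- Is JIT usable in this session (built with LLVM and jit=on)?
SELECT pg_jit_available();

-- All JIT-related parameters of the current session
SELECT name, setting FROM pg_settings WHERE name LIKE 'jit%';
```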

Parallel

  1. set max_parallel_workers_per_gather =24;
  2. Finalize GroupAggregate (cost=4545694.04..4549612.47 rows=121 width=200) (actual time=12229.706..14319.958 rows=110 loops=1)
  3. Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
  4. Group Key: test.c33, test.c64
  5. Buffers: shared hit=193072, temp read=14121 written=14141
  6. -> Gather Merge (cost=4545694.04..4549500.85 rows=2904 width=200) (actual time=12218.740..14311.155 rows=2750 loops=1)
  7. Output: c33, c64, (PARTIAL count(*)), (PARTIAL avg(c33)), (PARTIAL min(c34)), (PARTIAL max(c35)), (PARTIAL stddev(c36)), (PARTIAL sum(c37)), (PARTIAL stddev(GREATEST(c37, c38))), (PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END))
  8. Workers Planned: 24
  9. Workers Launched: 24
  10. Buffers: shared hit=193072, temp read=14121 written=14141
  11. -> Partial GroupAggregate (cost=4545693.46..4549425.58 rows=121 width=200) (actual time=12085.108..13888.218 rows=110 loops=25)
  12. Output: c33, c64, PARTIAL count(*), PARTIAL avg(c33), PARTIAL min(c34), PARTIAL max(c35), PARTIAL stddev(c36), PARTIAL sum(c37), PARTIAL stddev(GREATEST(c37, c38)), PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
  13. Group Key: test.c33, test.c64
  14. Buffers: shared hit=4167196, temp read=304952 written=305367
  15. Worker 0: actual time=12000.792..13585.035 rows=110 loops=1
  16. Buffers: shared hit=147042, temp read=10761 written=10776
  17. Worker 1: actual time=12031.831..13687.539 rows=110 loops=1
  18. Buffers: shared hit=153988, temp read=11273 written=11288
  19. Worker 2: actual time=12146.820..13812.121 rows=110 loops=1
  20. Buffers: shared hit=154021, temp read=11273 written=11288
  21. Worker 3: actual time=12145.395..13804.585 rows=110 loops=1
  22. Buffers: shared hit=154017, temp read=11273 written=11288
  23. Worker 4: actual time=12091.839..14217.464 rows=110 loops=1
  24. Buffers: shared hit=197518, temp read=14457 written=14477
  25. Worker 5: actual time=12169.045..13787.539 rows=110 loops=1
  26. Buffers: shared hit=150711, temp read=11032 written=11047
  27. Worker 6: actual time=12141.908..13807.836 rows=110 loops=1
  28. Buffers: shared hit=154079, temp read=11273 written=11288
  29. Worker 7: actual time=12110.195..13777.285 rows=110 loops=1
  30. Buffers: shared hit=153838, temp read=11273 written=11288
  31. Worker 8: actual time=12137.576..13814.893 rows=110 loops=1
  32. Buffers: shared hit=154004, temp read=11273 written=11288
  33. Worker 9: actual time=12196.750..13862.298 rows=110 loops=1
  34. Buffers: shared hit=154129, temp read=11273 written=11288
  35. Worker 10: actual time=12092.448..14117.280 rows=110 loops=1
  36. Buffers: shared hit=188201, temp read=13770 written=13789
  37. Worker 11: actual time=12068.763..14188.286 rows=110 loops=1
  38. Buffers: shared hit=197254, temp read=14421 written=14441
  39. Worker 12: actual time=12190.847..13923.117 rows=110 loops=1
  40. Buffers: shared hit=153959, temp read=11273 written=11288
  41. Worker 13: actual time=12093.689..14117.677 rows=110 loops=1
  42. Buffers: shared hit=188177, temp read=13771 written=13790
  43. Worker 14: actual time=12141.815..13701.354 rows=110 loops=1
  44. Buffers: shared hit=144501, temp read=10570 written=10584
  45. Worker 15: actual time=12012.075..14027.824 rows=110 loops=1
  46. Buffers: shared hit=186377, temp read=13640 written=13659
  47. Worker 16: actual time=12092.582..14131.998 rows=110 loops=1
  48. Buffers: shared hit=188052, temp read=13759 written=13778
  49. Worker 17: actual time=11906.057..13579.607 rows=110 loops=1
  50. Buffers: shared hit=154068, temp read=11273 written=11288
  51. Worker 18: actual time=12179.217..13853.580 rows=110 loops=1
  52. Buffers: shared hit=154191, temp read=11273 written=11288
  53. Worker 19: actual time=11944.620..14048.474 rows=110 loops=1
  54. Buffers: shared hit=194987, temp read=14285 written=14305
  55. Worker 20: actual time=12152.533..13818.214 rows=110 loops=1
  56. Buffers: shared hit=153932, temp read=11273 written=11288
  57. Worker 21: actual time=12196.138..13864.195 rows=110 loops=1
  58. Buffers: shared hit=154158, temp read=11273 written=11288
  59. Worker 22: actual time=12091.831..14229.393 rows=110 loops=1
  60. Buffers: shared hit=197454, temp read=14452 written=14472
  61. Worker 23: actual time=11910.497..13476.474 rows=110 loops=1
  62. Buffers: shared hit=145466, temp read=10637 written=10652
  63. -> Sort (cost=4545693.46..4545822.06 rows=51440 width=200) (actual time=12076.303..12606.046 rows=1900091 loops=25)
  64. Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
  65. Sort Key: test.c33, test.c64
  66. Sort Method: external merge Disk: 451872kB
  67. Worker 0: Sort Method: external merge Disk: 344352kB
  68. Worker 1: Sort Method: external merge Disk: 360736kB
  69. Worker 2: Sort Method: external merge Disk: 360736kB
  70. Worker 3: Sort Method: external merge Disk: 360736kB
  71. Worker 4: Sort Method: external merge Disk: 462624kB
  72. Worker 5: Sort Method: external merge Disk: 353024kB
  73. Worker 6: Sort Method: external merge Disk: 360736kB
  74. Worker 7: Sort Method: external merge Disk: 360736kB
  75. Worker 8: Sort Method: external merge Disk: 360736kB
  76. Worker 9: Sort Method: external merge Disk: 360736kB
  77. Worker 10: Sort Method: external merge Disk: 440640kB
  78. Worker 11: Sort Method: external merge Disk: 461472kB
  79. Worker 12: Sort Method: external merge Disk: 360736kB
  80. Worker 13: Sort Method: external merge Disk: 440672kB
  81. Worker 14: Sort Method: external merge Disk: 338240kB
  82. Worker 15: Sort Method: external merge Disk: 436480kB
  83. Worker 16: Sort Method: external merge Disk: 440288kB
  84. Worker 17: Sort Method: external merge Disk: 360736kB
  85. Worker 18: Sort Method: external merge Disk: 360736kB
  86. Worker 19: Sort Method: external merge Disk: 457120kB
  87. Worker 20: Sort Method: external merge Disk: 360736kB
  88. Worker 21: Sort Method: external merge Disk: 360736kB
  89. Worker 22: Sort Method: external merge Disk: 462464kB
  90. Worker 23: Sort Method: external merge Disk: 340384kB
  91. Buffers: shared hit=4167196, temp read=304952 written=305367
  92. Worker 0: actual time=11993.075..12463.819 rows=1676370 loops=1
  93. Buffers: shared hit=147042, temp read=10761 written=10776
  94. Worker 1: actual time=12023.856..12510.021 rows=1755946 loops=1
  95. Buffers: shared hit=153988, temp read=11273 written=11288
  96. Worker 2: actual time=12138.628..12631.082 rows=1755945 loops=1
  97. Buffers: shared hit=154021, temp read=11273 written=11288
  98. Worker 3: actual time=12137.214..12626.779 rows=1755935 loops=1
  99. Buffers: shared hit=154017, temp read=11273 written=11288
  100. Worker 4: actual time=12081.480..12703.075 rows=2251992 loops=1
  101. Buffers: shared hit=197518, temp read=14457 written=14477
  102. Worker 5: actual time=12161.027..12638.282 rows=1718516 loops=1
  103. Buffers: shared hit=150711, temp read=11032 written=11047
  104. Worker 6: actual time=12133.786..12626.911 rows=1755948 loops=1
  105. Buffers: shared hit=154079, temp read=11273 written=11288
  106. Worker 7: actual time=12102.069..12591.735 rows=1755947 loops=1
  107. Buffers: shared hit=153838, temp read=11273 written=11288
  108. Worker 8: actual time=12129.413..12622.455 rows=1755936 loops=1
  109. Buffers: shared hit=154004, temp read=11273 written=11288
  110. Worker 9: actual time=12188.648..12679.617 rows=1755941 loops=1
  111. Buffers: shared hit=154129, temp read=11273 written=11288
  112. Worker 10: actual time=12082.492..12676.479 rows=2145118 loops=1
  113. Buffers: shared hit=188201, temp read=13770 written=13789
  114. Worker 11: actual time=12058.276..12683.336 rows=2246436 loops=1
  115. Buffers: shared hit=197254, temp read=14421 written=14441
  116. Worker 12: actual time=12182.379..12683.611 rows=1755944 loops=1
  117. Buffers: shared hit=153959, temp read=11273 written=11288
  118. Worker 13: actual time=12083.860..12673.566 rows=2145124 loops=1
  119. Buffers: shared hit=188177, temp read=13771 written=13790
  120. Worker 14: actual time=12134.030..12593.859 rows=1646436 loops=1
  121. Buffers: shared hit=144501, temp read=10570 written=10584
  122. Worker 15: actual time=12002.363..12594.854 rows=2124740 loops=1
  123. Buffers: shared hit=186377, temp read=13640 written=13659
  124. Worker 16: actual time=12082.700..12682.126 rows=2143293 loops=1
  125. Buffers: shared hit=188052, temp read=13759 written=13778
  126. Worker 17: actual time=11897.939..12391.175 rows=1755948 loops=1
  127. Buffers: shared hit=154068, temp read=11273 written=11288
  128. Worker 18: actual time=12171.070..12661.251 rows=1755944 loops=1
  129. Buffers: shared hit=154191, temp read=11273 written=11288
  130. Worker 19: actual time=11934.165..12549.351 rows=2225151 loops=1
  131. Buffers: shared hit=194987, temp read=14285 written=14305
  132. Worker 20: actual time=12144.452..12632.096 rows=1755935 loops=1
  133. Buffers: shared hit=153932, temp read=11273 written=11288
  134. Worker 21: actual time=12188.108..12679.023 rows=1755934 loops=1
  135. Buffers: shared hit=154158, temp read=11273 written=11288
  136. Worker 22: actual time=12081.380..12710.648 rows=2251317 loops=1
  137. Buffers: shared hit=197454, temp read=14452 written=14472
  138. Worker 23: actual time=11902.854..12362.911 rows=1656889 loops=1
  139. Buffers: shared hit=145466, temp read=10637 written=10652
  140. -> Parallel Seq Scan on public.test (cost=0.00..4541668.12 rows=51440 width=200) (actual time=0.026..4627.345 rows=1900091 loops=25)
  141. Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
  142. Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
  143. Rows Removed by Filter: 2099909
  144. Buffers: shared hit=4166668
  145. Worker 0: actual time=0.024..4041.154 rows=1676370 loops=1
  146. Buffers: shared hit=147020
  147. Worker 1: actual time=0.026..4232.052 rows=1755946 loops=1
  148. Buffers: shared hit=153966
  149. Worker 2: actual time=0.025..4275.863 rows=1755945 loops=1
  150. Buffers: shared hit=153999
  151. Worker 3: actual time=0.024..4268.778 rows=1755935 loops=1
  152. Buffers: shared hit=153995
  153. Worker 4: actual time=0.025..5517.396 rows=2251992 loops=1
  154. Buffers: shared hit=197496
  155. Worker 5: actual time=0.024..4173.081 rows=1718516 loops=1
  156. Buffers: shared hit=150689
  157. Worker 6: actual time=0.025..4315.820 rows=1755948 loops=1
  158. Buffers: shared hit=154057
  159. Worker 7: actual time=0.028..4271.339 rows=1755947 loops=1
  160. Buffers: shared hit=153816
  161. Worker 8: actual time=0.025..4270.376 rows=1755936 loops=1
  162. Buffers: shared hit=153982
  163. Worker 9: actual time=0.032..4301.026 rows=1755941 loops=1
  164. Buffers: shared hit=154107
  165. Worker 10: actual time=0.025..5193.043 rows=2145118 loops=1
  166. Buffers: shared hit=188179
  167. Worker 11: actual time=0.024..5393.193 rows=2246436 loops=1
  168. Buffers: shared hit=197232
  169. Worker 12: actual time=0.026..4288.686 rows=1755944 loops=1
  170. Buffers: shared hit=153937
  171. Worker 13: actual time=0.026..5177.502 rows=2145124 loops=1
  172. Buffers: shared hit=188155
  173. Worker 14: actual time=0.031..4098.000 rows=1646436 loops=1
  174. Buffers: shared hit=144479
  175. Worker 15: actual time=0.023..5155.199 rows=2124740 loops=1
  176. Buffers: shared hit=186355
  177. Worker 16: actual time=0.024..5205.816 rows=2143293 loops=1
  178. Buffers: shared hit=188030
  179. Worker 17: actual time=0.024..4334.653 rows=1755948 loops=1
  180. Buffers: shared hit=154046
  181. Worker 18: actual time=0.030..4290.928 rows=1755944 loops=1
  182. Buffers: shared hit=154169
  183. Worker 19: actual time=0.025..5447.905 rows=2225151 loops=1
  184. Buffers: shared hit=194965
  185. Worker 20: actual time=0.034..4296.168 rows=1755935 loops=1
  186. Buffers: shared hit=153910
  187. Worker 21: actual time=0.025..4273.822 rows=1755934 loops=1
  188. Buffers: shared hit=154136
  189. Worker 22: actual time=0.026..5437.385 rows=2251317 loops=1
  190. Buffers: shared hit=197432
  191. Worker 23: actual time=0.027..4039.679 rows=1656889 loops=1
  192. Buffers: shared hit=145444
  193. Planning Time: 0.230 ms
  194. Execution Time: 14383.790 ms
  195. (193 rows)

Non-parallel

  1. set max_parallel_workers_per_gather =0;
  2. HashAggregate (cost=13253090.64..13253093.36 rows=121 width=200) (actual time=135995.544..135996.426 rows=110 loops=1)
  3. Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
  4. Group Key: test.c33, test.c64
  5. Buffers: shared hit=4166668
  6. -> Seq Scan on public.test (cost=0.00..13166670.88 rows=1234568 width=200) (actual time=0.026..101059.780 rows=47502283 loops=1)
  7. Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
  8. Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
  9. Rows Removed by Filter: 52497717
  10. Buffers: shared hit=4166668
  11. Planning Time: 0.202 ms
  12. Execution Time: 135996.540 ms
  13. (11 rows)
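Putting the four totals of this test side by side (values copied from the plans above, in ms), JIT does not pay off for this particular query and data distribution: in the parallel case every worker process JIT-compiles its own copy of the expressions, so the compile overhead multiplies across the 25 processes. A quick comparison:

```sql
-- Execution times copied from the four plans above (ms)
SELECT round(18161.029 / 14383.790, 2)   AS parallel_jit_vs_nojit,  -- ≈ 1.26
       round(142132.967 / 135996.540, 2) AS serial_jit_vs_nojit;    -- ≈ 1.05
```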

4. Performance comparison with a small number of expressions

  1. explain select count(*) from test;
  2. QUERY PLAN
  3. -------------------------------------------------------------------------------------------
  4. Finalize Aggregate (cost=4218751.42..4218751.43 rows=1 width=8)
  5. -> Gather (cost=4218751.35..4218751.36 rows=24 width=8)
  6. Workers Planned: 24
  7. -> Partial Aggregate (cost=4218751.35..4218751.36 rows=1 width=8)
  8. -> Parallel Seq Scan on test (cost=0.00..4208334.68 rows=4166668 width=0)
  9. JIT:
  10. Functions: 5
  11. Inlining: true
  12. Optimization: true
  13. (9 rows)
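Whether the executor JIT-compiles a plan at all (and whether it additionally inlines and optimizes) is driven by cost thresholds, which is why the estimated total cost of roughly 4.2 million above triggers both `Inlining: true` and `Optimization: true`. A sketch of the relevant parameters (defaults as of PostgreSQL 11):

```sql
-- JIT kicks in when the plan's total cost exceeds jit_above_cost
SHOW jit_above_cost;           -- default 100000
SHOW jit_inline_above_cost;    -- default 500000
SHOW jit_optimize_above_cost;  -- default 500000

-- To force JIT regardless of cost in the current session:
SET jit_above_cost = 0;
```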

4.1 Enabling JIT

set jit=on;  

Parallel

  1. set max_parallel_workers_per_gather =24;
  2. Finalize Aggregate (cost=4218751.42..4218751.43 rows=1 width=8) (actual time=1195.630..1195.630 rows=1 loops=1)
  3. Output: count(*)
  4. Buffers: shared hit=171976
  5. -> Gather (cost=4218751.35..4218751.36 rows=24 width=8) (actual time=1195.599..1195.616 rows=25 loops=1)
  6. Output: (PARTIAL count(*))
  7. Workers Planned: 24
  8. Workers Launched: 24
  9. Buffers: shared hit=171976
  10. -> Partial Aggregate (cost=4218751.35..4218751.36 rows=1 width=8) (actual time=1172.856..1172.856 rows=1 loops=25)
  11. Output: PARTIAL count(*)
  12. Buffers: shared hit=4166668
  13. Worker 0: actual time=1171.634..1171.634 rows=1 loops=1
  14. Buffers: shared hit=166655
  15. Worker 1: actual time=1171.631..1171.631 rows=1 loops=1
  16. Buffers: shared hit=166878
  17. Worker 2: actual time=1171.533..1171.533 rows=1 loops=1
  18. Buffers: shared hit=166588
  19. Worker 3: actual time=1171.634..1171.634 rows=1 loops=1
  20. Buffers: shared hit=166659
  21. Worker 4: actual time=1172.296..1172.296 rows=1 loops=1
  22. Buffers: shared hit=165127
  23. Worker 5: actual time=1171.637..1171.637 rows=1 loops=1
  24. Buffers: shared hit=166191
  25. Worker 6: actual time=1172.351..1172.351 rows=1 loops=1
  26. Buffers: shared hit=166251
  27. Worker 7: actual time=1171.634..1171.634 rows=1 loops=1
  28. Buffers: shared hit=166945
  29. Worker 8: actual time=1172.317..1172.317 rows=1 loops=1
  30. Buffers: shared hit=166515
  31. Worker 9: actual time=1171.632..1171.632 rows=1 loops=1
  32. Buffers: shared hit=166900
  33. Worker 10: actual time=1172.379..1172.379 rows=1 loops=1
  34. Buffers: shared hit=167330
  35. Worker 11: actual time=1171.555..1171.555 rows=1 loops=1
  36. Buffers: shared hit=165973
  37. Worker 12: actual time=1172.369..1172.369 rows=1 loops=1
  38. Buffers: shared hit=167326
  39. Worker 13: actual time=1172.361..1172.361 rows=1 loops=1
  40. Buffers: shared hit=167240
  41. Worker 14: actual time=1171.603..1171.603 rows=1 loops=1
  42. Buffers: shared hit=165948
  43. Worker 15: actual time=1172.308..1172.308 rows=1 loops=1
  44. Buffers: shared hit=167271
  45. Worker 16: actual time=1171.627..1171.627 rows=1 loops=1
  46. Buffers: shared hit=166118
  47. Worker 17: actual time=1172.363..1172.363 rows=1 loops=1
  48. Buffers: shared hit=166540
  49. Worker 18: actual time=1171.627..1171.628 rows=1 loops=1
  50. Buffers: shared hit=161783
  51. Worker 19: actual time=1172.367..1172.367 rows=1 loops=1
  52. Buffers: shared hit=167241
  53. Worker 20: actual time=1171.622..1171.622 rows=1 loops=1
  54. Buffers: shared hit=166895
  55. Worker 21: actual time=1172.310..1172.311 rows=1 loops=1
  56. Buffers: shared hit=166291
  57. Worker 22: actual time=1171.634..1171.634 rows=1 loops=1
  58. Buffers: shared hit=166817
  59. Worker 23: actual time=1172.357..1172.357 rows=1 loops=1
  60. Buffers: shared hit=167210
  61. -> Parallel Seq Scan on public.test (cost=0.00..4208334.68 rows=4166668 width=0) (actual time=0.005..756.021 rows=4000000 loops=25)
  62. Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
  63. Buffers: shared hit=4166668
  64. Worker 0: actual time=0.005..752.685 rows=3999720 loops=1
  65. Buffers: shared hit=166655
  66. Worker 1: actual time=0.005..753.934 rows=4005072 loops=1
  67. Buffers: shared hit=166878
  68. Worker 2: actual time=0.006..752.288 rows=3998112 loops=1
  69. Buffers: shared hit=166588
  70. Worker 3: actual time=0.005..749.831 rows=3999816 loops=1
  71. Buffers: shared hit=166659
  72. Worker 4: actual time=0.005..754.798 rows=3963048 loops=1
  73. Buffers: shared hit=165127
  74. Worker 5: actual time=0.006..753.153 rows=3988584 loops=1
  75. Buffers: shared hit=166191
  76. Worker 6: actual time=0.005..755.385 rows=3990024 loops=1
  77. Buffers: shared hit=166251
  78. Worker 7: actual time=0.005..754.636 rows=4006667 loops=1
  79. Buffers: shared hit=166945
  80. Worker 8: actual time=0.005..757.278 rows=3996360 loops=1
  81. Buffers: shared hit=166515
  82. Worker 9: actual time=0.005..752.936 rows=4005600 loops=1
  83. Buffers: shared hit=166900
  84. Worker 10: actual time=0.006..755.266 rows=4015920 loops=1
  85. Buffers: shared hit=167330
  86. Worker 11: actual time=0.005..755.747 rows=3983352 loops=1
  87. Buffers: shared hit=165973
  88. Worker 12: actual time=0.005..755.743 rows=4015824 loops=1
  89. Buffers: shared hit=167326
  90. Worker 13: actual time=0.005..756.052 rows=4013742 loops=1
  91. Buffers: shared hit=167240
  92. Worker 14: actual time=0.006..755.470 rows=3982752 loops=1
  93. Buffers: shared hit=165948
  94. Worker 15: actual time=0.005..755.769 rows=4014504 loops=1
  95. Buffers: shared hit=167271
  96. Worker 16: actual time=0.006..755.859 rows=3986832 loops=1
  97. Buffers: shared hit=166118
  98. Worker 17: actual time=0.005..757.579 rows=3996959 loops=1
  99. Buffers: shared hit=166540
  100. Worker 18: actual time=0.005..756.139 rows=3882792 loops=1
  101. Buffers: shared hit=161783
  102. Worker 19: actual time=0.005..755.706 rows=4013784 loops=1
  103. Buffers: shared hit=167241
  104. Worker 20: actual time=0.005..753.655 rows=4005480 loops=1
  105. Buffers: shared hit=166895
  106. Worker 21: actual time=0.005..757.511 rows=3990984 loops=1
  107. Buffers: shared hit=166291
  108. Worker 22: actual time=0.005..753.577 rows=4003608 loops=1
  109. Buffers: shared hit=166817
  110. Worker 23: actual time=0.005..755.443 rows=4013040 loops=1
  111. Buffers: shared hit=167210
  112. Planning Time: 0.192 ms
  113. JIT:
  114. Functions: 5
  115. Generation Time: 1.082 ms
  116. Inlining: true
  117. Inlining Time: 2.907 ms
  118. Optimization: true
  119. Optimization Time: 23.873 ms
  120. Emission Time: 12.577 ms
  121. Execution Time: 1211.337 ms
  122. (120 rows)

Non-parallel

  1. set max_parallel_workers_per_gather =0;
  2. Aggregate (cost=5416668.40..5416668.41 rows=1 width=8) (actual time=26222.597..26222.598 rows=1 loops=1)
  3. Output: count(*)
  4. Buffers: shared hit=4166668
  5. -> Seq Scan on public.test (cost=0.00..5166668.32 rows=100000032 width=0) (actual time=0.016..17338.548 rows=100000000 loops=1)
  6. Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
  7. Buffers: shared hit=4166668
  8. Planning Time: 0.056 ms
  9. JIT:
  10. Functions: 2
  11. Generation Time: 0.457 ms
  12. Inlining: true
  13. Inlining Time: 1.355 ms
  14. Optimization: true
  15. Optimization Time: 8.269 ms
  16. Emission Time: 5.354 ms
  17. Execution Time: 26223.178 ms
  18. (16 rows)

4.2 Disabling JIT

set jit=off;  
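`jit` is only one of several JIT-related GUCs in PostgreSQL 11. A quick sketch of the main knobs (the values shown are the documented defaults):

```sql
-- Master switch for JIT compilation (set per session here).
set jit = on;
-- JIT is used only when the plan's total cost exceeds this threshold.
set jit_above_cost = 100000;
-- Inline short functions and operators only above this cost.
set jit_inline_above_cost = 500000;
-- Run expensive LLVM optimization passes only above this cost.
set jit_optimize_above_cost = 500000;
```

Even with `jit=on`, JIT only fires when the estimated plan cost exceeds `jit_above_cost`, which is why cheap queries are unaffected.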

Parallel

  1. set max_parallel_workers_per_gather =24;
  2. Finalize Aggregate (cost=4218751.42..4218751.43 rows=1 width=8) (actual time=1166.951..1166.951 rows=1 loops=1)
  3. Output: count(*)
  4. Buffers: shared hit=170684
  5. -> Gather (cost=4218751.35..4218751.36 rows=24 width=8) (actual time=1166.923..1166.939 rows=25 loops=1)
  6. Output: (PARTIAL count(*))
  7. Workers Planned: 24
  8. Workers Launched: 24
  9. Buffers: shared hit=170684
  10. -> Partial Aggregate (cost=4218751.35..4218751.36 rows=1 width=8) (actual time=1146.387..1146.388 rows=1 loops=25)
  11. Output: PARTIAL count(*)
  12. Buffers: shared hit=4166668
  13. Worker 0: actual time=1145.584..1145.584 rows=1 loops=1
  14. Buffers: shared hit=167586
  15. Worker 1: actual time=1145.605..1145.605 rows=1 loops=1
  16. Buffers: shared hit=167235
  17. Worker 2: actual time=1145.517..1145.517 rows=1 loops=1
  18. Buffers: shared hit=164055
  19. Worker 3: actual time=1145.537..1145.537 rows=1 loops=1
  20. Buffers: shared hit=166048
  21. Worker 4: actual time=1145.625..1145.626 rows=1 loops=1
  22. Buffers: shared hit=165348
  23. Worker 5: actual time=1145.621..1145.621 rows=1 loops=1
  24. Buffers: shared hit=167426
  25. Worker 6: actual time=1145.746..1145.746 rows=1 loops=1
  26. Buffers: shared hit=166945
  27. Worker 7: actual time=1145.707..1145.707 rows=1 loops=1
  28. Buffers: shared hit=166628
  29. Worker 8: actual time=1143.795..1143.796 rows=1 loops=1
  30. Buffers: shared hit=163445
  31. Worker 9: actual time=1145.555..1145.556 rows=1 loops=1
  32. Buffers: shared hit=166960
  33. Worker 10: actual time=1145.605..1145.605 rows=1 loops=1
  34. Buffers: shared hit=166142
  35. Worker 11: actual time=1145.633..1145.633 rows=1 loops=1
  36. Buffers: shared hit=166090
  37. Worker 12: actual time=1145.553..1145.553 rows=1 loops=1
  38. Buffers: shared hit=168115
  39. Worker 13: actual time=1145.819..1145.819 rows=1 loops=1
  40. Buffers: shared hit=167973
  41. Worker 14: actual time=1145.615..1145.615 rows=1 loops=1
  42. Buffers: shared hit=166419
  43. Worker 15: actual time=1145.558..1145.558 rows=1 loops=1
  44. Buffers: shared hit=168061
  45. Worker 16: actual time=1145.527..1145.527 rows=1 loops=1
  46. Buffers: shared hit=166438
  47. Worker 17: actual time=1146.062..1146.062 rows=1 loops=1
  48. Buffers: shared hit=166993
  49. Worker 18: actual time=1145.629..1145.629 rows=1 loops=1
  50. Buffers: shared hit=162480
  51. Worker 19: actual time=1146.053..1146.054 rows=1 loops=1
  52. Buffers: shared hit=167989
  53. Worker 20: actual time=1144.864..1144.864 rows=1 loops=1
  54. Buffers: shared hit=164403
  55. Worker 21: actual time=1145.754..1145.754 rows=1 loops=1
  56. Buffers: shared hit=167957
  57. Worker 22: actual time=1145.625..1145.625 rows=1 loops=1
  58. Buffers: shared hit=167457
  59. Worker 23: actual time=1146.064..1146.064 rows=1 loops=1
  60. Buffers: shared hit=167791
  61. -> Parallel Seq Scan on public.test (cost=0.00..4208334.68 rows=4166668 width=0) (actual time=0.005..756.396 rows=4000000 loops=25)
  62. Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
  63. Buffers: shared hit=4166668
  64. Worker 0: actual time=0.005..756.202 rows=4022064 loops=1
  65. Buffers: shared hit=167586
  66. Worker 1: actual time=0.005..755.320 rows=4013627 loops=1
  67. Buffers: shared hit=167235
  68. Worker 2: actual time=0.005..744.106 rows=3937320 loops=1
  69. Buffers: shared hit=164055
  70. Worker 3: actual time=0.005..751.125 rows=3985152 loops=1
  71. Buffers: shared hit=166048
  72. Worker 4: actual time=0.005..753.618 rows=3968352 loops=1
  73. Buffers: shared hit=165348
  74. Worker 5: actual time=0.005..756.612 rows=4018224 loops=1
  75. Buffers: shared hit=167426
  76. Worker 6: actual time=0.004..758.326 rows=4006680 loops=1
  77. Buffers: shared hit=166945
  78. Worker 7: actual time=0.005..759.780 rows=3999071 loops=1
  79. Buffers: shared hit=166628
  80. Worker 8: actual time=0.006..756.676 rows=3922680 loops=1
  81. Buffers: shared hit=163445
  82. Worker 9: actual time=0.005..751.844 rows=4007040 loops=1
  83. Buffers: shared hit=166960
  84. Worker 10: actual time=0.005..749.207 rows=3987408 loops=1
  85. Buffers: shared hit=166142
  86. Worker 11: actual time=0.006..757.084 rows=3986160 loops=1
  87. Buffers: shared hit=166090
  88. Worker 12: actual time=0.004..756.443 rows=4034760 loops=1
  89. Buffers: shared hit=168115
  90. Worker 13: actual time=0.005..756.966 rows=4031352 loops=1
  91. Buffers: shared hit=167973
  92. Worker 14: actual time=0.005..759.423 rows=3994056 loops=1
  93. Buffers: shared hit=166419
  94. Worker 15: actual time=0.004..756.996 rows=4033464 loops=1
  95. Buffers: shared hit=168061
  96. Worker 16: actual time=0.006..759.134 rows=3994512 loops=1
  97. Buffers: shared hit=166438
  98. Worker 17: actual time=0.006..758.454 rows=4007814 loops=1
  99. Buffers: shared hit=166993
  100. Worker 18: actual time=0.005..758.647 rows=3899520 loops=1
  101. Buffers: shared hit=162480
  102. Worker 19: actual time=0.005..756.724 rows=4031736 loops=1
  103. Buffers: shared hit=167989
  104. Worker 20: actual time=0.006..755.985 rows=3945672 loops=1
  105. Buffers: shared hit=164403
  106. Worker 21: actual time=0.004..755.998 rows=4030968 loops=1
  107. Buffers: shared hit=167957
  108. Worker 22: actual time=0.006..757.390 rows=4018968 loops=1
  109. Buffers: shared hit=167457
  110. Worker 23: actual time=0.006..756.996 rows=4026984 loops=1
  111. Buffers: shared hit=167791
  112. Planning Time: 0.070 ms
  113. Execution Time: 1172.569 ms
  114. (112 rows)

Non-parallel

  1. set max_parallel_workers_per_gather =0;
  2. Aggregate (cost=5416668.40..5416668.41 rows=1 width=8) (actual time=26976.475..26976.476 rows=1 loops=1)
  3. Output: count(*)
  4. Buffers: shared hit=4166668
  5. -> Seq Scan on public.test (cost=0.00..5166668.32 rows=100000032 width=0) (actual time=0.011..17349.593 rows=100000000 loops=1)
  6. Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
  7. Buffers: shared hit=4166668
  8. Planning Time: 0.053 ms
  9. Execution Time: 26976.515 ms
  10. (8 rows)

Summary

As observed above, each parallel worker currently generates its own copy of the bitcode; once a bitcode cache is implemented, a single copy should be shareable across workers.
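A quick check of the four execution times above (hard-coded from the plans; they will vary run to run) shows that for this plain count(*), which does almost no expression work, JIT gains little while parallelism dominates:

```shell
# Execution times (ms) copied from the four EXPLAIN ANALYZE runs above.
awk 'BEGIN {
  jit_par = 1211.337;  jit_ser = 26223.178   # jit=on
  off_par = 1172.569;  off_ser = 26976.515   # jit=off
  printf "serial jit speedup:   %.3f\n", off_ser / jit_ser
  printf "parallel jit speedup: %.3f\n", off_par / jit_par
}'
# serial jit speedup:   1.029
# parallel jit speedup: 0.968
```

With no expensive expressions or wide tuple deforming in the hot path, JIT compilation overhead roughly cancels its gains here; the big wins show up on expression-heavy OLAP queries.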

Enable dumping of the generated bitcode:

set jit_dump_bitcode = on;

Note that the dumped .bc files are written to the data directory and are not cleaned up automatically.
  1. cd $PGDATA
  2. -rw------- 1 digoal digoal 126K Apr 4 12:59 34026.0.optimized.bc
  3. -rw------- 1 digoal digoal 126K Apr 4 12:59 34021.0.optimized.bc
  4. -rw------- 1 digoal digoal 126K Apr 4 12:59 34019.0.optimized.bc
  5. -rw------- 1 digoal digoal 126K Apr 4 12:59 34024.0.optimized.bc
  6. -rw------- 1 digoal digoal 126K Apr 4 12:59 34023.0.optimized.bc
  7. -rw------- 1 digoal digoal 126K Apr 4 12:59 34017.0.optimized.bc
  8. -rw------- 1 digoal digoal 126K Apr 4 12:59 34015.0.optimized.bc
  9. -rw------- 1 digoal digoal 126K Apr 4 12:59 34013.0.optimized.bc
  10. -rw------- 1 digoal digoal 126K Apr 4 12:59 34020.0.optimized.bc
  11. -rw------- 1 digoal digoal 126K Apr 4 12:59 34016.0.optimized.bc
  12. -rw------- 1 digoal digoal 126K Apr 4 12:59 34033.0.optimized.bc
  13. -rw------- 1 digoal digoal 126K Apr 4 12:59 34031.0.optimized.bc
  14. -rw------- 1 digoal digoal 126K Apr 4 12:59 34018.0.optimized.bc
  15. -rw------- 1 digoal digoal 126K Apr 4 12:59 34014.0.optimized.bc
  16. -rw------- 1 digoal digoal 126K Apr 4 12:59 34036.0.optimized.bc
  17. -rw------- 1 digoal digoal 126K Apr 4 12:59 34035.0.optimized.bc
  18. -rw------- 1 digoal digoal 126K Apr 4 12:59 34030.0.optimized.bc
  19. -rw------- 1 digoal digoal 126K Apr 4 12:59 34028.0.optimized.bc
  20. -rw------- 1 digoal digoal 126K Apr 4 12:59 34032.0.optimized.bc
  21. -rw------- 1 digoal digoal 126K Apr 4 12:59 34029.0.optimized.bc
  22. -rw------- 1 digoal digoal 126K Apr 4 12:59 34022.0.optimized.bc
  23. -rw------- 1 digoal digoal 126K Apr 4 12:59 34025.0.optimized.bc
  24. -rw------- 1 digoal digoal 126K Apr 4 12:59 34027.0.optimized.bc
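Because these dump files are never removed automatically, a cleanup along these lines can be used (a sketch; assumes $PGDATA points at the cluster's data directory):

```shell
# Delete leftover JIT bitcode dumps from the top level of the data directory.
find "$PGDATA" -maxdepth 1 -type f -name '*.bc' -delete
```

Add an age filter such as `-mtime +1` if dumps from currently running sessions should be kept.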

The llvm-bcanalyzer tool can be used to inspect the contents of a bitcode file:

https://www.llvm.org/docs/CommandGuide/llvm-bcanalyzer.html

  1. /usr/local/llvm6.0.0/bin/llvm-bcanalyzer ./40818.7.bc
  2. Summary of ./40818.7.bc:
  3. Total size: 1069440b/133680.00B/33420W
  4. Stream type: LLVM IR
  5. # Toplevel Blocks: 4
  6. Per-block Summary:
  7. Block ID #0 (BLOCKINFO_BLOCK):
  8. Num Instances: 1
  9. Total Size: 672b/84.00B/21W
  10. Percent of file: 0.0628%
  11. Num SubBlocks: 0
  12. Num Abbrevs: 16
  13. Num Records: 3
  14. Percent Abbrevs: 0.0000%
  15. Record Histogram:
  16. Count # Bits b/Rec % Abv Record Kind
  17. 3 60 20.0 SETBID
  18. Block ID #8 (MODULE_BLOCK):
  19. Num Instances: 1
  20. Total Size: 18530b/2316.25B/579W
  21. Percent of file: 1.7327%
  22. Num SubBlocks: 67
  23. Num Abbrevs: 3
  24. Num Records: 129
  25. Percent Abbrevs: 2.3256%
  26. Record Histogram:
  27. Count # Bits b/Rec % Abv Record Kind
  28. 86 11904 138.4 FUNCTION
  29. 38 4813 126.7 2.63 GLOBALVAR
  30. 1 21 100.00 SOURCE_FILENAME
  31. 1 35 100.00 VSTOFFSET
  32. 1 465 DATALAYOUT
  33. 1 303 TRIPLE
  34. 1 21 VERSION
  35. ........

References

http://blog.jobbole.com/113684/

https://www.postgresql.org/docs/devel/static/jit-extensibility.html

https://www.postgresql.org/docs/devel/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

https://www.postgresql.org/docs/devel/static/runtime-config-client.html#GUC-JIT-PROVIDER

https://www.postgresql.org/docs/devel/static/runtime-config-query.html#GUC-JIT-ABOVE-COST

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/jit/README;h=6271677163ac8acf9767370f3fff39b45f575dc2;hb=refs/heads/master

Using LLVM For Program Transformation
