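Zstandard (or Zstd) is a lossless data compression algorithm developed by Yann Collet at Facebook. It is designed to reach compression ratios comparable to DEFLATE (.zip, gzip) while compressing and decompressing considerably faster. In the benchmarks published in its GitHub repository (https://github.com/facebook/zstd), Zstandard compares favorably with algorithms such as snappy and lzo: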
Compressor name | Ratio | Compression speed | Decompression speed |
zstd 1.4.5 -1 | 2.884 | 500 MB/s | 1660 MB/s |
zlib 1.2.11 -1 | 2.743 | 90 MB/s | 400 MB/s |
brotli 1.0.7 -0 | 2.703 | 400 MB/s | 450 MB/s |
zstd 1.4.5 --fast=1 | 2.434 | 570 MB/s | 2200 MB/s |
zstd 1.4.5 --fast=3 | 2.312 | 640 MB/s | 2300 MB/s |
quicklz 1.5.0 -1 | 2.238 | 560 MB/s | 710 MB/s |
zstd 1.4.5 --fast=5 | 2.178 | 700 MB/s | 2420 MB/s |
lzo1x 2.10 -1 | 2.106 | 690 MB/s | 820 MB/s |
lz4 1.9.2 | 2.101 | 740 MB/s | 4530 MB/s |
zstd 1.4.5 --fast=7 | 2.096 | 750 MB/s | 2480 MB/s |
lzf 3.6 -1 | 2.077 | 410 MB/s | 860 MB/s |
snappy 1.1.8 | 2.073 | 560 MB/s | 1790 MB/s |
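Zstd's --fast parameter trades compression ratio for speed: the higher the speed, the lower the compression ratio. In Hive 3.1.1, ORC uses zlib as its default compression algorithm (set by the orc.compress parameter in the OrcConf class), while Parquet is uncompressed by default. At its highest-ratio setting in the table above (zstd -1), Zstd compresses 5.56 times as fast as zlib -1 (500 vs. 90 MB/s) and decompresses 4.15 times as fast (1660 vs. 400 MB/s). So if Hive's ORC and Parquet formats use Zstd, the map phase spends much less time decompressing the data it reads and the reduce phase spends less time compressing its output, which should improve Hive's overall performance.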
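The two speedup figures, and the --fast trade-off, can be checked directly against the table. The snippet below is a minimal sketch of that arithmetic; the dictionary keys and function names are made up for illustration, and the numbers are copied from the benchmark rows above rather than measured.

```python
# Figures copied from the benchmark table above:
# name -> (compression ratio, compression MB/s, decompression MB/s)
BENCH = {
    "zstd -1": (2.884, 500, 1660),
    "zlib -1": (2.743, 90, 400),
    "zstd --fast=1": (2.434, 570, 2200),
    "zstd --fast=3": (2.312, 640, 2300),
    "zstd --fast=5": (2.178, 700, 2420),
    "zstd --fast=7": (2.096, 750, 2480),
}


def speedup(a, b):
    """Return (compression, decompression) speed of `a` relative to `b`."""
    _, a_comp, a_decomp = BENCH[a]
    _, b_comp, b_decomp = BENCH[b]
    return round(a_comp / b_comp, 2), round(a_decomp / b_decomp, 2)


def fast_levels_trade_ratio_for_speed():
    """Check that each higher --fast level is faster but compresses less."""
    rows = [BENCH[f"zstd --fast={n}"] for n in (1, 3, 5, 7)]
    return all(
        nxt[0] < cur[0] and nxt[1] > cur[1] and nxt[2] > cur[2]
        for cur, nxt in zip(rows, rows[1:])
    )


print(speedup("zstd -1", "zlib -1"))        # (5.56, 4.15)
print(fast_levels_trade_ratio_for_speed())  # True
```

These ratios describe the codecs in isolation on the benchmark corpus; the gain seen in an actual Hive job also depends on how much of its time is spent in compression at all.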
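HADOOP-13578 (https://issues.apache.org/jira/browse/HADOOP-13578) added a native Zstd compression library to Hadoop 3; it depends on Facebook's Zstd library. The steps to enable the Zstd native library when building Hadoop are as follows: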
1. Download, build, and install the Zstd library:
    wget https://github.com/facebook/zstd/releases/download/v1.4.4/zstd-1.4.4.tar.gz
    tar -xzf zstd-1.4.4.tar.gz
    cd zstd-1.4.4
    make && make install
2. Zstd support is not enabled by default when building Hadoop 3; it has to be turned on through Maven parameters:
    mvn clean package -Dzstd.lib=/usr/local/lib -Dbundle.zstd=true
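The zstd.lib parameter points to the directory that holds the locally installed zstd library, and bundle.zstd enables building with zstd (according to Hadoop's BUILDING.txt, it copies the contents of the zstd.lib directory into the final tarball). If the local zstd library cannot be found, the build fails. Note that Hadoop compiles its native code only when the native Maven profile (-Pnative) is active, so that flag may need to be added to the command above.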
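Because the build fails when libzstd is missing, it can be worth confirming that the library from step 1 is actually installed before starting a long Maven run. The following is a minimal sketch, assuming a system where Python's ctypes can locate shared libraries; the function name zstd_version is hypothetical. It uses the system's normal library search, which is not the same lookup the Hadoop build performs through zstd.lib, so treat it only as a sanity check.

```python
import ctypes
import ctypes.util


def zstd_version():
    """Return the installed libzstd version as 'major.minor.patch', or None."""
    path = ctypes.util.find_library("zstd")  # None if the library is not found
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # ZSTD_versionNumber() returns major*10000 + minor*100 + patch.
    lib.ZSTD_versionNumber.restype = ctypes.c_uint
    n = lib.ZSTD_versionNumber()
    return f"{n // 10000}.{(n // 100) % 100}.{n % 100}"


if __name__ == "__main__":
    version = zstd_version()
    print(version if version else "libzstd not found on the library search path")
```

After a successful build and install, running `hadoop checknative -a` reports whether Hadoop itself was able to load its native zstd support.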
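ORC-363 (https://jira.apache.org/jira/browse/ORC-363) added the Zstandard compression algorithm to ORC, starting with version 1.6. Hive 3.1.1 uses orc-1.5.1, so ORC has to be upgraded to orc-1.6.3 (note that Hive does not currently support orc-1.6).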
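There are two ways to set the compression algorithm for the ORC format in Hive: 1. add the property "orc.compress"="ZSTD" to TBLPROPERTIES when creating a table; 2. set the Hive parameter hive.exec.orc.default.compress=ZSTD. The first approach has to be repeated for every table, whereas the second applies to Hive globally and is more convenient. Adding the following configuration to hive-site.xml therefore enables ZSTD compression for ORC: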
    <property>
      <name>hive.exec.orc.default.compress</name>
      <value>ZSTD</value>
      <description>Allowed values in orc-1.6.0: NONE, ZLIB, SNAPPY, LZO, LZ4, ZSTD</description>
    </property>
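Parquet tables in Hive are not compressed by default. There are two ways to change the compression algorithm:
1. Set the parameter "parquet.compression"="zstd" in TBLPROPERTIES;
2. Set Hadoop parameters to specify the Parquet compression algorithm: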
    <property>
      <name>mapreduce.output.fileoutputformat.compress</name>
      <value>true</value>
    </property>
    <property>
      <name>mapreduce.output.fileoutputformat.compress.codec</name>
      <value>org.apache.hadoop.io.compress.ZStandardCodec</value>
    </property>
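Each of the settings above is a name/value pair wrapped in a Hadoop-style property element. As a minimal sketch, the snippet below renders such entries with Python's standard library, which avoids hand-editing slips such as stray whitespace inside the name element. The helper name property_xml is hypothetical and not part of Hive or Hadoop; the property names and values are the ones given in this article.

```python
import xml.etree.ElementTree as ET


def property_xml(name, value, description=None):
    """Render one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    # strip() guards against accidental spaces around names and values.
    ET.SubElement(prop, "name").text = name.strip()
    ET.SubElement(prop, "value").text = value.strip()
    if description:
        ET.SubElement(prop, "description").text = description
    return ET.tostring(prop, encoding="unicode")


# Settings taken from the article: ORC default codec and the MapReduce output codec.
settings = [
    ("hive.exec.orc.default.compress", "ZSTD"),
    ("mapreduce.output.fileoutputformat.compress", "true"),
    ("mapreduce.output.fileoutputformat.compress.codec",
     "org.apache.hadoop.io.compress.ZStandardCodec"),
]

for name, value in settings:
    print(property_xml(name, value))
```

The output is one property element per line, ready to be pasted inside the configuration element of the relevant site file.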