赞
踩
作者:聂雄超 (kelvin)
测试结论
Star schema benchmark(以下简称SSB)是学术界和工业界广泛使用的一个星型模型测试集(来源论文),通过这个测试集合可以方便的对比各种OLAP产品的基础性能指标。Clickhouse 通过改写SSB,将星型模型打平转化成宽表,改造成了一个单表测试benchmark(参考链接)。本报告记录了StarRocks和Clickhouse在SSB单表数据集上的性能对比结果,并记录了在用户经常碰到的低基数聚合场景下StarRocks和ClickHouse的性能对比结果。测试结论如下:
硬件准备
机器 | 6台 服务器 |
---|---|
cpu | Intel® Xeon® Gold 5218 CPU @ 2.30GHz 32core ,64线程 |
内存 | 512GB |
网络带宽 | 20Gbits/s |
磁盘 | HDD盘(12*4TB) |
StarRocks和Clickhouse部署在相同配置的机器上分别进行启动测试。
内核版本:Linux 3.10.0-862.el7.x86_64
操作系统版本:Red Hat 4.8.5-28
软件版本: StarRocks2.0.1、ClickHouse 20.4.2.9
表名 | 行数 | 解释 |
---|---|---|
lineorder | 6亿 | SSB商品订单表 |
customer | 300万 | SSB客户表 |
part | 140万 | SSB 零部件表 |
supplier | 20万 | SSB 供应商表 |
dates | 2556 | 日期表 |
lineorder_flat | 6亿 | SSB打平后的宽表 |
测试SQL
use ssb; CREATE TABLE `lineorder` ( `lo_orderkey` int(11) NOT NULL COMMENT "", `lo_linenumber` int(11) NOT NULL COMMENT "", `lo_custkey` int(11) NOT NULL COMMENT "", `lo_partkey` int(11) NOT NULL COMMENT "", `lo_suppkey` int(11) NOT NULL COMMENT "", `lo_orderdate` int(11) NOT NULL COMMENT "", `lo_orderpriority` varchar(16) NOT NULL COMMENT "", `lo_shippriority` int(11) NOT NULL COMMENT "", `lo_quantity` int(11) NOT NULL COMMENT "", `lo_extendedprice` int(11) NOT NULL COMMENT "", `lo_ordtotalprice` int(11) NOT NULL COMMENT "", `lo_discount` int(11) NOT NULL COMMENT "", `lo_revenue` int(11) NOT NULL COMMENT "", `lo_supplycost` int(11) NOT NULL COMMENT "", `lo_tax` int(11) NOT NULL COMMENT "", `lo_commitdate` int(11) NOT NULL COMMENT "", `lo_shipmode` varchar(11) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`lo_orderkey`) COMMENT "OLAP" PARTITION BY RANGE(`lo_orderdate`) (PARTITION p1 VALUES [("-2147483648"), ("19930101")), PARTITION p2 VALUES [("19930101"), ("19940101")), PARTITION p3 VALUES [("19940101"), ("19950101")), PARTITION p4 VALUES [("19950101"), ("19960101")), PARTITION p5 VALUES [("19960101"), ("19970101")), PARTITION p6 VALUES [("19970101"), ("19980101")), PARTITION p7 VALUES [("19980101"), ("19990101"))) DISTRIBUTED BY HASH(`lo_orderkey`) BUCKETS 48 PROPERTIES ( "replication_num" = "1" ); CREATE TABLE IF NOT EXISTS `customer` ( `c_custkey` int(11) NOT NULL COMMENT "", `c_name` varchar(26) NOT NULL COMMENT "", `c_address` varchar(41) NOT NULL COMMENT "", `c_city` varchar(11) NOT NULL COMMENT "", `c_nation` varchar(16) NOT NULL COMMENT "", `c_region` varchar(13) NOT NULL COMMENT "", `c_phone` varchar(16) NOT NULL COMMENT "", `c_mktsegment` varchar(11) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`c_custkey`) COMMENT "OLAP" DISTRIBUTED BY HASH(`c_custkey`) BUCKETS 12 PROPERTIES ( "replication_num" = "1" ); CREATE TABLE IF NOT EXISTS `dates` ( `d_datekey` int(11) NOT NULL COMMENT "", `d_date` varchar(20) NOT NULL COMMENT "", `d_dayofweek` varchar(10) NOT NULL COMMENT "", `d_month` varchar(11) NOT NULL COMMENT "", `d_year` int(11) NOT NULL COMMENT "", `d_yearmonthnum` int(11) NOT NULL COMMENT "", `d_yearmonth` varchar(9) NOT NULL COMMENT "", `d_daynuminweek` int(11) NOT NULL COMMENT "", `d_daynuminmonth` int(11) NOT NULL COMMENT "", `d_daynuminyear` int(11) NOT NULL COMMENT "", `d_monthnuminyear` int(11) NOT NULL COMMENT "", `d_weeknuminyear` int(11) NOT NULL COMMENT "", `d_sellingseason` varchar(14) NOT NULL COMMENT "", `d_lastdayinweekfl` int(11) NOT NULL COMMENT "", `d_lastdayinmonthfl` int(11) NOT NULL COMMENT "", `d_holidayfl` int(11) NOT NULL COMMENT "", `d_weekdayfl` int(11) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`d_datekey`) COMMENT "OLAP" DISTRIBUTED BY HASH(`d_datekey`) BUCKETS 1 PROPERTIES ( "replication_num" = "1" ); CREATE TABLE IF NOT EXISTS `supplier` ( `s_suppkey` int(11) NOT NULL COMMENT "", `s_name` varchar(26) NOT NULL COMMENT "", `s_address` varchar(26) NOT NULL COMMENT "", `s_city` varchar(11) NOT NULL COMMENT "", `s_nation` varchar(16) NOT NULL COMMENT "", `s_region` varchar(13) NOT NULL COMMENT "", `s_phone` varchar(16) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`s_suppkey`) COMMENT "OLAP" DISTRIBUTED BY HASH(`s_suppkey`) BUCKETS 12 PROPERTIES ( "replication_num" = "1" ); CREATE TABLE IF NOT EXISTS `part` ( `p_partkey` int(11) NOT NULL COMMENT "", `p_name` varchar(23) NOT NULL COMMENT "", `p_mfgr` varchar(7) NOT NULL COMMENT "", `p_category` varchar(8) NOT NULL COMMENT "", `p_brand` varchar(10) NOT NULL COMMENT "", `p_color` varchar(12) NOT NULL COMMENT "", `p_type` varchar(26) NOT NULL COMMENT "", `p_size` int(11) NOT NULL COMMENT "", `p_container` varchar(11) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`p_partkey`) COMMENT "OLAP" DISTRIBUTED BY HASH(`p_partkey`) BUCKETS 12 PROPERTIES ( "replication_num" = "1" ); CREATE TABLE `lineorder_flat` ( `LO_ORDERDATE` date NOT NULL COMMENT "", `LO_ORDERKEY` int(11) NOT NULL COMMENT "", `LO_LINENUMBER` tinyint(4) NOT NULL COMMENT "", `LO_CUSTKEY` int(11) NOT NULL COMMENT "", `LO_PARTKEY` int(11) NOT NULL COMMENT "", `LO_SUPPKEY` int(11) NOT NULL COMMENT "", `LO_ORDERPRIORITY` varchar(100) NOT NULL COMMENT "", `LO_SHIPPRIORITY` tinyint(4) NOT NULL COMMENT "", `LO_QUANTITY` tinyint(4) NOT NULL COMMENT "", `LO_EXTENDEDPRICE` int(11) NOT NULL COMMENT "", `LO_ORDTOTALPRICE` int(11) NOT NULL COMMENT "", `LO_DISCOUNT` tinyint(4) NOT NULL COMMENT "", `LO_REVENUE` int(11) NOT NULL COMMENT "", `LO_SUPPLYCOST` int(11) NOT NULL COMMENT "", `LO_TAX` tinyint(4) NOT NULL COMMENT "", `LO_COMMITDATE` date NOT NULL COMMENT "", `LO_SHIPMODE` varchar(100) NOT NULL COMMENT "", `C_NAME` varchar(100) NOT NULL COMMENT "", `C_ADDRESS` varchar(100) NOT NULL COMMENT "", `C_CITY` varchar(100) NOT NULL COMMENT "", `C_NATION` varchar(100) NOT NULL COMMENT "", `C_REGION` varchar(100) NOT NULL COMMENT "", `C_PHONE` varchar(100) NOT NULL COMMENT "", `C_MKTSEGMENT` varchar(100) NOT NULL COMMENT "", `S_NAME` varchar(100) NOT NULL COMMENT "", `S_ADDRESS` varchar(100) NOT NULL COMMENT "", `S_CITY` varchar(100) NOT NULL COMMENT "", `S_NATION` varchar(100) NOT NULL COMMENT "", `S_REGION` varchar(100) NOT NULL COMMENT "", `S_PHONE` varchar(100) NOT NULL COMMENT "", `P_NAME` varchar(100) NOT NULL COMMENT "", `P_MFGR` varchar(100) NOT NULL COMMENT "", `P_CATEGORY` varchar(100) NOT NULL COMMENT "", `P_BRAND` varchar(100) NOT NULL COMMENT "", `P_COLOR` varchar(100) NOT NULL COMMENT "", `P_TYPE` varchar(100) NOT NULL COMMENT "", `P_SIZE` tinyint(4) NOT NULL COMMENT "", `P_CONTAINER` varchar(100) NOT NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`LO_ORDERDATE`, `LO_ORDERKEY`) COMMENT "OLAP" PARTITION BY RANGE(`LO_ORDERDATE`) (START ("1992-01-01") END ("1999-01-01") EVERY (INTERVAL 1 YEAR)) DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 48 PROPERTIES ( "replication_num" = "1" );
由于脚本为python3解释器执行,服务器版本为python2.7,因此需用修改为python2.7
# 下载测试脚本
wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.3.zip
unzip ssb-poc-0.9.3.zip
cd ssb-poc
make && make install
# 生成100G数据脚本
cd output
bin/gen-ssb.sh 100 data_dir
# 测试100G数据
bin/create_db_table.sh ddl_100
-- 默认为1,并行度的设置极大影响查询效率
# 设置并行度,建议是每个集群节点逻辑核数的一半,64线程设置为32,经测试再调大该参数,查询性能基本没有变化
set global parallel_fragment_exec_instance_num = 32;
--Q1.1 SELECT sum(lo_extendedprice * lo_discount) AS `revenue` FROM lineorder_flat WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25; --Q1.2 SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35; --Q1.3 SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat WHERE weekofyear(lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31' AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35; --Q2.1 SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand FROM lineorder_flat WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA' GROUP BY year, p_brand ORDER BY year, p_brand; --Q2.2 SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand FROM lineorder_flat WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA' GROUP BY year, p_brand ORDER BY year, p_brand; --Q2.3 SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand FROM lineorder_flat WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE' GROUP BY year, p_brand ORDER BY year, p_brand; --Q3.1 SELECT c_nation, s_nation, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31' GROUP BY c_nation, s_nation, year ORDER BY year ASC, revenue DESC; --Q3.2 SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31' GROUP BY c_city, s_city, year ORDER BY year ASC, revenue DESC; --Q3.3 SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31' GROUP BY c_city, s_city, year ORDER BY year ASC, revenue DESC; --Q3.4 SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1', 'UNITED KI5') AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31' GROUP BY c_city, s_city, year ORDER BY year ASC, revenue DESC; --Q4.1 SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2') GROUP BY year, c_nation ORDER BY year ASC, c_nation ASC; --Q4.2 SELECT year(lo_orderdate) AS year, s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2') GROUP BY year, s_nation, p_category ORDER BY year ASC, s_nation ASC, p_category ASC; --Q4.3 SELECT year(lo_orderdate) AS year, s_city, p_brand, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14' GROUP BY year, s_city, p_brand ORDER BY year ASC, s_city ASC, p_brand ASC;
--Q1 select count(*),lo_shipmode from lineorder_flat group by lo_shipmode; --Q2 select count(distinct lo_shipmode) from lineorder_flat; --Q3 select count(*),lo_shipmode,lo_orderpriority from lineorder_flat group by lo_shipmode,lo_orderpriority; --Q4 select count(*),lo_shipmode,lo_orderpriority from lineorder_flat group by lo_shipmode,lo_orderpriority,lo_shippriority; --Q5 select count(*),lo_shipmode,s_city from lineorder_flat group by lo_shipmode,s_city; --Q6 select count(*) from lineorder_flat group by c_city,s_city; --Q7 select count(*) from lineorder_flat group by lo_shipmode,lo_orderdate; --Q8 select count(*) from lineorder_flat group by lo_orderdate,s_nation,s_region; --Q9 select count(*) from lineorder_flat group by c_city,s_city,c_nation,s_nation; --Q10 select count(*) from (select count(*) from lineorder_flat group by lo_shipmode,lo_orderpriority,p_category,s_nation,c_nation) t; --Q11 select count(*) from (select count(*) from lineorder_flat_distributed group by lo_shipmode,lo_orderpriority,p_category,s_nation,c_nation,p_mfgr) t; --Q12 select count(*) from (select count(*) from lineorder_flat group by substr(lo_shipmode,2),lower(lo_orderpriority),p_category,s_nation,c_nation,s_region,p_mfgr) t;
单表测试
starrocks 2.0.1(ms) | clickhouse 20.4.2.9 (ms) | clickhouse/starrocks 性能对比 | |
---|---|---|---|
Q1.1 | 70 | 27 | 0.39 |
Q1.2 | 20 | 21 | 1.05 |
Q1.3 | 50 | 18 | 0.36 |
Q2.1 | 90 | 376 | 4.18 |
Q2.2 | 80 | 309 | 3.86 |
Q2.3 | 70 | 481 | 6.87 |
Q3.1 | 200 | 792 | 3.96 |
Q3.2 | 80 | 848 | 10.60 |
Q3.3 | 60 | 622 | 10.37 |
Q3.4 | 20 | 33 | 1.65 |
Q4.1 | 130 | 919 | 7.07 |
Q4.2 | 110 | 441 | 4.01 |
Q4.3 | 80 | 295 | 3.69 |
sum | 1060 | 5182 | 4.89 |
低基数聚合测试结果
查询类型 | 结果集的基数 | starrocks 2.0.1(s)并行度=1 | starrocks 2.0.1(s)并行度=32 | clickhouse 20.4.2.9 (s) | clickhouse/starrocks 性能对比 | |
---|---|---|---|---|---|---|
Q1 | group by 1个低基数列(<50) | 7 | 1.88 | 0.25 | 0.199 | 0.80 |
Q2 | count distinct 1个低基数列(<50) | 1 | 1.56 | 0.21 | 0.365 | 1.74 |
Q3 | group by 2个低基数列 | 35 | 2.88 | 0.28 | 2.732 | 9.76 |
Q4 | group by 2个低基数列,一个int列 | 35 | 2.99 | 0.32 | 3.465 | 10.83 |
Q5 | group by 4个低基数列(7*250) | 1750 | 2.87 | 0.32 | 0.996 | 3.11 |
Q6 | group by 2个低基数列(250*250) | 62500 | 3.74 | 0.83 | 1.947 | 2.35 |
Q7 | group by 1个低基数列(<50)和1个日期列 | 16842 | 2.09 | 0.28 | 0.656 | 2.34 |
Q8 | group by 2个低基数列(<50)和2个日期列 | 60150 | 2.08 | 0.42 | 0.978 | 2.33 |
Q9 | group by 4个低基数列 | 62500 | 7.14 | 1.27 | 3.308 | 2.60 |
Q10 | group by 5个低基数列(<50) | 546875 | 19.22 | 2.9 | 4.46 | 1.54 |
Q11 | group by 6个低基数列(<50) | 546875 | 24.81 | 3.08 | 5.254 | 1.71 |
Q12 | group by 7个包含函数计算低基数列(<50) | 469750 | 28.99 | 3.57 | 5.868 | 1.64 |
sum | 100.25 | 13.73 | 30.228 | 2.20 |
Clickhouse SSB测试报告:Clickhouse SSB测试报告
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。