当前位置:   article > 正文

ClickHouse 集群部署(不需要 Zookeeper)_clickhouse必须使用zookeeper吗

clickhouse必须使用zookeeper吗

目录

一、单节点设置

1. 下载

2. 安装

3. 启动

4. 验证

二、导入示例数据集

1. 下载并提取表数据

2. 创建库表

3. 导入数据

4. 优化表

5. 查询示例

三、集群部署

0. 安装前准备

1. 安装配置 ClickHouse Keeper

(1)安装 ClickHouse Server 和 ClickHouse Client

(3)重启 ClickHouse

(4)验证 Keeper 实例是否正在运行

(5)确认 zookeeper 系统表

2. 配置 ClickHouse 集群

(1)在新主机(node4)上安装 ClickHouse Server 和 ClickHouse Client

(2)修改新主机(node4)上的配置文件

(3)更新配置

(4)重启 ClickHouse 并验证集群已创建

3. 创建分布式表

(1)创建数据库

(2)创建本地表

(3)在一个分片实例上(node1)新增两行

(4)在另一个分片实例上(node3)新增两行

(5)在所有实例上分别执行 SELECT 查询

(6)创建分布式表

(7)在所有实例上分别执行 SELECT 查询

4. 数据自动分片

(1)创建带有分片键的本地表

(2)创建提供集群本地表视图的分布式表

(3)向分布式表中插入数据

(4)在所有实例上分别执行 SELECT 查询

5. 使用 ReplicatedMergeTree 表引擎复制数据

(1)定义宏

(2)创建本地表

(3)创建分布式表

(4)向分布式表中插入数据

(5)在所有实例上分别执行 SELECT 查询

参考:


        ClickHouse 可以在任何具有x86_64、AArch64 或 PowerPC64LE CPU 架构的 Linux,FreeBSD 或 Mac OS X 上运行。官方预构建的二进制文件通常针对 x86_64 进行编译,并利用 SSE 4.2 指令集,因此,除非另有说明,支持它的 CPU 使用将成为额外的系统需求。下面是检查当前 CPU 是否支持 SSE 4.2 的命令:

$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

一、单节点设置

1. 下载

  1. # 查看版本
  2. https://github.com/ClickHouse/ClickHouse/releases
  3. # 最新稳定版本安装包下载地址
  4. https://github.com/ClickHouse/ClickHouse/releases/download/v24.1.8.22-stable/clickhouse-client-24.1.8.22.x86_64.rpm
  5. https://github.com/ClickHouse/ClickHouse/releases/download/v24.1.8.22-stable/clickhouse-common-static-24.1.8.22.x86_64.rpm
  6. https://github.com/ClickHouse/ClickHouse/releases/download/v24.1.8.22-stable/clickhouse-server-24.1.8.22.x86_64.rpm

2. 安装

  1. # 首先安装通用依赖包
  2. rpm -ivh clickhouse-common-static-24.1.8.22.x86_64.rpm
  3. # 安装服务器,遇到 Enter password for default user 提示时输入密码
  4. rpm -ivh clickhouse-server-24.1.8.22.x86_64.rpm
  5. # 安装命令行客户端
  6. rpm -ivh clickhouse-client-24.1.8.22.x86_64.rpm

3. 启动

sudo service clickhouse-server start

        查看启动后进程:

  1. [root@vvml-yz-hbase-test~]#ps -ef | grep clickhouse | grep -v grep
  2. clickho+  5322     1  0 08:49 ?        00:00:00 clickhouse-watchdog        --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
  3. clickho+  5323  5322  6 08:49 ?        00:00:00 /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
  4. [root@vvml-yz-hbase-test~]#

        查看监听端口

  1. [root@vvml-yz-hbase-test~]#netstat -antpl | grep clickhouse
  2. tcp        0      0 127.0.0.1:9004          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  3. tcp        0      0 127.0.0.1:9005          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  4. tcp        0      0 127.0.0.1:9009          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  5. tcp        0      0 127.0.0.1:8123          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  6. tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  7. tcp6       0      0 ::1:9004                :::*                    LISTEN      5323/clickhouse-ser 
  8. tcp6       0      0 ::1:9005                :::*                    LISTEN      5323/clickhouse-ser 
  9. tcp6       0      0 ::1:9009                :::*                    LISTEN      5323/clickhouse-ser 
  10. tcp6       0      0 ::1:8123                :::*                    LISTEN      5323/clickhouse-ser 
  11. tcp6       0      0 ::1:9000                :::*                    LISTEN      5323/clickhouse-ser 
  12. [root@vvml-yz-hbase-test~]#

        服务端日志的默认位置是 /var/log/clickhouse-server/。当服务端在日志中记录 Ready for connections 消息,即表示服务端已准备好处理客户端连接。一旦 clickhouse-server 启动并运行,可以利用 clickhouse-client 连接到服务端,并运行一些测试查询。

4. 验证

  1. [root@vvml-yz-hbase-test~]#clickhouse-client --password="123456" --query "select version();"
  2. 24.1.8.22
  3. [root@vvml-yz-hbase-test~]#clickhouse-client --password="123456" -n --query "select 1;select 2;"
  4. 1
  5. 2
  6. [root@vvml-yz-hbase-test~]#

二、导入示例数据集

1. 下载并提取表数据

  1. curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
  2. curl https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv

2. 创建库表

        创建 sample.sql 文件,内容如下(注意嵌套表定义):

  1. create database if not exists tutorial;
  2. CREATE TABLE tutorial.hits_v1
  3. (
  4.     `WatchID` UInt64,
  5.     `JavaEnable` UInt8,
  6.     `Title` String,
  7.     `GoodEvent` Int16,
  8.     `EventTime` DateTime,
  9.     `EventDate` Date,
  10.     `CounterID` UInt32,
  11.     `ClientIP` UInt32,
  12.     `ClientIP6` FixedString(16),
  13.     `RegionID` UInt32,
  14.     `UserID` UInt64,
  15.     `CounterClass` Int8,
  16.     `OS` UInt8,
  17.     `UserAgent` UInt8,
  18.     `URL` String,
  19.     `Referer` String,
  20.     `URLDomain` String,
  21.     `RefererDomain` String,
  22.     `Refresh` UInt8,
  23.     `IsRobot` UInt8,
  24.     `RefererCategories` Array(UInt16),
  25.     `URLCategories` Array(UInt16),
  26.     `URLRegions` Array(UInt32),
  27.     `RefererRegions` Array(UInt32),
  28.     `ResolutionWidth` UInt16,
  29.     `ResolutionHeight` UInt16,
  30.     `ResolutionDepth` UInt8,
  31.     `FlashMajor` UInt8,
  32.     `FlashMinor` UInt8,
  33.     `FlashMinor2` String,
  34.     `NetMajor` UInt8,
  35.     `NetMinor` UInt8,
  36.     `UserAgentMajor` UInt16,
  37.     `UserAgentMinor` FixedString(2),
  38.     `CookieEnable` UInt8,
  39.     `JavascriptEnable` UInt8,
  40.     `IsMobile` UInt8,
  41.     `MobilePhone` UInt8,
  42.     `MobilePhoneModel` String,
  43.     `Params` String,
  44.     `IPNetworkID` UInt32,
  45.     `TraficSourceID` Int8,
  46.     `SearchEngineID` UInt16,
  47.     `SearchPhrase` String,
  48.     `AdvEngineID` UInt8,
  49.     `IsArtifical` UInt8,
  50.     `WindowClientWidth` UInt16,
  51.     `WindowClientHeight` UInt16,
  52.     `ClientTimeZone` Int16,
  53.     `ClientEventTime` DateTime,
  54.     `SilverlightVersion1` UInt8,
  55.     `SilverlightVersion2` UInt8,
  56.     `SilverlightVersion3` UInt32,
  57.     `SilverlightVersion4` UInt16,
  58.     `PageCharset` String,
  59.     `CodeVersion` UInt32,
  60.     `IsLink` UInt8,
  61.     `IsDownload` UInt8,
  62.     `IsNotBounce` UInt8,
  63.     `FUniqID` UInt64,
  64.     `HID` UInt32,
  65.     `IsOldCounter` UInt8,
  66.     `IsEvent` UInt8,
  67.     `IsParameter` UInt8,
  68.     `DontCountHits` UInt8,
  69.     `WithHash` UInt8,
  70.     `HitColor` FixedString(1),
  71.     `UTCEventTime` DateTime,
  72.     `Age` UInt8,
  73.     `Sex` UInt8,
  74.     `Income` UInt8,
  75.     `Interests` UInt16,
  76.     `Robotness` UInt8,
  77.     `GeneralInterests` Array(UInt16),
  78.     `RemoteIP` UInt32,
  79.     `RemoteIP6` FixedString(16),
  80.     `WindowName` Int32,
  81.     `OpenerName` Int32,
  82.     `HistoryLength` Int16,
  83.     `BrowserLanguage` FixedString(2),
  84.     `BrowserCountry` FixedString(2),
  85.     `SocialNetwork` String,
  86.     `SocialAction` String,
  87.     `HTTPError` UInt16,
  88.     `SendTiming` Int32,
  89.     `DNSTiming` Int32,
  90.     `ConnectTiming` Int32,
  91.     `ResponseStartTiming` Int32,
  92.     `ResponseEndTiming` Int32,
  93.     `FetchTiming` Int32,
  94.     `RedirectTiming` Int32,
  95.     `DOMInteractiveTiming` Int32,
  96.     `DOMContentLoadedTiming` Int32,
  97.     `DOMCompleteTiming` Int32,
  98.     `LoadEventStartTiming` Int32,
  99.     `LoadEventEndTiming` Int32,
  100.     `NSToDOMContentLoadedTiming` Int32,
  101.     `FirstPaintTiming` Int32,
  102.     `RedirectCount` Int8,
  103.     `SocialSourceNetworkID` UInt8,
  104.     `SocialSourcePage` String,
  105.     `ParamPrice` Int64,
  106.     `ParamOrderID` String,
  107.     `ParamCurrency` FixedString(3),
  108.     `ParamCurrencyID` UInt16,
  109.     `GoalsReached` Array(UInt32),
  110.     `OpenstatServiceName` String,
  111.     `OpenstatCampaignID` String,
  112.     `OpenstatAdID` String,
  113.     `OpenstatSourceID` String,
  114.     `UTMSource` String,
  115.     `UTMMedium` String,
  116.     `UTMCampaign` String,
  117.     `UTMContent` String,
  118.     `UTMTerm` String,
  119.     `FromTag` String,
  120.     `HasGCLID` UInt8,
  121.     `RefererHash` UInt64,
  122.     `URLHash` UInt64,
  123.     `CLID` UInt32,
  124.     `YCLID` UInt64,
  125.     `ShareService` String,
  126.     `ShareURL` String,
  127.     `ShareTitle` String,
  128.     `ParsedParams` Nested(
  129.         Key1 String,
  130.         Key2 String,
  131.         Key3 String,
  132.         Key4 String,
  133.         Key5 String,
  134.         ValueDouble Float64),
  135.     `IslandID` FixedString(16),
  136.     `RequestNum` UInt32,
  137.     `RequestTry` UInt8
  138. )
  139. ENGINE = MergeTree()
  140. PARTITION BY toYYYYMM(EventDate)
  141. ORDER BY (CounterID, EventDate, intHash32(UserID))
  142. SAMPLE BY intHash32(UserID);
  143. CREATE TABLE tutorial.visits_v1
  144. (
  145.     `CounterID` UInt32,
  146.     `StartDate` Date,
  147.     `Sign` Int8,
  148.     `IsNew` UInt8,
  149.     `VisitID` UInt64,
  150.     `UserID` UInt64,
  151.     `StartTime` DateTime,
  152.     `Duration` UInt32,
  153.     `UTCStartTime` DateTime,
  154.     `PageViews` Int32,
  155.     `Hits` Int32,
  156.     `IsBounce` UInt8,
  157.     `Referer` String,
  158.     `StartURL` String,
  159.     `RefererDomain` String,
  160.     `StartURLDomain` String,
  161.     `EndURL` String,
  162.     `LinkURL` String,
  163.     `IsDownload` UInt8,
  164.     `TraficSourceID` Int8,
  165.     `SearchEngineID` UInt16,
  166.     `SearchPhrase` String,
  167.     `AdvEngineID` UInt8,
  168.     `PlaceID` Int32,
  169.     `RefererCategories` Array(UInt16),
  170.     `URLCategories` Array(UInt16),
  171.     `URLRegions` Array(UInt32),
  172.     `RefererRegions` Array(UInt32),
  173.     `IsYandex` UInt8,
  174.     `GoalReachesDepth` Int32,
  175.     `GoalReachesURL` Int32,
  176.     `GoalReachesAny` Int32,
  177.     `SocialSourceNetworkID` UInt8,
  178.     `SocialSourcePage` String,
  179.     `MobilePhoneModel` String,
  180.     `ClientEventTime` DateTime,
  181.     `RegionID` UInt32,
  182.     `ClientIP` UInt32,
  183.     `ClientIP6` FixedString(16),
  184.     `RemoteIP` UInt32,
  185.     `RemoteIP6` FixedString(16),
  186.     `IPNetworkID` UInt32,
  187.     `SilverlightVersion3` UInt32,
  188.     `CodeVersion` UInt32,
  189.     `ResolutionWidth` UInt16,
  190.     `ResolutionHeight` UInt16,
  191.     `UserAgentMajor` UInt16,
  192.     `UserAgentMinor` UInt16,
  193.     `WindowClientWidth` UInt16,
  194.     `WindowClientHeight` UInt16,
  195.     `SilverlightVersion2` UInt8,
  196.     `SilverlightVersion4` UInt16,
  197.     `FlashVersion3` UInt16,
  198.     `FlashVersion4` UInt16,
  199.     `ClientTimeZone` Int16,
  200.     `OS` UInt8,
  201.     `UserAgent` UInt8,
  202.     `ResolutionDepth` UInt8,
  203.     `FlashMajor` UInt8,
  204.     `FlashMinor` UInt8,
  205.     `NetMajor` UInt8,
  206.     `NetMinor` UInt8,
  207.     `MobilePhone` UInt8,
  208.     `SilverlightVersion1` UInt8,
  209.     `Age` UInt8,
  210.     `Sex` UInt8,
  211.     `Income` UInt8,
  212.     `JavaEnable` UInt8,
  213.     `CookieEnable` UInt8,
  214.     `JavascriptEnable` UInt8,
  215.     `IsMobile` UInt8,
  216.     `BrowserLanguage` UInt16,
  217.     `BrowserCountry` UInt16,
  218.     `Interests` UInt16,
  219.     `Robotness` UInt8,
  220.     `GeneralInterests` Array(UInt16),
  221.     `Params` Array(String),
  222.     `Goals` Nested(
  223.         ID UInt32,
  224.         Serial UInt32,
  225.         EventTime DateTime,
  226.         Price Int64,
  227.         OrderID String,
  228.         CurrencyID UInt32),
  229.     `WatchIDs` Array(UInt64),
  230.     `ParamSumPrice` Int64,
  231.     `ParamCurrency` FixedString(3),
  232.     `ParamCurrencyID` UInt16,
  233.     `ClickLogID` UInt64,
  234.     `ClickEventID` Int32,
  235.     `ClickGoodEvent` Int32,
  236.     `ClickEventTime` DateTime,
  237.     `ClickPriorityID` Int32,
  238.     `ClickPhraseID` Int32,
  239.     `ClickPageID` Int32,
  240.     `ClickPlaceID` Int32,
  241.     `ClickTypeID` Int32,
  242.     `ClickResourceID` Int32,
  243.     `ClickCost` UInt32,
  244.     `ClickClientIP` UInt32,
  245.     `ClickDomainID` UInt32,
  246.     `ClickURL` String,
  247.     `ClickAttempt` UInt8,
  248.     `ClickOrderID` UInt32,
  249.     `ClickBannerID` UInt32,
  250.     `ClickMarketCategoryID` UInt32,
  251.     `ClickMarketPP` UInt32,
  252.     `ClickMarketCategoryName` String,
  253.     `ClickMarketPPName` String,
  254.     `ClickAWAPSCampaignName` String,
  255.     `ClickPageName` String,
  256.     `ClickTargetType` UInt16,
  257.     `ClickTargetPhraseID` UInt64,
  258.     `ClickContextType` UInt8,
  259.     `ClickSelectType` Int8,
  260.     `ClickOptions` String,
  261.     `ClickGroupBannerID` Int32,
  262.     `OpenstatServiceName` String,
  263.     `OpenstatCampaignID` String,
  264.     `OpenstatAdID` String,
  265.     `OpenstatSourceID` String,
  266.     `UTMSource` String,
  267.     `UTMMedium` String,
  268.     `UTMCampaign` String,
  269.     `UTMContent` String,
  270.     `UTMTerm` String,
  271.     `FromTag` String,
  272.     `HasGCLID` UInt8,
  273.     `FirstVisit` DateTime,
  274.     `PredLastVisit` Date,
  275.     `LastVisit` Date,
  276.     `TotalVisits` UInt32,
  277.     `TraficSource` Nested(
  278.         ID Int8,
  279.         SearchEngineID UInt16,
  280.         AdvEngineID UInt8,
  281.         PlaceID UInt16,
  282.         SocialSourceNetworkID UInt8,
  283.         Domain String,
  284.         SearchPhrase String,
  285.         SocialSourcePage String),
  286.     `Attendance` FixedString(16),
  287.     `CLID` UInt32,
  288.     `YCLID` UInt64,
  289.     `NormalizedRefererHash` UInt64,
  290.     `SearchPhraseHash` UInt64,
  291.     `RefererDomainHash` UInt64,
  292.     `NormalizedStartURLHash` UInt64,
  293.     `StartURLDomainHash` UInt64,
  294.     `NormalizedEndURLHash` UInt64,
  295.     `TopLevelDomain` UInt64,
  296.     `URLScheme` UInt64,
  297.     `OpenstatServiceNameHash` UInt64,
  298.     `OpenstatCampaignIDHash` UInt64,
  299.     `OpenstatAdIDHash` UInt64,
  300.     `OpenstatSourceIDHash` UInt64,
  301.     `UTMSourceHash` UInt64,
  302.     `UTMMediumHash` UInt64,
  303.     `UTMCampaignHash` UInt64,
  304.     `UTMContentHash` UInt64,
  305.     `UTMTermHash` UInt64,
  306.     `FromHash` UInt64,
  307.     `WebVisorEnabled` UInt8,
  308.     `WebVisorActivity` UInt32,
  309.     `ParsedParams` Nested(
  310.         Key1 String,
  311.         Key2 String,
  312.         Key3 String,
  313.         Key4 String,
  314.         Key5 String,
  315.         ValueDouble Float64),
  316.     `Market` Nested(
  317.         Type UInt8,
  318.         GoalID UInt32,
  319.         OrderID String,
  320.         OrderPrice Int64,
  321.         PP UInt32,
  322.         DirectPlaceID UInt32,
  323.         DirectOrderID UInt32,
  324.         DirectBannerID UInt32,
  325.         GoodID String,
  326.         GoodName String,
  327.         GoodQuantity Int32,
  328.         GoodPrice Int64),
  329.     `IslandID` FixedString(16)
  330. )
  331. ENGINE = CollapsingMergeTree(Sign)
  332. PARTITION BY toYYYYMM(StartDate)
  333. ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
  334. SAMPLE BY intHash32(UserID);

        执行 sample.sql 文件:

clickhouse-client --password="123456" --queries-file sample.sql

3. 导入数据

  1. clickhouse-client --password="123456" --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
  2. clickhouse-client --password="123456" --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv

        找出可用的设置、含义及其默认值的最简单方法是查询 system.settings 表:

  1. vvml-yz-hbase-test.172.18.4.126 :) select name, value, changed, description
  2.   from system.settings
  3.  where name like '%max_insert_b%';
  4. SELECT
  5.     name,
  6.     value,
  7.     changed,
  8.     description
  9. FROM system.settings
  10. WHERE name LIKE '%max_insert_b%'
  11. Query id: 05bc6241-2d1f-432e-87b3-f35a3ad612c8
  12. ┌─name──────────────────┬─value───┬─changed─┬─description───────────────────────────────────────────────────────────────────────────────┐
  13. │ max_insert_block_size │ 1048449 │       0 │ The maximum block size for insertion, if we control the creation of blocks for insertion. │
  14. └───────────────────────┴─────────┴─────────┴───────────────────────────────────────────────────────────────────────────────────────────┘
  15. 1 row in set. Elapsed: 0.002 sec. 
  16. vvml-yz-hbase-test.172.18.4.126 :) 

4. 优化表

        可以 OPTIMIZE 导入后的表。使用 MergeTree-family 引擎配置的表总是在后台合并数据部分以优化数据存储(或至少检查是否有意义)。这些查询强制表引擎立即进行存储优化(较慢,谨慎手工执行):

  1. clickhouse-client --password="123456" --query "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
  2. clickhouse-client --password="123456" --query "OPTIMIZE TABLE tutorial.visits_v1 FINAL"

5. 查询示例

  1. vvml-yz-hbase-test.172.18.4.126 :) SELECT
  2.     StartURL AS URL,
  3.     AVG(Duration) AS AvgDuration
  4. FROM tutorial.visits_v1
  5. WHERE StartDate BETWEEN '2014-03-23' AND '2014-03-30'
  6. GROUP BY URL
  7. ORDER BY AvgDuration DESC
  8. LIMIT 10;
  9. SELECT
  10.     StartURL AS URL,
  11.     AVG(Duration) AS AvgDuration
  12. FROM tutorial.visits_v1
  13. WHERE (StartDate >= '2014-03-23') AND (StartDate <= '2014-03-30')
  14. GROUP BY URL
  15. ORDER BY AvgDuration DESC
  16. LIMIT 10
  17. Query id: fc2e8638-5081-496a-964b-a679eab63af9
  18. ┌─URL─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─AvgDuration─┐
  19. │ http://itpalanija-pri-patrivative=0&ads_app_user                                                                                                                                                │       60127
  20. │ http://renaul-myd-ukraine                                                                                                                                                                       │       58938
  21. │ http://karta/Futbol/dynamo.kiev.ua/kawaica.su/648                                                                                                                                               │       56538
  22. │ https://moda/vyikroforum1/top.ru/moscow/delo-product/trend_sms/multitryaset/news/2014/03/201000                                                                                                 │       55218
  23. │ http://e.mail=on&default?abid=2061&scd=yes&option?r=city_inter.com/menu&site-zaferio.ru/c/m.ensor.net/ru/login=false&orderStage.php?Brandidatamalystyle/20Mar2014%2F007%2F94dc8d2e06e56ed56bbdd │       51378
  24. │ http://karta/Futbol/dynas.com/haberler.ru/messages.yandsearchives/494503_lte_13800200319                                                                                                        │       49078
  25. │ http://xmusic/vstreatings of speeds                                                                                                                                                             │       36925
  26. │ http://news.ru/yandex.ru/api.php&api=http://toberria.ru/aphorizana                                                                                                                              │       36902
  27. │ http://bashmelnykh-metode.net/video/#!/video/emberkas.ru/detskij-yazi.com/iframe/default.aspx?id=760928&noreask=1&source                                                                        │       34323
  28. │ http://censonhaber/547-popalientLog=0&strizhki-petro%3D&comeback=search?lr=213&text                                                                                                             │       31773
  29. └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┘
  30. 10 rows in set. Elapsed: 0.080 sec. Processed 1.43 million rows, 112.12 MB (17.91 million rows/s., 1.40 GB/s.)
  31. Peak memory usage: 45.07 MiB.
  32. vvml-yz-hbase-test.172.18.4.126 :) SELECT
  33.     sum(Sign) AS visits,
  34.     sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits,
  35.     (100. * goal_visits) / visits AS goal_percent
  36. FROM tutorial.visits_v1
  37. WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = 'yandex.ru');
  38. SELECT
  39.     sum(Sign) AS visits,
  40.     sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits,
  41.     (100. * goal_visits) / visits AS goal_percent
  42. FROM tutorial.visits_v1
  43. WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = 'yandex.ru')
  44. Query id: 02df86cc-ef97-4cec-9892-6ba92dc87d2c
  45. ┌─visits─┬─goal_visits─┬──────goal_percent─┐
  46. │  10543 │        855381.12491700654462
  47. └────────┴─────────────┴───────────────────┘
  48. 1 row in set. Elapsed: 0.014 sec. Processed 19.72 thousand rows, 3.45 MB (1.46 million rows/s., 255.24 MB/s.)
  49. Peak memory usage: 5.01 MiB.
  50. vvml-yz-hbase-test.172.18.4.126 :) 

三、集群部署

        本次部署使用四台主机,构建两个分片,每个分片两个副本的 ClickHouse 集群,IP 和主机名如下:

  1. 172.18.4.126    node1
  2. 172.18.4.188    node2
  3. 172.18.4.71    node3
  4. 172.18.4.86    node4

        注意,ClickHouse 要求每个分片的每个副本必须配置在单独的实例上,也就是说在整个集群范围内,一共有多少个副本,就需要创建多少个 ClickHouse 实例。最佳实践是3分片2副本6实例。

        ClickHouse 推荐使用 ClickHouse Keeper 替代 Zookeeper(https://clickhouse.com/docs/knowledgebase/why_recommend_clickhouse_keeper_over_zookeeper)。
        下面使用 ClickHouse Keeper 配置 ClickHouse 集群。具体操作步骤参考 ClickHouse Keeper 用户指南(https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper#clickhouse-keeper-user-guide)。

        ClickHouse 服务器中捆绑了 clickhouse-keeper。如果已经安装服务器,则无法单独安装 clickhouse-keeper,会收到冲突错误:

  1. [root@vvml-yz-hbase-test~]#rpm -ivh clickhouse-keeper-24.1.8.22.x86_64.rpm 
  2. error: Failed dependencies:
  3.     clickhouse-server conflicts with clickhouse-keeper-0:24.1.8.22-1.x86_64
  4. [root@vvml-yz-hbase-test~]#

        但如果有仅用作 clickhouse-keeper 的服务器,则仅可以单独安装 clickhouse-keeper。

  1. # 查看版本
  2. https://github.com/ClickHouse/ClickHouse/releases
  3. # 最新稳定版本安装包下载地址
  4. https://github.com/ClickHouse/ClickHouse/releases/download/v24.1.8.22-stable/clickhouse-keeper-24.1.8.22.x86_64.rpm

        本次部署在每个主机安装 clickhouse-server,而不单独安装 clickhouse-keeper。分片、副本、keeper规划如下:

  1. node1    分片1副本1    keeper
  2. node2    分片1副本2    keeper
  3. node3    分片2副本1    keeper
  4. node4    分片2副本2

        四台主机都作为数据分片,每个数据分片有两个副本。keeper 部署到三个实例上,奇数实例用于实现 ClickHouse Keeper 中要求的票选数。

0. 安装前准备

        在配置 ClickHouse 集群前,需要在全部四台机器上完成以下准备工作:

  • 启动 NTP 时钟同步
  • /etc/hosts 文件中添加构成集群的所有主机名
  • 配置所有主机间 ssh 免密
  • 修改用户可打开文件数与进程数
  • 禁用防火墙
  • 禁用 transparent hugepage

1. 安装配置 ClickHouse Keeper

        在 node1、node2、node3 三台主机上执行下面的操作步骤。

(1)安装 ClickHouse Server 和 ClickHouse Client

  1. rpm -ivh clickhouse-common-static-24.1.8.22.x86_64.rpm
  2. rpm -ivh clickhouse-server-24.1.8.22.x86_64.rpm
  3. rpm -ivh clickhouse-client-24.1.8.22.x86_64.rpm

(2)修改 ClickHouse 主配置文件 

  1. # 修改前先备份
  2. cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config.xml.bak

        修改 /etc/clickhouse-server/config.xml 主配置文件,在根节点 <clickhouse> 下添加以下内容:

  1. <!-- 添加以下条目以允许通过网络接口进行外部通信。 -->
  2. <listen_host>0.0.0.0</listen_host>
  3. <!-- 指定实例启用 ClickHouse Keeper。更新每台服务器的<server_id>设置,node1为1、node2为2、node3为3。-->
  4. <keeper_server>
  5.     <tcp_port>9181</tcp_port>
  6.     <server_id>1</server_id>
  7.     <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
  8.     <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
  9.     <coordination_settings>
  10.         <operation_timeout_ms>10000</operation_timeout_ms>
  11.         <session_timeout_ms>30000</session_timeout_ms>
  12.         <raft_logs_level>warning</raft_logs_level>
  13.     </coordination_settings>
  14.     <raft_configuration>
  15.         <server>
  16.             <id>1</id>
  17.             <hostname>node1</hostname>
  18.             <port>9234</port>
  19.         </server>
  20.         <server>
  21.             <id>2</id>
  22.             <hostname>node2</hostname>
  23.             <port>9234</port>
  24.         </server>
  25.         <server>
  26.             <id>3</id>
  27.             <hostname>node3</hostname>
  28.             <port>9234</port>
  29.         </server>
  30.     </raft_configuration>
  31. </keeper_server>
  32. <!-- 指定实例所使用的 ClickHouse Keeper -->
  33. <zookeeper>
  34.     <node>
  35.         <host>node1</host>
  36.         <port>9181</port>
  37.     </node>
  38.     <node>
  39.         <host>node2</host>
  40.         <port>9181</port>
  41.     </node>
  42.     <node>
  43.         <host>node3</host>
  44.         <port>9181</port>
  45.     </node>
  46. </zookeeper>

(3)重启 ClickHouse

sudo service clickhouse-server restart

(4)验证 Keeper 实例是否正在运行

        在 node1、node2、node3 上执行下面的命令,如果 Keeper 运行正常,ruok 命令将返回 imok:

  1. [root@vvml-yz-hbase-test~]#echo ruok | nc localhost 9181; echo
  2. imok
  3. [root@vvml-yz-hbase-test~]#

(5)确认 zookeeper 系统表

        系统数据库有一个名为 zookeeper 的表,其中包含 ClickHouse Keeper 实例的详细信息:

  1. [root@vvml-yz-hbase-test~]#clickhouse-client --password="123456"
  2. ClickHouse client version 24.1.8.22 (official build).
  3. Connecting to localhost:9000 as user default.
  4. Connected to ClickHouse server version 24.1.8.
  5. vvml-yz-hbase-test.172.18.4.126 :) SELECT *
  6. FROM system.zookeeper
  7. WHERE path IN ('/', '/clickhouse');
  8. SELECT *
  9. FROM system.zookeeper
  10. WHERE path IN ('/', '/clickhouse')
  11. Query id: e713c446-26c5-4c3f-994c-db22eb68b9ad
  12. ┌─name───────┬─value─┬─path────────┐
  13. │ keeper     │       │ /           │
  14. │ clickhouse │       │ /           │
  15. │ task_queue │       │ /clickhouse │
  16. │ sessions   │       │ /clickhouse │
  17. └────────────┴───────┴─────────────┘
  18. 4 rows in set. Elapsed: 0.002 sec. 
  19. vvml-yz-hbase-test.172.18.4.126 :) 

2. 配置 ClickHouse 集群

(1)在新主机(node4)上安装 ClickHouse Server 和 ClickHouse Client

  1. rpm -ivh clickhouse-common-static-24.1.8.22.x86_64.rpm
  2. rpm -ivh clickhouse-server-24.1.8.22.x86_64.rpm
  3. rpm -ivh clickhouse-client-24.1.8.22.x86_64.rpm

(2)修改新主机(node4)上的配置文件

  1. # 修改前先备份
  2. cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config.xml.bak

        修改 /etc/clickhouse-server/config.xml 主配置文件,在根节点 <clickhouse> 下添加以下内容:

  1. <!-- 添加以下条目以允许通过网络接口进行外部通信。 -->
  2. <listen_host>0.0.0.0</listen_host>
  3. <!-- 指定实例所使用的 ClickHouse Keeper -->
  4. <zookeeper>
  5.     <node>
  6.         <host>node1</host>
  7.         <port>9181</port>
  8.     </node>
  9.     <node>
  10.         <host>node2</host>
  11.         <port>9181</port>
  12.     </node>
  13.     <node>
  14.         <host>node3</host>
  15.         <port>9181</port>
  16.     </node>
  17. </zookeeper>

        然后在全部四台主机上执行下面的操作步骤。

(3)更新配置

        修改 /etc/clickhouse-server/config.xml 主配置文件,在根节点 <clickhouse> 下添加以下内容:

  1.     <remote_servers>
  2.         <cluster_2S_2R>
  3.             <shard>
  4.                 <replica>
  5.                     <host>node1</host>
  6.                     <port>9000</port>
  7.                     <user>default</user>
  8.                     <password>123456</password>
  9.                 </replica>
  10.                 <replica>
  11.                     <host>node2</host>
  12.                     <port>9000</port>
  13.                     <user>default</user>
  14.                     <password>123456</password>
  15.                 </replica>
  16.             </shard>
  17.             <shard>
  18.                 <replica>
  19.                     <host>node3</host>
  20.                     <port>9000</port>
  21.                     <user>default</user>
  22.                     <password>123456</password>
  23.                 </replica>
  24.                 <replica>
  25.                     <host>node4</host>
  26.                     <port>9000</port>
  27.                     <user>default</user>
  28.                     <password>123456</password>
  29.                 </replica>
  30.             </shard>
  31.         </cluster_2S_2R>
  32.     </remote_servers>

(4)重启 ClickHouse 并验证集群已创建

sudo service clickhouse-server restart

        查看集群:

  1. [root@vvml-yz-hbase-test~]#clickhouse-client --password="123456"
  2. ClickHouse client version 24.1.8.22 (official build).
  3. Connecting to localhost:9000 as user default.
  4. Connected to ClickHouse server version 24.1.8.
  5. vvml-yz-hbase-test.172.18.4.126 :) SHOW CLUSTERS;
  6. SHOW CLUSTERS
  7. Query id: 80e81978-d1f4-4721-85d8-7e7803230373
  8. ┌─cluster───────┐
  9. │ cluster_2S_2R │
  10. └───────────────┘
  11. 1 row in set. Elapsed: 0.001 sec. 
  12. vvml-yz-hbase-test.172.18.4.126 :) select cluster,shard_num,replica_num,host_name,is_local,user,database_shard_name,database_replica_name from system.clusters;
  13. SELECT
  14.     cluster,
  15.     shard_num,
  16.     replica_num,
  17.     host_name,
  18.     is_local,
  19.     user,
  20.     database_shard_name,
  21.     database_replica_name
  22. FROM system.clusters
  23. Query id: 4b39d9ec-b4f7-4557-b76d-05f3893f4ef7
  24. ┌─cluster───────┬─shard_num─┬─replica_num─┬─host_name─┬─is_local─┬─user────┬─database_shard_name─┬─database_replica_name─┐
  25. │ cluster_2S_2R │         1 │           1 │ node1     │        1default │                     │                       │
  26. │ cluster_2S_2R │         1 │           2 │ node2     │        0default │                     │                       │
  27. │ cluster_2S_2R │         2 │           1 │ node3     │        0default │                     │                       │
  28. │ cluster_2S_2R │         2 │           2 │ node4     │        0default │                     │                       │
  29. └───────────────┴───────────┴─────────────┴───────────┴──────────┴─────────┴─────────────────────┴───────────────────────┘
  30. 4 rows in set. Elapsed: 0.001 sec. 
  31. vvml-yz-hbase-test.172.18.4.126 :) 

3. 创建分布式表

(1)创建数据库

        使用 node1 上创建一个数据库。ON CLUSTER子句会自动在所有实例上创建数据库。

  1. vvml-yz-hbase-test.172.18.4.126 :) CREATE DATABASE db1 ON CLUSTER 'cluster_2S_2R';
  2. CREATE DATABASE db1 ON CLUSTER cluster_2S_2R
  3. Query id: 7a8cd789-bcfb-4855-a131-ba7935cffcfb
  4. ┌─host──┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
  5. │ node1 │ 9000 │      0 │       │                   3 │                0
  6. │ node3 │ 9000 │      0 │       │                   2 │                0
  7. │ node4 │ 9000 │      0 │       │                   1 │                0
  8. │ node2 │ 9000 │      0 │       │                   0 │                0
  9. └───────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
  10. 4 rows in set. Elapsed: 0.099 sec. 
  11. vvml-yz-hbase-test.172.18.4.126 :)

(2)创建本地表

        在 db1 库中建表,同样,ON CLUSTER 子句会自动在所有实例上建表。

  1. vvml-yz-hbase-test.172.18.4.126 :) CREATE TABLE db1.table1 on cluster 'cluster_2S_2R'
  2. (
  3.     `id` UInt64,
  4.     `column1` String
  5. )
  6. ENGINE = MergeTree
  7. ORDER BY column1;
  8. CREATE TABLE db1.table1 ON CLUSTER cluster_2S_2R
  9. (
  10.     `id` UInt64,
  11.     `column1` String
  12. )
  13. ENGINE = MergeTree
  14. ORDER BY column1
  15. Query id: abb936ad-3618-4821-92f6-cfaa83fb4d51
  16. ┌─host──┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
  17. │ node3 │ 9000 │      0 │       │                   3 │                2
  18. │ node1 │ 9000 │      0 │       │                   2 │                2
  19. └───────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
  20. ┌─host──┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
  21. │ node4 │ 9000 │      0 │       │                   1 │                0
  22. │ node2 │ 9000 │      0 │       │                   0 │                0
  23. └───────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
  24. 4 rows in set. Elapsed: 0.158 sec. 
  25. vvml-yz-hbase-test.172.18.4.126 :)

(3)在一个分片实例上(node1)新增两行

  1. vvml-yz-hbase-test.172.18.4.126 :) INSERT INTO db1.table1
  2.     (id, column1)
  3. VALUES
  4.     (1, 'abc'),
  5.     (2, 'def');
  6. INSERT INTO db1.table1 (id, column1) FORMAT Values
  7. Query id: 959da99c-c473-4c3a-9381-fa65b447161c
  8. Ok.
  9. 2 rows in set. Elapsed: 0.002 sec. 
  10. vvml-yz-hbase-test.172.18.4.126 :) 

(4)在另一个分片实例上(node3)新增两行

  1. vvml-yz-hbase-test.172.18.4.71 :) INSERT INTO db1.table1
  2.     (id, column1)
  3. VALUES
  4.     (3, 'ghi'),
  5.     (4, 'jkl');
  6. INSERT INTO db1.table1 (id, column1) FORMAT Values
  7. Query id: c8864197-ec82-4aba-8c2e-aaa2af468553
  8. Ok.
  9. 2 rows in set. Elapsed: 0.002 sec. 
  10. vvml-yz-hbase-test.172.18.4.71 :)

(5)在所有实例上分别执行 SELECT 查询

        node1、node3 上查询结果显示该本地实例上的两行数据:

  1. # node1
  2. vvml-yz-hbase-test.172.18.4.126 :) SELECT *
  3. FROM db1.table1;
  4. SELECT *
  5. FROM db1.table1
  6. Query id: 2f2fc679-9091-41ae-967b-4bd8e2ec7311
  7. ┌─id─┬─column1─┐
  8. │  1 │ abc     │
  9. │  2 │ def     │
  10. └────┴─────────┘
  11. 2 rows in set. Elapsed: 0.001 sec. 
  12. vvml-yz-hbase-test.172.18.4.126 :) 
  13. # node3
  14. vvml-yz-hbase-test.172.18.4.71 :) SELECT *
  15. FROM db1.table1;
  16. SELECT *
  17. FROM db1.table1
  18. Query id: 18843522-c678-45f0-901d-73e1bbfd4dbf
  19. ┌─id─┬─column1─┐
  20. │  3 │ ghi     │
  21. │  4 │ jkl     │
  22. └────┴─────────┘
  23. 2 rows in set. Elapsed: 0.002 sec. 
  24. vvml-yz-hbase-test.172.18.4.71 :) 

        node2、node4 上查不到数据,说明写入本地表并没有复制到其副本。

(6)创建分布式表

        可以创建一个分布式表来表示两个分片上的数据。具有分布式表引擎的表不存储自己的任何数据,而是允许在多个服务器上进行分布式查询处理。读取命中所有分片,写入可以分布在分片之间。在任一实例上创建分布式表:

  1. vvml-yz-hbase-test.172.18.4.126 :) CREATE TABLE db1.dist_table ON CLUSTER 'cluster_2S_2R'
  2. (
  3.     id UInt64,
  4.     column1 String
  5. )
  6. ENGINE = Distributed(cluster_2S_2R,db1,table1);
  7. CREATE TABLE db1.dist_table ON CLUSTER cluster_2S_2R
  8. (
  9.     `id` UInt64,
  10.     `column1` String
  11. )
  12. ENGINE = Distributed(cluster_2S_2R, db1, table1)
  13. Query id: 7c08e756-90cf-4014-9368-dc41fe7d06f4
  14. ┌─host──┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
  15. │ node4 │ 9000 │      0 │       │                   3 │                0
  16. │ node2 │ 9000 │      0 │       │                   2 │                0
  17. │ node1 │ 9000 │      0 │       │                   1 │                0
  18. │ node3 │ 9000 │      0 │       │                   0 │                0
  19. └───────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
  20. 4 rows in set. Elapsed: 0.095 sec. 
  21. vvml-yz-hbase-test.172.18.4.126 :)

(7)在所有实例上分别执行 SELECT 查询

        所有实例上都创建了 db1.dist_table 表,但是 SELECT *
FROM db1.dist_table 查询结果却出乎意料。预想在所有实例上查询的结果相同,都返回两个分片中的所有四行数据,而实际四个实例返回的结果是不确定的:node1 有时返回全部4行,有时只返回前两行;node3 有时返回全部4行,有时只返回后两行;node2 有时返回后两行,有时不返回数据;node4 有时返回前两行,有时不返回数据。

        说明:

  • ClickHouse 集群是一种对等架构,在一个集群里每个 ClickHouse 实例都是独立的,即使是同一个分片内的不同副本实例间,也是没有主从概念。
  • 在集群中任何一个实例上执行的 DDL 语句中使用 ON CLUSTER 子句,会自动在集群中的所有实例上执行。
  • 分布式表实际上是一种视图,映射到 ClickHouse 集群实例上的本地表。从分布式表中执行 SELECT 查询会使用集群所有相关分片的资源。
  • 底层本地表如果使用 MergeTree 表引擎,在分布式表上执行同一查询,返回结果有可能不确定。

4. 数据自动分片

        首先想到的是如果在分布式表上能够执行 DML 语句,就应该可以达到数据自动分片的效果。

  1. vvml-yz-hbase-test.172.18.4.126 :) insert into db1.dist_table
  2.     (id, column1)
  3. values
  4.     (5, 'mno'),
  5.     (6, 'pqr');
  6. INSERT INTO db1.dist_table (id, column1) FORMAT Values
  7. Query id: 78b4cc88-9b13-4078-a3f0-d405f4338cfb
  8. Elapsed: 0.002 sec. 
  9. Received exception from server (version 24.1.8):
  10. Code: 55. DB::Exception: Received from localhost:9000. DB::Exception: Method write is not supported by storage Distributed with more than one shard and no sharding key provided. (STORAGE_REQUIRES_PARAMETER)
  11. vvml-yz-hbase-test.172.18.4.126 :)

        报错明确指出,当数据存储分布在多于一个的分片上,并且没有提供分片键时,分布式表不支持写入。为了进一步演示,下面使用和创建 hits_v1 表类似的 CREATE TABLE 语句创建一个新的本地表,有三点不同:

  • 库名不同。
  • 表名不同。
  • 使用 ON CLUSTER 子句。

(1)创建带有分片键的本地表

        在任一实例上执行下面的建表语句:

  1. CREATE TABLE db1.hits_local ON CLUSTER 'cluster_2S_2R'
  2. (
  3.     `WatchID` UInt64,
  4.     `JavaEnable` UInt8,
  5.     `Title` String,
  6.     `GoodEvent` Int16,
  7.     `EventTime` DateTime,
  8.     `EventDate` Date,
  9.     `CounterID` UInt32,
  10.     `ClientIP` UInt32,
  11.     `ClientIP6` FixedString(16),
  12.     `RegionID` UInt32,
  13.     `UserID` UInt64,
  14.     `CounterClass` Int8,
  15.     `OS` UInt8,
  16.     `UserAgent` UInt8,
  17.     `URL` String,
  18.     `Referer` String,
  19.     `URLDomain` String,
  20.     `RefererDomain` String,
  21.     `Refresh` UInt8,
  22.     `IsRobot` UInt8,
  23.     `RefererCategories` Array(UInt16),
  24.     `URLCategories` Array(UInt16),
  25.     `URLRegions` Array(UInt32),
  26.     `RefererRegions` Array(UInt32),
  27.     `ResolutionWidth` UInt16,
  28.     `ResolutionHeight` UInt16,
  29.     `ResolutionDepth` UInt8,
  30.     `FlashMajor` UInt8,
  31.     `FlashMinor` UInt8,
  32.     `FlashMinor2` String,
  33.     `NetMajor` UInt8,
  34.     `NetMinor` UInt8,
  35.     `UserAgentMajor` UInt16,
  36.     `UserAgentMinor` FixedString(2),
  37.     `CookieEnable` UInt8,
  38.     `JavascriptEnable` UInt8,
  39.     `IsMobile` UInt8,
  40.     `MobilePhone` UInt8,
  41.     `MobilePhoneModel` String,
  42.     `Params` String,
  43.     `IPNetworkID` UInt32,
  44.     `TraficSourceID` Int8,
  45.     `SearchEngineID` UInt16,
  46.     `SearchPhrase` String,
  47.     `AdvEngineID` UInt8,
  48.     `IsArtifical` UInt8,
  49.     `WindowClientWidth` UInt16,
  50.     `WindowClientHeight` UInt16,
  51.     `ClientTimeZone` Int16,
  52.     `ClientEventTime` DateTime,
  53.     `SilverlightVersion1` UInt8,
  54.     `SilverlightVersion2` UInt8,
  55.     `SilverlightVersion3` UInt32,
  56.     `SilverlightVersion4` UInt16,
  57.     `PageCharset` String,
  58.     `CodeVersion` UInt32,
  59.     `IsLink` UInt8,
  60.     `IsDownload` UInt8,
  61.     `IsNotBounce` UInt8,
  62.     `FUniqID` UInt64,
  63.     `HID` UInt32,
  64.     `IsOldCounter` UInt8,
  65.     `IsEvent` UInt8,
  66.     `IsParameter` UInt8,
  67.     `DontCountHits` UInt8,
  68.     `WithHash` UInt8,
  69.     `HitColor` FixedString(1),
  70.     `UTCEventTime` DateTime,
  71.     `Age` UInt8,
  72.     `Sex` UInt8,
  73.     `Income` UInt8,
  74.     `Interests` UInt16,
  75.     `Robotness` UInt8,
  76.     `GeneralInterests` Array(UInt16),
  77.     `RemoteIP` UInt32,
  78.     `RemoteIP6` FixedString(16),
  79.     `WindowName` Int32,
  80.     `OpenerName` Int32,
  81.     `HistoryLength` Int16,
  82.     `BrowserLanguage` FixedString(2),
  83.     `BrowserCountry` FixedString(2),
  84.     `SocialNetwork` String,
  85.     `SocialAction` String,
  86.     `HTTPError` UInt16,
  87.     `SendTiming` Int32,
  88.     `DNSTiming` Int32,
  89.     `ConnectTiming` Int32,
  90.     `ResponseStartTiming` Int32,
  91.     `ResponseEndTiming` Int32,
  92.     `FetchTiming` Int32,
  93.     `RedirectTiming` Int32,
  94.     `DOMInteractiveTiming` Int32,
  95.     `DOMContentLoadedTiming` Int32,
  96.     `DOMCompleteTiming` Int32,
  97.     `LoadEventStartTiming` Int32,
  98.     `LoadEventEndTiming` Int32,
  99.     `NSToDOMContentLoadedTiming` Int32,
  100.     `FirstPaintTiming` Int32,
  101.     `RedirectCount` Int8,
  102.     `SocialSourceNetworkID` UInt8,
  103.     `SocialSourcePage` String,
  104.     `ParamPrice` Int64,
  105.     `ParamOrderID` String,
  106.     `ParamCurrency` FixedString(3),
  107.     `ParamCurrencyID` UInt16,
  108.     `GoalsReached` Array(UInt32),
  109.     `OpenstatServiceName` String,
  110.     `OpenstatCampaignID` String,
  111.     `OpenstatAdID` String,
  112.     `OpenstatSourceID` String,
  113.     `UTMSource` String,
  114.     `UTMMedium` String,
  115.     `UTMCampaign` String,
  116.     `UTMContent` String,
  117.     `UTMTerm` String,
  118.     `FromTag` String,
  119.     `HasGCLID` UInt8,
  120.     `RefererHash` UInt64,
  121.     `URLHash` UInt64,
  122.     `CLID` UInt32,
  123.     `YCLID` UInt64,
  124.     `ShareService` String,
  125.     `ShareURL` String,
  126.     `ShareTitle` String,
  127.     `ParsedParams` Nested(
  128.         Key1 String,
  129.         Key2 String,
  130.         Key3 String,
  131.         Key4 String,
  132.         Key5 String,
  133.         ValueDouble Float64),
  134.     `IslandID` FixedString(16),
  135.     `RequestNum` UInt32,
  136.     `RequestTry` UInt8
  137. )
  138. ENGINE = MergeTree()
  139. PARTITION BY toYYYYMM(EventDate)
  140. ORDER BY (CounterID, EventDate, intHash32(UserID))
  141. SAMPLE BY intHash32(UserID);

(2)创建提供集群本地表视图的分布式表

        在任一实例上执行下面的建表语句:

  1. CREATE TABLE db1.hits_all ON CLUSTER 'cluster_2S_2R'
  2. AS db1.hits_local
  3. ENGINE = Distributed(cluster_2S_2R, db1, hits_local, rand());

(3)向分布式表中插入数据

        在任一实例上执行下面的 insert 语句:

INSERT INTO db1.hits_all SELECT * FROM tutorial.hits_v1;

(4)在所有实例上分别执行 SELECT 查询

  1. # node1
  2. vvml-yz-hbase-test.172.18.4.126 :) select count(*) from db1.hits_local;
  3. SELECT count(*)
  4. FROM db1.hits_local
  5. Query id: 10a10594-dad1-4693-96fa-a27e62256cc0
  6. ┌─count()─┐
  7. 4437894
  8. └─────────┘
  9. 1 row in set. Elapsed: 0.002 sec. 
  10. vvml-yz-hbase-test.172.18.4.126 :) select count(*) from db1.hits_all;
  11. SELECT count(*)
  12. FROM db1.hits_all
  13. Query id: 4d825459-19fa-4f68-9326-cf6a9b7e113a
  14. ┌─count()─┐
  15. 8873898
  16. └─────────┘
  17. 1 row in set. Elapsed: 0.006 sec. 
  18. vvml-yz-hbase-test.172.18.4.126 :) 
  19. # node2
  20. vvml-yz-hbase-test.172.18.4.188 :) select count(*) from db1.hits_local;
  21. SELECT count(*)
  22. FROM db1.hits_local
  23. Query id: 7da679f4-4f0f-42e5-8c78-df6c9814c61f
  24. ┌─count()─┐
  25. 4437894
  26. └─────────┘
  27. 1 row in set. Elapsed: 0.003 sec. 
  28. vvml-yz-hbase-test.172.18.4.188 :) select count(*) from db1.hits_all;
  29. SELECT count(*)
  30. FROM db1.hits_all
  31. Query id: 29594d92-e38e-473a-bc91-7a5fb37e0647
  32. ┌─count()─┐
  33. 8873898
  34. └─────────┘
  35. 1 row in set. Elapsed: 0.005 sec. 
  36. vvml-yz-hbase-test.172.18.4.188 :) 
  37. # node3
  38. vvml-yz-hbase-test.172.18.4.71 :) select count(*) from db1.hits_local;
  39. SELECT count(*)
  40. FROM db1.hits_local
  41. Query id: 2e5a0fca-9b13-4c4e-8cb9-f9ee60da6152
  42. ┌─count()─┐
  43. 4436004
  44. └─────────┘
  45. 1 row in set. Elapsed: 0.002 sec. 
  46. vvml-yz-hbase-test.172.18.4.71 :) select count(*) from db1.hits_all;
  47. SELECT count(*)
  48. FROM db1.hits_all
  49. Query id: 85e6e7a4-eaab-4455-8c29-52ec0998a9b0
  50. ┌─count()─┐
  51. 8873898
  52. └─────────┘
  53. 1 row in set. Elapsed: 0.006 sec. 
  54. vvml-yz-hbase-test.172.18.4.71 :) 
  55. # node4
  56. vvml-yz-hbase-test.172.18.4.86 :) select count(*) from db1.hits_local;
  57. SELECT count(*)
  58. FROM db1.hits_local
  59. Query id: 9950210e-a4a4-4e1d-a449-4aefb2ea396f
  60. ┌─count()─┐
  61. 4436004
  62. └─────────┘
  63. 1 row in set. Elapsed: 0.002 sec. 
  64. vvml-yz-hbase-test.172.18.4.86 :) select count(*) from db1.hits_all;
  65. SELECT count(*)
  66. FROM db1.hits_all
  67. Query id: 45777141-ff08-452f-93b8-1650265d4175
  68. ┌─count()─┐
  69. 8873898
  70. └─────────┘
  71. 1 row in set. Elapsed: 0.009 sec. 
  72. vvml-yz-hbase-test.172.18.4.86 :)

        从查询结果可以看到,本地表 node1、node2 上的查询结果相同,node3、node4 上的查询结果相同,分布式表四个节点查询结果都相同。node1 和 node3 分属于集群中的不同分片,而在创建分布式表时指定的分布规则随机,所以这两个实例上的本地表数据量存在少许差异是符合预期的。结论是:

  • 指定分片键的分布式表可以写数据。
  • 数据按创建分布式表时指定的自定义的分片规则分布。
  • select count(*) from tutorial.hits_v1 用了2毫秒,select count(*) from db1.hits_all 用了6毫秒,在本测试环境中,查询分布式表比查询本地表慢了两倍。
  • 底层表如果使用 MergeTree 表引擎,其上正常定义了分布规则的分布式表,插入分布式表可以自动在同一分片上的不同副本之间做数据同步。

5. 使用 ReplicatedMergeTree 表引擎复制数据

        按通常的理解,一个分布式数据库至少需要满足以下基本需求:

  • 数据按指定规则自动分片
  • 同一分片的多个副本数据保持一致
  • 数据分片和多副本间的数据同步对应用透明。

        ClickHouse 推荐的配置是使用 ReplicatedMergeTree 表引擎,自动完成副本间的数据复制。创建 ReplcatedMergeTree 表,通常需要设置宏来识别每个用于创建表的分片和副本。

(1)定义宏

        在全部四台主机上执行下面的操作步骤。

        修改 /etc/clickhouse-server/config.xml 主配置文件,在每个节点 <shard> 下添加 <internal_replication>true</internal_replication>:

  1.     <remote_servers>
  2.         <cluster_2S_2R>
  3.             <shard>
  4.                 <internal_replication>true</internal_replication>
  5.                 <replica>
  6.                     <host>node1</host>
  7.                     <port>9000</port>
  8.                     <user>default</user>
  9.                     <password>123456</password>
  10.                 </replica>
  11.                 <replica>
  12.                     <host>node2</host>
  13.                     <port>9000</port>
  14.                     <user>default</user>
  15.                     <password>123456</password>
  16.                 </replica>
  17.             </shard>
  18.             <shard>
  19.                 <internal_replication>true</internal_replication>
  20.                 <replica>
  21.                     <host>node3</host>
  22.                     <port>9000</port>
  23.                     <user>default</user>
  24.                     <password>123456</password>
  25.                 </replica>
  26.                 <replica>
  27.                     <host>node4</host>
  28.                     <port>9000</port>
  29.                     <user>default</user>
  30.                     <password>123456</password>
  31.                 </replica>
  32.             </shard>
  33.         </cluster_2S_2R>
  34.     </remote_servers>

        在节点 </remote_servers> 后面添加以下内容:

  1. <macros>
  2.     <shard>01</shard>
  3.     <replica>01</replica>
  4. </macros>

        node1 配置成 01、01;node2 配置成 01、02;node3 配置成 02、01;node4 配置成 02、02。

        然后重启所有实例:

sudo service clickhouse-server restart

(2)创建本地表

        在任一实例上执行下面的建表语句:

  1. CREATE TABLE db1.hits_replica ON CLUSTER 'cluster_2S_2R'
  2. (
  3.     `WatchID` UInt64,
  4.     `JavaEnable` UInt8,
  5.     `Title` String,
  6.     `GoodEvent` Int16,
  7.     `EventTime` DateTime,
  8.     `EventDate` Date,
  9.     `CounterID` UInt32,
  10.     `ClientIP` UInt32,
  11.     `ClientIP6` FixedString(16),
  12.     `RegionID` UInt32,
  13.     `UserID` UInt64,
  14.     `CounterClass` Int8,
  15.     `OS` UInt8,
  16.     `UserAgent` UInt8,
  17.     `URL` String,
  18.     `Referer` String,
  19.     `URLDomain` String,
  20.     `RefererDomain` String,
  21.     `Refresh` UInt8,
  22.     `IsRobot` UInt8,
  23.     `RefererCategories` Array(UInt16),
  24.     `URLCategories` Array(UInt16),
  25.     `URLRegions` Array(UInt32),
  26.     `RefererRegions` Array(UInt32),
  27.     `ResolutionWidth` UInt16,
  28.     `ResolutionHeight` UInt16,
  29.     `ResolutionDepth` UInt8,
  30.     `FlashMajor` UInt8,
  31.     `FlashMinor` UInt8,
  32.     `FlashMinor2` String,
  33.     `NetMajor` UInt8,
  34.     `NetMinor` UInt8,
  35.     `UserAgentMajor` UInt16,
  36.     `UserAgentMinor` FixedString(2),
  37.     `CookieEnable` UInt8,
  38.     `JavascriptEnable` UInt8,
  39.     `IsMobile` UInt8,
  40.     `MobilePhone` UInt8,
  41.     `MobilePhoneModel` String,
  42.     `Params` String,
  43.     `IPNetworkID` UInt32,
  44.     `TraficSourceID` Int8,
  45.     `SearchEngineID` UInt16,
  46.     `SearchPhrase` String,
  47.     `AdvEngineID` UInt8,
  48.     `IsArtifical` UInt8,
  49.     `WindowClientWidth` UInt16,
  50.     `WindowClientHeight` UInt16,
  51.     `ClientTimeZone` Int16,
  52.     `ClientEventTime` DateTime,
  53.     `SilverlightVersion1` UInt8,
  54.     `SilverlightVersion2` UInt8,
  55.     `SilverlightVersion3` UInt32,
  56.     `SilverlightVersion4` UInt16,
  57.     `PageCharset` String,
  58.     `CodeVersion` UInt32,
  59.     `IsLink` UInt8,
  60.     `IsDownload` UInt8,
  61.     `IsNotBounce` UInt8,
  62.     `FUniqID` UInt64,
  63.     `HID` UInt32,
  64.     `IsOldCounter` UInt8,
  65.     `IsEvent` UInt8,
  66.     `IsParameter` UInt8,
  67.     `DontCountHits` UInt8,
  68.     `WithHash` UInt8,
  69.     `HitColor` FixedString(1),
  70.     `UTCEventTime` DateTime,
  71.     `Age` UInt8,
  72.     `Sex` UInt8,
  73.     `Income` UInt8,
  74.     `Interests` UInt16,
  75.     `Robotness` UInt8,
  76.     `GeneralInterests` Array(UInt16),
  77.     `RemoteIP` UInt32,
  78.     `RemoteIP6` FixedString(16),
  79.     `WindowName` Int32,
  80.     `OpenerName` Int32,
  81.     `HistoryLength` Int16,
  82.     `BrowserLanguage` FixedString(2),
  83.     `BrowserCountry` FixedString(2),
  84.     `SocialNetwork` String,
  85.     `SocialAction` String,
  86.     `HTTPError` UInt16,
  87.     `SendTiming` Int32,
  88.     `DNSTiming` Int32,
  89.     `ConnectTiming` Int32,
  90.     `ResponseStartTiming` Int32,
  91.     `ResponseEndTiming` Int32,
  92.     `FetchTiming` Int32,
  93.     `RedirectTiming` Int32,
  94.     `DOMInteractiveTiming` Int32,
  95.     `DOMContentLoadedTiming` Int32,
  96.     `DOMCompleteTiming` Int32,
  97.     `LoadEventStartTiming` Int32,
  98.     `LoadEventEndTiming` Int32,
  99.     `NSToDOMContentLoadedTiming` Int32,
  100.     `FirstPaintTiming` Int32,
  101.     `RedirectCount` Int8,
  102.     `SocialSourceNetworkID` UInt8,
  103.     `SocialSourcePage` String,
  104.     `ParamPrice` Int64,
  105.     `ParamOrderID` String,
  106.     `ParamCurrency` FixedString(3),
  107.     `ParamCurrencyID` UInt16,
  108.     `GoalsReached` Array(UInt32),
  109.     `OpenstatServiceName` String,
  110.     `OpenstatCampaignID` String,
  111.     `OpenstatAdID` String,
  112.     `OpenstatSourceID` String,
  113.     `UTMSource` String,
  114.     `UTMMedium` String,
  115.     `UTMCampaign` String,
  116.     `UTMContent` String,
  117.     `UTMTerm` String,
  118.     `FromTag` String,
  119.     `HasGCLID` UInt8,
  120.     `RefererHash` UInt64,
  121.     `URLHash` UInt64,
  122.     `CLID` UInt32,
  123.     `YCLID` UInt64,
  124.     `ShareService` String,
  125.     `ShareURL` String,
  126.     `ShareTitle` String,
  127.     `ParsedParams` Nested(
  128.         Key1 String,
  129.         Key2 String,
  130.         Key3 String,
  131.         Key4 String,
  132.         Key5 String,
  133.         ValueDouble Float64),
  134.     `IslandID` FixedString(16),
  135.     `RequestNum` UInt32,
  136.     `RequestTry` UInt8
  137. )
  138. ENGINE = ReplicatedMergeTree(
  139.     '/clickhouse/tables/{shard}/hits',
  140.     '{replica}'
  141. )
  142. PARTITION BY toYYYYMM(EventDate)
  143. ORDER BY (CounterID, EventDate, intHash32(UserID))
  144. SAMPLE BY intHash32(UserID);

(3)创建分布式表

        在任一实例上执行下面的建表语句:

  1. CREATE TABLE db1.hits_replica_all ON CLUSTER 'cluster_2S_2R'
  2. AS db1.hits_replica
  3. ENGINE = Distributed(cluster_2S_2R, db1, hits_replica, rand());

(4)向分布式表中插入数据

        在任一实例上执行下面的 insert 语句:

INSERT INTO db1.hits_replica_all SELECT * FROM tutorial.hits_v1;

(5)在所有实例上分别执行 SELECT 查询

  1. # node1
  2. vvml-yz-hbase-test.172.18.4.126 :) select count(*) from db1.hits_replica;
  3. SELECT count(*)
  4. FROM db1.hits_replica
  5. Query id: e4f08a9c-39be-48cf-a8f9-6caaa98b9fed
  6. ┌─count()─┐
  7. 4438089
  8. └─────────┘
  9. 1 row in set. Elapsed: 0.002 sec. 
  10. vvml-yz-hbase-test.172.18.4.126 :) select count(*) from db1.hits_replica_all;
  11. SELECT count(*)
  12. FROM db1.hits_replica_all
  13. Query id: 765f0ea5-a199-4bfb-85c4-6fd31e23f1af
  14. ┌─count()─┐
  15. 8873898
  16. └─────────┘
  17. 1 row in set. Elapsed: 0.007 sec. 
  18. vvml-yz-hbase-test.172.18.4.126 :) 
  19. # node2
  20. vvml-yz-hbase-test.172.18.4.188 :) select count(*) from db1.hits_replica;
  21. SELECT count(*)
  22. FROM db1.hits_replica
  23. Query id: b464df00-4c1a-4fea-8f2f-70fd0c32569f
  24. ┌─count()─┐
  25. 4438089
  26. └─────────┘
  27. 1 row in set. Elapsed: 0.002 sec. 
  28. vvml-yz-hbase-test.172.18.4.188 :) select count(*) from db1.hits_replica_all;
  29. SELECT count(*)
  30. FROM db1.hits_replica_all
  31. Query id: 19a77125-7e7e-4654-bb90-8884c3b0ed26
  32. ┌─count()─┐
  33. 8873898
  34. └─────────┘
  35. 1 row in set. Elapsed: 0.007 sec. 
  36. vvml-yz-hbase-test.172.18.4.188 :) 
  37. # node3
  38. vvml-yz-hbase-test.172.18.4.71 :) select count(*) from db1.hits_replica;
  39. SELECT count(*)
  40. FROM db1.hits_replica
  41. Query id: e021f13c-b980-4297-9c26-b94465503101
  42. ┌─count()─┐
  43. 4435809
  44. └─────────┘
  45. 1 row in set. Elapsed: 0.003 sec. 
  46. vvml-yz-hbase-test.172.18.4.71 :) select count(*) from db1.hits_replica_all;
  47. SELECT count(*)
  48. FROM db1.hits_replica_all
  49. Query id: 80ebbda2-be03-445c-8c16-f67428a45572
  50. ┌─count()─┐
  51. 8873898
  52. └─────────┘
  53. 1 row in set. Elapsed: 0.005 sec. 
  54. vvml-yz-hbase-test.172.18.4.71 :) 
  55. # node4
  56. vvml-yz-hbase-test.172.18.4.86 :) select count(*) from db1.hits_replica;
  57. SELECT count(*)
  58. FROM db1.hits_replica
  59. Query id: 204f483d-3fcb-4758-8df4-20e79f78e42e
  60. ┌─count()─┐
  61. 4435809
  62. └─────────┘
  63. 1 row in set. Elapsed: 0.002 sec. 
  64. vvml-yz-hbase-test.172.18.4.86 :) select count(*) from db1.hits_replica_all;
  65. SELECT count(*)
  66. FROM db1.hits_replica_all
  67. Query id: e3d28b81-50b7-46e6-a855-1b40ef9fe42a
  68. ┌─count()─┐
  69. 8873898
  70. └─────────┘
  71. 1 row in set. Elapsed: 0.007 sec. 
  72. vvml-yz-hbase-test.172.18.4.86 :)

        从查询结果可以看到,本地表 node1、node2 上的查询结果相同,node3、node4 上的查询结果相同,分布式表四个节点查询结果都相同。node1 和 node3 分属于集群中的不同分片,而在创建分布式表时指定的分布规则随机,所以这两个实例上的本地表数据量存在少许差异是符合预期的。ReplicatedMergeTree 表可以自动在同一分片的不同副本间同步数据。

参考:

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/很楠不爱3/article/detail/476842
推荐阅读
相关标签
  

闽ICP备14008679号