Related posts:
[DataSophon] Detailed installation and deployment of the DataSophon 1.2.1 big data management platform (CSDN blog)
[DataSophon] Upgrading the Flink service component on DataSophon 1.2 (CSDN blog)
[DataSophon] Basic usage of the DataSophon 1.2.1 big data management platform (CSDN blog)
DataSophon is a similar kind of management platform. Unlike the Sophon in The Three-Body Problem, whose purpose is to lock down humanity's fundamental science and prevent a technological explosion, DataSophon is dedicated to automatically monitoring, operating, and managing big data infrastructure components and nodes, helping you quickly build a stable and efficient big data cluster service.
Its main features are described at the official site and on GitHub:
Official site: DataSophon | DataSophon
GitHub: datasophon/README_CN.md at dev · datavane/datasophon
The goal is a lightweight, high-performance, highly extensible big data cluster management platform that meets the requirements of domestically produced (localized) environments. It must satisfy the following design requirements:
(1) Compile once, run anywhere: deployment depends only on a Java environment, with no other system dependencies.
(2) The DataSophon worker consumes few resources and takes nothing away from the big data compute nodes.
(3) High extensibility: third-party components can be integrated and managed purely through configuration.
Every integrated component has been compatibility-tested and runs stably on big data clusters of 300+ nodes that process roughly 400 billion records per day. Even at that volume, tuning each component is cheap: the platform surfaces by default only the configurations users care about and actually need to tune. The DDP 1.2.1 service versions are as follows:
| No. | Name | Version | Description |
|---|---|---|---|
| 1 | HDFS | 3.3.3 | Distributed big data storage |
| 2 | YARN | 3.3.3 | Distributed resource scheduling and management platform |
| 3 | ZooKeeper | 3.5.10 | Distributed coordination service |
| 4 | Flink | 1.16.2 | Real-time compute engine |
| 5 | DolphinScheduler | 3.1.8 | Distributed, easily extensible visual workflow scheduler |
| 6 | StreamPark | 2.1.1 | Rapid stream-processing development framework; cloud-native platform unifying stream/batch and lakehouse |
| 7 | Spark | 3.1.3 | Distributed compute engine |
| 8 | Hive | 3.1.0 | Offline data warehouse |
| 9 | Kafka | 2.4.1 | High-throughput distributed publish/subscribe messaging system |
| 10 | Trino | 367 | Distributed interactive SQL query engine |
| 11 | Doris | 1.2.6 | New-generation high-speed all-scenario MPP database |
| 12 | HBase | 2.4.16 | Distributed column-oriented database |
| 13 | Ranger | 2.1.0 | Access control framework |
| 14 | ElasticSearch | 7.16.2 | High-performance search engine |
| 15 | Prometheus | 2.17.2 | High-performance metrics collection and alerting system |
| 16 | Grafana | 9.1.6 | Monitoring, analytics, and data visualization suite |
| 17 | AlertManager | 0.23.0 | Alert notification management system |
| IP | Hostname |
|---|---|
| 192.168.2.115 | ddp01 |
| 192.168.2.116 | ddp02 |
| 192.168.2.120 | ddp03 |
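For reference, a minimal /etc/hosts sketch matching the table above; every node should be able to resolve all three hostnames:

```
# /etc/hosts entries on each node (sketch based on the host table above)
192.168.2.115 ddp01
192.168.2.116 ddp02
192.168.2.120 ddp03
```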
Select the Kerberos service.
Select the installation nodes for the KDC and Kadmin.
Select the installation nodes for the Client.
The realm uses the default HADOOP.COM. If you want a custom realm, templates such as HBase's zk-jaas.ftl must be changed to take the realm as a parameter, because it is hard-coded to HADOOP.COM by default. The stock zk-jaas.ftl shipped under datasophon-worker hard-codes the realm; the modified version below receives it as a parameter:
```
<#assign realm = zkRealm!"">
<#if realm?has_content>
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytab/hbase.keytab"
  useTicketCache=false
  principal="hbase/${host}@${realm}";
};
</#if>
```
Installation complete.
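A quick sanity check on the KDC node before enabling Kerberos for services; the admin principal below is hypothetical, so substitute one that actually exists in your realm:

```bash
# List principals to confirm the KDC is serving the realm (MIT Kerberos)
kadmin.local -q "list_principals"
# Obtain and inspect a ticket (admin/admin@HADOOP.COM is a hypothetical principal)
kinit admin/admin@HADOOP.COM
klist
```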
Click the enableKerberos switch to turn it on, then restart. The toggle is shown in the screenshot below. After enabling it we need to restart the HDFS service; restarting everything at once may fail, so the roles inside the service can be restarted one by one.
After enabling Kerberos on HBase 2.4.16, the HMaster fails to start:
```
2024-06-12 10:40:46,421 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
org.apache.hadoop.hbase.ZooKeeperConnectionException: master:16000-0x1000ff4d1270007, quorum=ddp01:2181,ddp02:2181,ddp03:2181, baseZNode=/hbase Unexpected KeeperException creating base node
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.createBaseZNodes(ZKWatcher.java:258)
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.<init>(ZKWatcher.java:182)
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.<init>(ZKWatcher.java:135)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:662)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:425)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2926)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:247)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:140)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2946)
Caused by: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /hbase
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:128)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1538)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:525)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:505)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:874)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:856)
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.createBaseZNodes(ZKWatcher.java:249)
    ... 14 more
2024-06-12 10:40:46,424 ERROR [main] master.HMasterCommandLine: Master exiting
```
Root cause: after replacing the default HADOOP realm with a custom one, some configuration files are no longer updated dynamically (the configuration pages behave the same way and must be updated by hand), so I went back to the default HADOOP realm.
The rendered zk-jaas.conf:
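For reference, rendering the template on host ddp01 with the default HADOOP.COM realm would produce a zk-jaas.conf like this (a sketch of the expected output, not copied from the cluster):

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytab/hbase.keytab"
  useTicketCache=false
  principal="hbase/ddp01@HADOOP.COM";
};
```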
Alternatively, you can modify the zk_jaas.ftl installation template:
```
<#assign realm = zkRealm!"">
<#if realm?has_content>
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytab/hbase.keytab"
  useTicketCache=false
  principal="hbase/${host}@${realm}";
};
</#if>
```
After that change, the next error appears:
```
2024-06-14 15:07:23,137 INFO [Thread-24] hdfs.DataStreamer: Exception in createBlockOutputStream
java.io.IOException: Invalid token in javax.security.sasl.qop: D
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:220)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:553)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:455)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:298)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:203)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:193)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1705)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1655)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:710)
2024-06-14 15:07:23,138 WARN [Thread-24] hdfs.DataStreamer: Abandoning BP-268430271-192.168.2.115-1717646949749:blk_1073742094_1293
2024-06-14 15:07:23,159 WARN [Thread-24] hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.2.120:1026,DS-3dec681b-fa1e-4bdc-89bf-0fd969f7ddbe,DISK]
2024-06-14 15:07:23,187 INFO [Thread-24] hdfs.DataStreamer: Exception in createBlockOutputStream
java.io.IOException: Invalid token in javax.security.sasl.qop:
```
Searching GitHub shows that this HBase version is incompatible with Kerberos enabled and fails to start:
Apache HBase – Apache HBase Downloads
Check the Hadoop/HBase version compatibility matrix: Apache HBase® Reference Guide
[Bug] [Module Name] v2.4.16 hbase in kerberos did't run · Issue #445 · datavane/datasophon · GitHub
HBase 2.4.16 fails as tested above, so we swap in 2.0.2 and test the installation again. Moving to HBase 2.0.2 requires modifying the startup scripts bin/hbase-daemon.sh and bin/hbase to add the monitoring hooks; for the concrete changes, mirror how the HBase 2.4.16 scripts are written (see the sketch below). After the scripts are modified, we reinstall HBase and enable Kerberos successfully. Compatibility with the newer HBase versions is left for later study.
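For orientation, the monitoring hook DataSophon's 2.4.16 scripts expose is a Prometheus JMX exporter javaagent. A minimal sketch of the kind of line to port into the 2.0.2 bin/hbase; the jar path, port, and YAML file here are assumptions, so copy the exact values from the 2.4.16 scripts:

```bash
# Sketch: attach the Prometheus JMX exporter when starting the HMaster.
# Jar path, port, and config file are placeholders; mirror the 2.4.16 scripts.
if [ "$COMMAND" = "master" ]; then
  HBASE_OPTS="$HBASE_OPTS -javaagent:/opt/datasophon/hbase-2.0.2/jmx/jmx_prometheus_javaagent.jar=16100:/opt/datasophon/hbase-2.0.2/jmx/hbase.yaml"
fi
```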
NodeManager fails to start with the following error:
```
2024-06-12 09:23:55,033 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 127. Privileged Execution Operation Stderr:
/opt/datasophon/hadoop-3.3.3/bin/container-executor: error while loading shared libraries: libcrypto.so.1.1: cannot open shared object file: No such file or directory

Stdout:
Full command array for failed execution:
[/opt/datasophon/hadoop-3.3.3/bin/container-executor, --checksetup]
2024-06-12 09:23:55,034 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container executor initialization is : 127
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=127: /opt/datasophon/hadoop-3.3.3/bin/container-executor: error while loading shared libraries: libcrypto.so.1.1: cannot open shared object file: No such file or directory

    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:330)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:403)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:962)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1042)
Caused by: ExitCodeException exitCode=127: /opt/datasophon/hadoop-3.3.3/bin/container-executor: error while loading shared libraries: libcrypto.so.1.1: cannot open shared object file: No such file or directory
```
Running the binary directly reproduces the error:
```bash
/opt/datasophon/hadoop-3.3.3/bin/container-executor
```
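ldd confirms which shared library the binary cannot resolve; this is a quick diagnostic, not from the original post:

```bash
# Show unresolved shared-library dependencies of container-executor
ldd /opt/datasophon/hadoop-3.3.3/bin/container-executor | grep 'not found'
```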
So we install OpenSSL 1.1.1 from source:
```bash
# Download openssl-1.1.1o.tar.gz
cd ~
wget https://www.openssl.org/source/openssl-1.1.1o.tar.gz

# Unpack openssl-1.1.1o.tar.gz
sudo tar -zxvf openssl-1.1.1o.tar.gz

# Enter the unpacked openssl-1.1.1o directory
cd openssl-1.1.1o

# Configure, build, and install
sudo ./config
sudo make
sudo make install
```
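After make install, the library should be present under /usr/local/lib64 (on some distributions it lands in /usr/local/lib instead):

```bash
# Verify the freshly built library exists
ls -l /usr/local/lib64/libcrypto.so.1.1
```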
Since the library now exists, the fix is easy: create a symlink to it under /usr/lib64:
```bash
ln -s /usr/local/lib64/libcrypto.so.1.1 /usr/lib64/libcrypto.so.1.1
```
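To verify the fix before restarting, check that the dynamic linker now resolves the library, then re-run the self-check that failed with exit code 127 (this assumes the container-executor.cfg ownership checks already pass):

```bash
# The library should now appear in the linker cache
ldconfig -p | grep libcrypto.so.1.1
# Re-run the check that previously failed
/opt/datasophon/hadoop-3.3.3/bin/container-executor --checksetup && echo OK
```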
Then start NodeManager again.
Even after authenticating, running jobs may hit an HDFS access-permission error, so we relax the permissions:
```bash
kinit -kt /etc/security/keytab/nn.service.keytab nn/ddp01@WINNER.COM
hdfs dfs -chmod -R 777 /tmp/hadoop-yarn/
```
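A quick confirmation that the ticket is active and the directory is now readable:

```bash
klist                          # should show the nn/ddp01@WINNER.COM ticket
hdfs dfs -ls /tmp/hadoop-yarn/ # should list without an AccessControlException
```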
Enable Kerberos.
HiveServer2 logs a warning, which we leave alone for now:
```
2024-06-21 15:05:51 INFO HiveServer2:877 - Stopping/Disconnecting tez sessions.
2024-06-21 15:05:51 INFO HiveServer2:883 - Stopped tez session pool manager.
2024-06-21 15:05:51 WARN HiveServer2:1064 - Error starting HiveServer2 on attempt 1, will retry in 60000ms
java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession$AbstractTriggerValidator.startTriggerValidator(TezSessionPoolSession.java:74)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.initTriggers(TezSessionPoolManager.java:207)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.startPool(TezSessionPoolManager.java:114)
    at org.apache.hive.service.server.HiveServer2.initAndStartTezSessionPoolManager(HiveServer2.java:839)
    at org.apache.hive.service.server.HiveServer2.startOrReconnectTezSessions(HiveServer2.java:822)
    at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:745)
    at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1037)
    at org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:140)
    at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1305)
    at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1149)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.TezConfiguration
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 16 more
```
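The warning is harmless here because HiveServer2 retries, but since Tez is intentionally not deployed, one way to avoid this startup path altogether is to keep Hive on the MapReduce engine and skip Tez session initialization. A sketch of the hive-site.xml settings; these are standard Hive property names, but verify them against your Hive version:

```xml
<!-- Keep Hive on MapReduce and skip the Tez session pool at HiveServer2 startup -->
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>
<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>false</value>
</property>
```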
Running show databases from the command line fails with:
```
2024-06-25 09:26:17 WARN ThriftCLIService:795 - Error fetching results:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.io.IOException: Can't get Master Kerberos principal for use as renewer
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:465)
    at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:309)
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:905)
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:561)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:786)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: java.io.IOException: Can't get Master Kerberos principal for use as renewer
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:602)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2691)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:460)
    ... 13 more
Caused by: java.io.IOException: Can't get Master Kerberos principal for use as renewer
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:134)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:425)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:395)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:314)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:540)
    ... 18 more
```
Executing an insert statement fails with:
```
2024-06-25 14:20:56,454 Stage-1 map = 0%, reduce = 0%
2024-06-25 14:20:56 INFO Task:1224 - 2024-06-25 14:20:56,454 Stage-1 map = 0%, reduce = 0%
2024-06-25 14:20:56 WARN Counters:235 - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
Ended Job = job_1719295126228_0001 with errors
2024-06-25 14:20:56 ERROR Task:1247 - Ended Job = job_1719295126228_0001 with errors
Error during job, obtaining debugging information...
2024-06-25 14:20:56 ERROR Task:1247 - Error during job, obtaining debugging information...
2024-06-25 14:20:57 INFO YarnClientImpl:504 - Killed application application_1719295126228_0001
2024-06-25 14:20:57 INFO ReOptimizePlugin:70 - ReOptimization: retryPossible: false
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
2024-06-25 14:20:57 ERROR Driver:1247 - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
2024-06-25 14:20:57 INFO Driver:1224 - MapReduce Jobs Launched:
2024-06-25 14:20:57 WARN Counters:235 - Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
2024-06-25 14:20:57 INFO Driver:1224 - Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
2024-06-25 14:20:57 INFO Driver:1224 - Total MapReduce CPU Time Spent: 0 msec
2024-06-25 14:20:57 INFO Driver:2531 - Completed executing command(queryId=hdfs_20240625142022_e1ec801e-ee1d-411e-b96d-d4141b5e2918); Time taken: 33.984 seconds
2024-06-25 14:20:57 INFO Driver:285 - Concurrency mode is disabled, not creating a lock manager
2024-06-25 14:20:57 INFO HiveConf:5034 - Using the default value passed in for log id: 202267cf-76bc-43ff-8545-67bd0b4d73ce
2024-06-25 14:20:57 INFO SessionState:449 - Resetting thread name to main
```
The cause of the errors above was that the YARN service had not been installed; careless of me. The platform's checks for hard dependencies between service components could use an upgrade.
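"Can't get Master Kerberos principal for use as renewer" generally means the client cannot find the ResourceManager principal, which is only available once YARN is installed and yarn-site.xml is distributed. A quick check; the path assumes this cluster's layout:

```bash
# The property TokenCache needs as the delegation-token renewer
grep -A1 'yarn.resourcemanager.principal' /opt/datasophon/hadoop-3.3.3/etc/hadoop/yarn-site.xml
```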
Integrating Ranger with DDP:
datasophon集成rangerusersync · 语雀 (Yuque: integrating Ranger UserSync with DataSophon)
Integrating Hue with DDP:
Related references:
- Integrating ZooKeeper and HBase with Kerberos – 北漂-boy – 博客园
- Kerberos security authentication, part 11: HBase Kerberos security configuration and access – CSDN blog
- HBase official site: Apache HBase® Reference Guide
/opt/datasophon/hadoop-3.3.3/ranger-hdfs-plugin/enable-hdfs-plugin.sh