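Yesterday I was installing CDH Hadoop at a customer site. The installation itself went fairly smoothly, but both the Cloudera SCM Server and the Agent services failed to start.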
[root@YXnode01 ~]# service cloudera-scm-server restart
Restarting cloudera-scm-server (via systemctl): Job for cloudera-scm-server.service failed because the control process exited with error code. See "systemctl status cloudera-scm-server.service" and "journalctl -xe" for details.
[FAILED]
Following the hint above, we ran "systemctl status cloudera-scm-server.service" to see the detailed error:
[root@YXnode01 ~]# systemctl status cloudera-scm-server.service
● cloudera-scm-server.service - LSB: Cloudera SCM Server
Loaded: loaded (/etc/rc.d/init.d/cloudera-scm-server; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2019-11-05 09:25:49 CST; 3min 32s ago
Docs: man:systemd-sysv-generator(8)
Process: 15982 ExecStart=/etc/rc.d/init.d/cloudera-scm-server start (code=exited, status=1/FAILURE)
Nov 05 09:25:44 YXnode01.esgyn.cn systemd[1]: Starting LSB: Cloudera SCM Server...
Nov 05 09:25:44 YXnode01.esgyn.cn su[16015]: pam_unix(su:auth): auth could not identify password for [cloudera-scm]
Nov 05 09:25:44 YXnode01.esgyn.cn su[16015]: pam_succeed_if(su:auth): requirement "uid >= 1000" not met by user "cloudera-scm"
Nov 05 09:25:46 YXnode01.esgyn.cn su[16015]: FAILED SU (to cloudera-scm) root on none
Nov 05 09:25:49 YXnode01.esgyn.cn cloudera-scm-server[15982]: Starting cloudera-scm-server: [FAILED]
Nov 05 09:25:49 YXnode01.esgyn.cn systemd[1]: cloudera-scm-server.service: control process exited, code=exited status=1
Nov 05 09:25:49 YXnode01.esgyn.cn systemd[1]: Failed to start LSB: Cloudera SCM Server.
Nov 05 09:25:49 YXnode01.esgyn.cn systemd[1]: Unit cloudera-scm-server.service entered failed state.
Nov 05 09:25:49 YXnode01.esgyn.cn systemd[1]: cloudera-scm-server.service failed.
We also checked the Cloudera SCM Server log, which contained the following:
[root@YXnode01 ~]# tail -10f /var/log/cloudera-scm-server/cloudera-scm-server.out
Password: su: Error in service module
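SELinux, the firewall, SSH and similar settings on the Hadoop nodes all checked out fine. Given the specific error 'pam_succeed_if(su:auth): requirement "uid >= 1000" not met by user "cloudera-scm"', we suspected that some special security policy was in place on this Linux system. After some searching we found an Alibaba Cloud article: https://help.aliyun.com/knowledge_detail/41491.html?spm=a2c6h.13066369.0.0.2edd1479fTjQLg
Following that article, we searched the files under /etc/pam.d for 'uid >= 1000' and found the configuration files below.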
[root@YXnode01 pam.d]# grep 'uid >= 1000' *
password-auth:auth requisite pam_succeed_if.so uid >= 1000 quiet_success
password-auth-ac:auth requisite pam_succeed_if.so uid >= 1000 quiet_success
system-auth:auth requisite pam_succeed_if.so uid >= 1000 quiet_success
system-auth-ac:auth requisite pam_succeed_if.so uid >= 1000 quiet_success
[root@YXnode01 pam.d]# pwd
/etc/pam.d
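Commenting out the same rule by hand in several files is easy to get wrong, so here is a minimal sketch of a helper that does it and keeps a backup. It is not the exact command we ran: the function name comment_out_rule and the .bak suffix are illustrative choices of ours. It writes the result back with cat rather than replacing the file, on the assumption that some of these files may be symlinks that should stay intact.

```shell
# Comment out every active line of a PAM file that contains a fixed string.
# Usage: comment_out_rule FILE 'fixed string'
# Hypothetical helper for illustration; a copy is saved as FILE.bak first.
comment_out_rule() (
    file=$1
    needle=$2
    cp -p "$file" "$file.bak" || exit 1
    tmp=$(mktemp) || exit 1
    # index() is a literal substring match, so '>=' and '.' need no escaping.
    # Lines that are already comments are left untouched.
    awk -v needle="$needle" '
        index($0, needle) && $0 !~ /^[ \t]*#/ { print "#" $0; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

For example, running comment_out_rule /etc/pam.d/system-auth-ac 'uid >= 1000' as root would comment out the matching line and leave the original in /etc/pam.d/system-auth-ac.bak, which makes it easy to roll back.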
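So we commented out the lines above and tried to start the SCM Server service again. It still failed, but the error was slightly different: the earlier message 'pam_succeed_if(su:auth): requirement "uid >= 1000" not met by user "cloudera-scm"' was gone, and what remained was "FAILED SU (to cloudera-scm) root on none".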
[root@YXnode01 ~]# systemctl status cloudera-scm-server.service
● cloudera-scm-server.service - LSB: Cloudera SCM Server
Loaded: loaded (/etc/rc.d/init.d/cloudera-scm-server; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2019-11-05 09:59:37 CST; 17s ago
Docs: man:systemd-sysv-generator(8)
Process: 17469 ExecStart=/etc/rc.d/init.d/cloudera-scm-server start (code=exited, status=1/FAILURE)
Nov 05 09:59:32 YXnode01.esgyn.cn systemd[1]: Starting LSB: Cloudera SCM Server...
Nov 05 09:59:32 YXnode01.esgyn.cn su[17502]: pam_unix(su:auth): auth could not identify password for [cloudera-scm]
Nov 05 09:59:34 YXnode01.esgyn.cn su[17502]: FAILED SU (to cloudera-scm) root on none
Nov 05 09:59:37 YXnode01.esgyn.cn cloudera-scm-server[17469]: Starting cloudera-scm-server: [FAILED]
Nov 05 09:59:37 YXnode01.esgyn.cn systemd[1]: cloudera-scm-server.service: control process exited, code=exited status=1
Nov 05 09:59:37 YXnode01.esgyn.cn systemd[1]: Failed to start LSB: Cloudera SCM Server.
Nov 05 09:59:37 YXnode01.esgyn.cn systemd[1]: Unit cloudera-scm-server.service entered failed state.
Nov 05 09:59:37 YXnode01.esgyn.cn systemd[1]: cloudera-scm-server.service failed.
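It turns out that when root runs service cloudera-scm-server start, the init script first switches to the cloudera-scm user to start the service; in other words, it runs su cloudera-scm before anything else.
So we tested switching from root to the cloudera-scm user, and ran the same test in another, working environment. In this environment, su cloudera-scm run as root prompted for a password; in the working environment it did not.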
[root@YXnode01 ~]# su cloudera-scm
Password:
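With this information, further searching pointed us to the /etc/pam.d/su file, so we compared that file in this environment with the one from a working environment. The difference is shown in the figure below.
In this environment the file had one extra line (the auth required pam_wheel.so group=wheel line discussed at the end of this post). We commented that line out to match the working environment and restarted the SCM Server service, which now started normally.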
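The same comparison can be made on the command line. The sketch below is one way to do it, not what we actually ran: active_pam_lines and extra_pam_lines are hypothetical helper names, and /tmp/su.good in the usage note stands for a copy of the file taken from a working host. It compares only the set of active lines, so it will not notice rules that are merely in a different order.

```shell
# Print only the active (non-blank, non-comment) lines of a PAM file.
# "$1 = $1" makes awk rebuild the record with single spaces, so differences
# in tabs or spacing between the two hosts are ignored.
active_pam_lines() {
    awk '!/^[ \t]*(#|$)/ { $1 = $1; print }' "$1"
}

# Print the lines that are active in FILE1 but not in FILE2.
# Usage: extra_pam_lines FILE1 FILE2
extra_pam_lines() (
    a=$(mktemp) || exit 1
    b=$(mktemp) || exit 1
    active_pam_lines "$1" | sort -u > "$a"
    active_pam_lines "$2" | sort -u > "$b"
    comm -23 "$a" "$b"    # column 1 only: lines unique to FILE1
    rm -f "$a" "$b"
)
```

Running extra_pam_lines /etc/pam.d/su /tmp/su.good on the broken host lists the rules that exist only there, which is where the extra line would show up.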
[root@YXnode01 ~]# service cloudera-scm-server status
● cloudera-scm-server.service - LSB: Cloudera SCM Server
Loaded: loaded (/etc/rc.d/init.d/cloudera-scm-server; bad; vendor preset: disabled)
Active: active (exited) since Tue 2019-11-05 11:29:54 CST; 15s ago
Docs: man:systemd-sysv-generator(8)
Process: 19790 ExecStart=/etc/rc.d/init.d/cloudera-scm-server start (code=exited, status=0/SUCCESS)
Nov 05 11:29:49 YXnode01.esgyn.cn systemd[1]: Starting LSB: Cloudera SCM Server...
Nov 05 11:29:49 YXnode01.esgyn.cn su[19823]: (to cloudera-scm) root on none
Nov 05 11:29:54 YXnode01.esgyn.cn cloudera-scm-server[19790]: Starting cloudera-scm-server: [ OK ]
Nov 05 11:29:54 YXnode01.esgyn.cn systemd[1]: Started LSB: Cloudera SCM Server.
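Let's look at this line a little more closely.
auth required pam_wheel.so group=wheel means that users outside the wheel group are not allowed to switch to root.
To harden a Linux system further, it is common to set up an administrators group so that only members of that group can use "su -" to become root, while users in other groups cannot become root even if they run "su -" and enter the correct root password. On UNIX and Linux this group is usually named "wheel", and the restriction is configured in /etc/pam.d/su. Adding this line to the su file is therefore what broke su between root and the cloudera-scm user, unless the cloudera-scm user is added to the wheel group.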
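Before starting an installation on a new host, it may be worth checking for this rule up front. The following is a minimal sketch under our own assumptions: pam_wheel_enabled is a hypothetical helper name, and it only looks for an uncommented auth line that mentions pam_wheel.so without interpreting any of the module's options.

```shell
# Succeed (exit 0) if FILE contains an uncommented "auth" rule that loads
# pam_wheel.so; fail (exit 1) otherwise.
# Usage: pam_wheel_enabled FILE
pam_wheel_enabled() {
    awk '
        !/^[ \t]*#/ && $1 == "auth" && /pam_wheel\.so/ { found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example, pam_wheel_enabled /etc/pam.d/su && echo "pam_wheel is active for su" flags the situation described in this post, and id -nG cloudera-scm shows which groups the service account currently belongs to.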