
sshd fails to start or restarts repeatedly; status shows activating (start) and the log reports "sshd.service start operation timed out. Terminating."

I. Problem description

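During a permissions change, the SSH session suddenly dropped. On investigation, sshd could not be restarted: its status was abnormal and the start operation timed out. The on-site environment is version 8.2: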

polkitd[542]: Unregistered Authentication Agent for unix-process:6501:2207619775 (system bus name :1.1204804, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
systemd: sshd.service start operation timed out. Terminating.
systemd: sshd.service start operation timed out. Terminating.
sshd[6508]: Received signal 15; terminating.
systemd: Failed to start OpenSSH server daemon.
systemd: Unit sshd.service entered failed state.
systemd: sshd.service failed.
systemd: Failed to start OpenSSH server daemon.
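
To get a quick measure of how often the start timed out, the log can be filtered for the message shown above. This is a minimal sketch, assuming the messages end up in a syslog file such as /var/log/messages; that path and the helper name `count_start_timeouts` are assumptions, not part of the original write-up.

```shell
# Count systemd "start operation timed out" events for one unit.
# Reads log text on stdin.
# Usage: count_start_timeouts UNIT < logfile
count_start_timeouts() {
    unit="$1"
    # -F matches the message literally, -c prints the number of matching lines.
    # grep exits non-zero when the count is 0, so that status is masked here.
    grep -F -c "${unit} start operation timed out" || true
}

# Example (path is an assumption):
#   count_start_timeouts sshd.service < /var/log/messages
```

A count that keeps growing while the unit sits in activating (start) suggests systemd is cycling through start, timeout, and restart rather than failing once.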

II. Troubleshooting process

1) The logs were checked for the errors above, and the following checks were tried:

# Reported experience suggests the service invocation itself may be at fault; the following can be tried to debug it
mv /usr/lib/systemd/system/sshd.service /usr/lib/systemd/system/sshd.service.bak
systemctl daemon-reload
mv /usr/lib/systemd/system/sshd.service.bak /usr/lib/systemd/system/sshd.service
systemctl start sshd  
systemctl enable --now sshd.service
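# Permissions under /var/run had been changed recursively this time, so check the /var directory permissions
ll -d /var/  # 755 on site, which is the usual mode; the original note suggests trying 744, but that removes the search (x) bit for group and others, so use it with caution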
drwxr-xr-x. 23 root root 4096 May 24  2022 /var/
ll /var/empty/
d--x--x--x. 2 root root 4096 Apr 15  2020 sshd
ll -d /etc/pki/
drwxr-xr-x. 10 root root 4096 Jul  2  2018 /etc/pki/
ll /etc/pki/
total 32
drwxr-xr-x. 6 root root 4096 Aug  9  2019 CA
drwxr-xr-x. 4 root root 4096 Jul  2  2018 ca-trust
drwxr-xr-x. 2 root root 4096 May 23  2022 java
drwxr-xr-x. 2 root root 4096 Feb 23  2020 nssdb
drwxr-xr-x. 2 root root 4096 Feb 20  2020 nss-legacy
drwxr-xr-x. 2 root root 4096 Jul  2  2018 rpm-gpg
drwx------. 2 root root 4096 Apr 11  2018 rsyslog
drwxr-xr-x. 5 root root 4096 May 23  2022 tls
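# While restarting Nginx, semanage had been used to add a port, so SELinux enforcement was suspected; check it and switch to permissive mode temporarily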
getenforce
setenforce 0
# Try starting it manually
/etc/rc.d/init.d/sshd start
/usr/sbin/sshd -f /etc/ssh/sshd_config
# Validate the configuration file (it tested fine here)
sshd -t
# Reference init script for starting sshd
#!/bin/bash
#
# Init file for OpenSSH server daemon
#
# chkconfig: 2345 55 25
# description: OpenSSH server daemon
#
# processname: sshd
# config: /etc/ssh/ssh_host_key
# config: /etc/ssh/ssh_host_key.pub
# config: /etc/ssh/ssh_random_seed
# config: /etc/ssh/sshd_config
# pidfile: /var/run/sshd.pid

# source function library
. /etc/rc.d/init.d/functions

# pull in sysconfig settings
[ -f /etc/sysconfig/sshd ] && . /etc/sysconfig/sshd

RETVAL=0
prog="sshd"

# Some functions to make the below more readable
SSHD=/usr/sbin/sshd
PID_FILE=/var/run/sshd.pid

do_restart_sanity_check()
{
        $SSHD -t
        RETVAL=$?
        if [ $RETVAL -ne 0 ]; then
                failure $"Configuration file or keys are invalid"
                echo
        fi
}

start()
{
        # Create keys if necessary
        /usr/bin/ssh-keygen -A
        if [ -x /sbin/restorecon ]; then
                /sbin/restorecon /etc/ssh/ssh_host_rsa_key.pub
                /sbin/restorecon /etc/ssh/ssh_host_dsa_key.pub
                /sbin/restorecon /etc/ssh/ssh_host_ecdsa_key.pub
        fi

        echo -n $"Starting $prog:"
        $SSHD $OPTIONS && success || failure
        RETVAL=$?
        [ $RETVAL -eq 0 ] && touch /var/lock/subsys/sshd
        echo
}

stop()
{
        echo -n $"Stopping $prog:"
        killproc $SSHD -TERM
        RETVAL=$?
        [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/sshd
        echo
}

reload()
{
        echo -n $"Reloading $prog:"
        killproc $SSHD -HUP
        RETVAL=$?
        echo
}

case "$1" in
        start)
                start
                ;;
        stop)
                stop
                ;;
        restart)
                stop
                start
                ;;
        reload)
                reload
                ;;
        condrestart)
                if [ -f /var/lock/subsys/sshd ] ; then
                        do_restart_sanity_check
                        if [ $RETVAL -eq 0 ] ; then
                                stop
                                # avoid race
                                sleep 3
                                start
                        fi
                fi
                ;;
        status)
                status $SSHD
                RETVAL=$?
                ;;
        *)
                echo $"Usage: $0 {start|stop|restart|reload|condrestart|status}"
                RETVAL=1
esac
exit $RETVAL
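
The permission checks earlier in this step can be scripted so that they are repeatable after a recursive chmod or chown. This is a minimal sketch, assuming GNU coreutils `stat`; the helper name `check_mode` is hypothetical, and the expected values in the usage comment are the ones shown in the listings above (755 for /var, 711 for /var/empty/sshd, both owned by root).

```shell
# Compare a path's octal mode (and optionally its owner) with expected values.
# Usage: check_mode PATH EXPECTED_MODE [EXPECTED_OWNER]
# Returns 0 on match, 1 on mismatch, 2 if the path cannot be examined.
check_mode() {
    path="$1"; want_mode="$2"; want_owner="${3:-}"
    # stat -c is the GNU coreutils form: %a = octal mode, %U = owner name.
    have_mode=$(stat -c '%a' "$path") || return 2
    have_owner=$(stat -c '%U' "$path") || return 2
    if [ "$have_mode" != "$want_mode" ]; then
        echo "MISMATCH $path: mode $have_mode, expected $want_mode"
        return 1
    fi
    if [ -n "$want_owner" ] && [ "$have_owner" != "$want_owner" ]; then
        echo "MISMATCH $path: owner $have_owner, expected $want_owner"
        return 1
    fi
    echo "OK $path: mode $have_mode owner $have_owner"
}

# Example, using the values from the listings above:
#   check_mode /var 755 root
#   check_mode /var/empty/sshd 711 root
```

On RPM-based systems, `rpm -V openssh-server` can also list files whose mode or ownership differs from what the package recorded.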

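2) The previous evening, sshd could not be brought up no matter what was tried. The process showed as /usr/sbin/sshd -D [listener] 0 of 10-100 startups, and the status kept reporting a timeout. Interestingly, when it was checked the next day, a restart succeeded.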

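Later, when the same problem recurred, abandoning the sshd.service unit under systemd meant that sshd was started by the sshd init script on restart, and it then ran normally. (Screenshots of this state and of the original startup status are not reproduced here.)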
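Note: reported experience suggests editing the Makefile and appending -lsystemd to the LIBS variable, for example LIBS=-lcrypto -ldl -lutil -lz -lcrypt -lresolv -lsystemd, and then rebuilding. Without it, an sshd built from source can have problems when its startup is managed by systemd. Separately, if sshd or the system's shared libraries are missing, corrupted, or have wrong permissions, sshd can fail to start, or conflict and restart.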
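
Both points in the note can be checked from the output of `ldd`. This is a minimal sketch; the helper names `missing_libs` and `links_libsystemd` are hypothetical, and both read `ldd` output on stdin so they make no assumption about where the sshd binary lives.

```shell
# Print the names of shared libraries that ldd reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Succeed if the ldd output shows the binary is linked against libsystemd.
links_libsystemd() {
    grep -q 'libsystemd\.so'
}

# Example (binary path taken from the commands above):
#   ldd /usr/sbin/sshd | missing_libs
#   ldd /usr/sbin/sshd | links_libsystemd && echo "linked against libsystemd"
```

An empty result from `missing_libs` only shows that the dynamic linker can resolve every library; it does not rule out a damaged file or wrong permissions on a library that was found.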

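3) Appendix: about polkit

polkit is an application-level toolkit on Linux for authorization management (its service is described as the Authorization Manager). By defining and evaluating permission rules, it lets processes with different privilege levels communicate: the decision is centralized in one framework, which determines whether an unprivileged process may access a privileged one.

polkit controls privileges at the system level and provides a way for unprivileged processes to talk to privileged ones. Unlike programs such as sudo, polkit does not grant a process full root privileges; instead it authorizes at a finer granularity through a centralized policy system.

polkit defines a set of actions, such as running GParted, and classifies users by group or user name, for example members of the wheel group. It then defines whether each action may be performed by certain users and whether extra confirmation is needed first, for example entering a password to confirm that the user belongs to a given group.

When some services are started, polkit itself may not be running properly, and the following error is reported: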

Authorization not available. Check if polkit service is running or see debug mes

Check polkit's status; if it shows failed, try restarting it:

# Confirm the polkitd user and group exist
cat /etc/passwd|grep polkit
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
# Check the service status
systemctl status polkit
● polkit.service - Authorization Manager
   Loaded: loaded (/usr/lib/systemd/system/polkit.service; static; vendor preset: enabled)
   Active: active (running) since Tue 2022-05-24 10:43:46 CST; 8 months 12 days ago
     Docs: man:polkit(8)
 Main PID: 542 (polkitd)
   CGroup: /system.slice/polkit.service
           └─542 /usr/lib/polkit-1/polkitd --no-debug

Feb 04 00:48:05 Yangguang-011 polkitd[542]: Registered Authentication Agent for unix-process:26570:2208281632 (system bus name :1.1205214 ...S.UTF-8)
Feb 04 00:49:13 Yangguang-011 polkitd[542]: Unregistered Authentication Agent for unix-process:26570:2208281632 (system bus name :1.120521...rom bus)
Feb 04 00:51:42 Yangguang-011 polkitd[542]: Registered Authentication Agent for unix-process:27093:2208303356 (system bus name :1.1205227 ...S.UTF-8)
Feb 04 00:51:42 Yangguang-011 polkitd[542]: Unregistered Authentication Agent for unix-process:27093:2208303356 (system bus name :1.120522...rom bus)
Feb 04 00:54:09 Yangguang-011 polkitd[542]: Registered Authentication Agent for unix-process:27680:2208318010 (system bus name :1.1205240 ...S.UTF-8)
Feb 04 00:55:33 Yangguang-011 polkitd[542]: Unregistered Authentication Agent for unix-process:27680:2208318010 (system bus name :1.120524...rom bus)
Feb 04 10:21:39 Yangguang-011 polkitd[542]: Registered Authentication Agent for unix-process:2065:2211723054 (system bus name :1.1207542 [...S.UTF-8)
Feb 04 10:21:39 Yangguang-011 polkitd[542]: Unregistered Authentication Agent for unix-process:2065:2211723054 (system bus name :1.1207542...rom bus)
Feb 04 10:55:40 Yangguang-011 polkitd[542]: Registered Authentication Agent for unix-process:8172:2211927116 (system bus name :1.1207682 [...S.UTF-8)
Feb 04 10:55:40 Yangguang-011 polkitd[542]: Unregistered Authentication Agent for unix-process:8172:2211927116 (system bus name :1.1207682...rom bus)
Hint: Some lines were ellipsized, use -l to show in full.

systemctl start polkit.service  # start the service again
/usr/lib/polkit-1/polkitd --no-debug &  # or launch it manually
ll /usr/lib/polkit-1/polkitd
-rwxr-xr-x. 1 root root 120432 Jan 26  2022 /usr/lib/polkit-1/polkitd

# Check the dbus service status
systemctl status dbus.service
● dbus.service - D-Bus System Message Bus
   Loaded: loaded (/usr/lib/systemd/system/dbus.service; static; vendor preset: disabled)
   Active: active (running) since Tue 2022-05-24 10:43:46 CST; 8 months 12 days ago
     Docs: man:dbus-daemon(1)
 Main PID: 553 (dbus-daemon)
   CGroup: /system.slice/dbus.service
           └─553 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

Jul 08 22:04:26 Yangguang-011 dbus[553]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.free...service'
Jul 08 22:04:26 Yangguang-011 dbus[553]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul 18 10:28:48 Yangguang-011 dbus[553]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.free...service'
Jul 18 10:28:48 Yangguang-011 dbus[553]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul 23 11:52:17 Yangguang-011 dbus[553]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.free...service'
Jul 23 11:52:17 Yangguang-011 dbus[553]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
Hint: Some lines were ellipsized, use -l to show in full.

systemctl restart dbus.service  # restart only if it is abnormal; restarting dbus can disrupt other running services
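
The two status checks above can be combined into one pass. This is a minimal sketch built on `systemctl is-active`, which prints the unit state and exits non-zero unless the unit is active; the helper name `report_units` and the overridable `SYSTEMCTL` variable are assumptions added for illustration.

```shell
# Command used to query systemd; can be overridden, for example with a stub.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Print "unit: state" for each unit given; return 1 if any is not active.
report_units() {
    rc=0
    for unit in "$@"; do
        # is-active prints the state (active, inactive, failed, ...) and
        # exits non-zero unless the unit is active.
        state=$("$SYSTEMCTL" is-active "$unit" 2>/dev/null) || rc=1
        echo "$unit: ${state:-unknown}"
    done
    return "$rc"
}

# Example:
#   report_units dbus.service polkit.service || echo "at least one unit is not active"
```

dbus is listed first because polkitd registers on the system bus, as the "system bus name" entries in the log above show.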

4) Appendix: about the sshd service configuration

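Reported experience suggests that the error "sshd.service holdoff time over, scheduling restart." appears because sshd, once started, never sends a readiness message to systemd (this is how a unit declared with Type=notify behaves). systemd keeps waiting, and after the timeout it restarts sshd, so sshd is brought up again and again without the start ever being counted as successful, even though logins sometimes appear unaffected.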
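
Whether systemd waits for such a message depends on the Type= setting in the unit's [Service] section. This is a minimal sketch for reading it from a unit file; the helper name `unit_service_type` is hypothetical, and the example path is the one used in the commands earlier in this article.

```shell
# Print the Type= value from the [Service] section of a unit file.
# Prints nothing if the unit does not set Type=.
unit_service_type() {
    awk -F= '
        /^\[/ { section = $0 }                              # remember the current section header
        section == "[Service]" && $1 == "Type" { svc_type = $2 }
        END { if (svc_type != "") print svc_type }
    ' "$1"
}

# Example:
#   unit_service_type /usr/lib/systemd/system/sshd.service
```

If this prints notify, systemd treats the service as started only after it receives READY=1 from the daemon. On a running system, `systemctl show -p Type sshd.service` reports the effective value, including any drop-in overrides.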

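Fix: patch the source. In the openssh-8.2p1 source directory, open sshd.c (the file that contains main), find the line that calls server_accept_loop, and add the line sd_notify(0, "READY=1"); before it. Then add the matching header include near the top of the file: #include <systemd/sd-daemon.h>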

/* Signal systemd that we are ready to accept connections */
sd_notify(0, "READY=1");
/* Accept a connection and return in a forked child */
server_accept_loop(&sock_in, &sock_out,&newsock, config_s);
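
Before building, it can be worth confirming that both halves of the patch are in place, since the call without the include will not compile cleanly. This is a minimal sketch; the helper name `has_sd_notify_patch` is hypothetical, and it only looks for the exact text shown above.

```shell
# Succeed only if the file contains both the sd-daemon.h include and the
# READY=1 notification call.
has_sd_notify_patch() {
    src="$1"
    grep -q '^#include <systemd/sd-daemon.h>' "$src" &&
        grep -qF 'sd_notify(0, "READY=1")' "$src"
}

# Example, run from the openssh-8.2p1 source directory:
#   has_sd_notify_patch sshd.c && echo "patch present"
```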

Then build and install:

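# sd_notify is not available from the default build dependencies, so install the package that provides it

yum install systemd-devel
# Edit the generated Makefile: find the LIBS variable and append -lsystemd, so that it reads:
LIBS=-lcrypto -ldl -lutil -lz -lcrypt -lresolv -lsystemd
# Then build, and install only if the build succeeds
make && make install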
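
The Makefile edit can also be applied non-interactively. This is a minimal sketch, assuming GNU sed (for `-i` without a suffix) and a Makefile whose LIBS variable sits on a single line starting with LIBS=, as in the example above; the helper name `add_libsystemd` is hypothetical.

```shell
# Append -lsystemd to the LIBS= line of a Makefile, unless it is already there.
# Usage: add_libsystemd PATH_TO_MAKEFILE
add_libsystemd() {
    makefile="$1"
    # Restrict the edit to the LIBS= line and skip it when -lsystemd is present,
    # so running the function twice does not add the flag twice.
    sed -i -e '/^LIBS=/{' -e '/-lsystemd/!s/$/ -lsystemd/' -e '}' "$makefile"
}

# Example, run from the source directory after ./configure:
#   add_libsystemd Makefile
```

Because configure regenerates the Makefile, the edit has to be repeated whenever configure is run again.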

Alternatively, delete the old sshd.service and use the init script to regenerate the sshd service.
