Fixing kubelet certificate rotation failures

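Some time after I last dealt with the kubelet.go node "master" not found problem, I ran into the same issue on other nodes. This time it showed up as /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory.

I wrote about the earlier case in Handling the k8s kubelet.go node "master" not found problem[1].

If you follow that earlier approach and remove the /etc/kubernetes/bootstrap-kubelet.conf reference, you may then hit the kubelet.go node "master" not found error, which I worked around at the time by replacing the kubelet's kubeconfig with admin.conf.

I later found, however, that the error above appears when certificates are renewed after the kubelet certificate has expired. That misled me into removing the --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf flag from 10-kubeadm.conf, restarting, and replacing kubelet.conf with the master's admin.conf. That fix appears to have masked the real problem.

The root cause is that the kubelet certificate was not renewed. This happens after the certificate expiry dates are renewed manually: renewing certificates with kubeadm does not renew the kubelet's certificate (in effect, client certificate rotation has failed).

So once the kubelet is restarted, the certificates no longer match. Replacing kubelet.conf with the master's admin.conf only seemed to solve the problem earlier because the kubelet had not been restarted.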

Here is the same error on Kubernetes 1.16:

    2月 09 16:41:11 master systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
    2月 09 16:41:11 master systemd[1]: Unit kubelet.service entered failed state.
    2月 09 16:41:11 master systemd[1]: kubelet.service failed.
    2月 09 16:41:22 master systemd[1]: kubelet.service holdoff time over, scheduling restart.
    2月 09 16:41:22 master systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
    2月 09 16:41:22 master systemd[1]: Started kubelet: The Kubernetes Node Agent.
    2月 09 16:41:22 master kubelet[74138]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
    2月 09 16:41:22 master kubelet[74138]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
    2月 09 16:41:22 master kubelet[74138]: I0209 16:41:22.222741   74138 server.go:410] Version: v1.16.3
    2月 09 16:41:22 master kubelet[74138]: I0209 16:41:22.223911   74138 plugins.go:100] No cloud provider specified.
    2月 09 16:41:22 master kubelet[74138]: I0209 16:41:22.223954   74138 server.go:773] Client rotation is on, will bootstrap in background
    2月 09 16:41:22 master systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
    2月 09 16:41:22 master kubelet[74138]: E0209 16:41:22.227202   74138 bootstrap.go:265] part of the existing bootstrap client certificate is expired: 2021-03-18 08:46:29 +0000 UTC
    2月 09 16:41:22 master kubelet[74138]: F0209 16:41:22.227239   74138 server.go:271] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
    2月 09 16:41:22 master systemd[1]: Unit kubelet.service entered failed state.
    2月 09 16:41:22 master systemd[1]: kubelet.service failed.
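
To check quickly whether a node is hitting this particular failure, one option is to filter the kubelet log for the two messages in the excerpt above. This is only a minimal sketch: the function name kubelet_cert_failures is made up for illustration, and the patterns are copied from this log excerpt rather than from any documented, stable message format.

```shell
#!/bin/sh
# Print only the log lines matching the two failure signatures seen above.
# Reads log text on stdin; returns non-zero when nothing matches (grep semantics).
kubelet_cert_failures() {
    grep -E 'bootstrap client certificate is expired|unable to load bootstrap kubeconfig'
}

# Typical use on a systemd host (assumes the unit is named kubelet):
#   journalctl -u kubelet --no-pager | kubelet_cert_failures
```

If the filter prints nothing, the kubelet is probably failing for a different reason and the rest of this article may not apply.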

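The earlier approach was simply to delete the /etc/kubernetes/bootstrap-kubelet.conf entry (on a kubeadm installation). That entry lives in the kubelet's startup configuration, which you can inspect with the command below. The dates in the output do not matter; it is shown only for illustration.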

    [root@master ~]# systemctl status kubelet
    ● kubelet.service - kubelet: The Kubernetes Node Agent
       Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
      Drop-In: /usr/lib/systemd/system/kubelet.service.d
               └─10-kubeadm.conf
       Active: active (running) since Thu 2021-12-30 03:08:09 CST; 1 months 23 days ago
         Docs: https://kubernetes.io/docs/
     Main PID: 32478 (kubelet)
        Tasks: 29
       Memory: 106.9M
       CGroup: /system.slice/kubelet.service
               └─32478 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/confi...

Checking the certificates

First, check the certificates managed by kubeadm:

    [root@master pki]# kubeadm alpha certs check-expiration
    CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
    admin.conf                 Feb 07, 2032 08:31 UTC   9y              no
    apiserver                  Feb 07, 2032 08:31 UTC   9y              no
    apiserver-etcd-client      Feb 07, 2032 08:31 UTC   9y              no
    apiserver-kubelet-client   Feb 07, 2032 08:31 UTC   9y              no
    controller-manager.conf    Feb 07, 2032 08:31 UTC   9y              no
    etcd-healthcheck-client    Feb 07, 2032 08:31 UTC   9y              no
    etcd-peer                  Feb 07, 2032 08:31 UTC   9y              no
    etcd-server                Feb 07, 2032 08:31 UTC   9y              no
    front-proxy-client         Feb 07, 2032 08:31 UTC   9y              no
    scheduler.conf             Feb 07, 2032 08:31 UTC   9y              no

These dates look fine. Next, check the kubelet's own certificate. kubelet.conf points to a symlinked file under /var/lib/kubelet/pki, so check the expiry date of the certificate there:

    [root@master ]# cd /var/lib/kubelet/pki
    [root@master pki]# ls
    kubelet-client-2020-03-18-16-46-37.pem  kubelet-client-2021-01-28-09-11-35.pem  kubelet-client-current.pem  kubelet.key
    kubelet-client-2020-03-18-16-47-03.pem  kubelet-client-2022-02-09-16-22-05.pem  kubelet.crt
    [root@master pki]# openssl x509 -noout -enddate -in ./kubelet.crt
    notAfter=Mar 18 07:46:26 2021 GMT

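The output shows notAfter=Mar 18 07:46:26 2021 GMT, which means the certificate already expired on March 18, 2021 at 07:46:26 GMT.

  • The kubelet-client-2022-02-09-16-22-05.pem file appeared after running kubeadm alpha certs renew all, and it carries a different date. The certificates issued by this kubeadm are valid for 10 years, so they are not the problem. But this pem does not match our dates either.

The kubelet client certificate was not updated either.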
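
Reading the notAfter line by eye works for one file, but with several .pem files in the directory it can help to let the shell do the comparison. The following is a rough sketch, not part of the original procedure: enddate_is_past is a made-up helper name, and it assumes GNU date, whose -d option can parse the timestamp format that openssl prints.

```shell
#!/bin/sh
# Succeeds (exit 0) when the given "notAfter=..." line, as printed by
# `openssl x509 -noout -enddate`, lies in the past.
# Assumption: GNU date, whose -d option parses "Mar 18 07:46:26 2021 GMT".
enddate_is_past() {
    end=${1#notAfter=}                          # strip the "notAfter=" prefix
    end_epoch=$(date -u -d "$end" +%s) || return 2
    now_epoch=$(date -u +%s)
    [ "$end_epoch" -le "$now_epoch" ]
}

# Possible usage against every client certificate in the kubelet pki directory:
#   for f in /var/lib/kubelet/pki/kubelet-client*.pem; do
#       line=$(openssl x509 -noout -enddate -in "$f")
#       if enddate_is_past "$line"; then echo "expired: $f ($line)"; fi
#   done
```

Where GNU date is not available, openssl x509 -checkend 0 -noout -in FILE gives a similar yes/no answer through its exit status.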

Kubelet client certificate rotation fails

The following is quoted from the documentation section "Kubelet client certificate rotation fails"[2]:

By default, kubeadm configures a kubelet with automatic rotation of client certificates by using the /var/lib/kubelet/pki/kubelet-client-current.pem symlink specified in /etc/kubernetes/kubelet.conf. If this rotation process fails you might see errors such as x509: certificate has expired or is not yet valid in kube-apiserver logs. To fix the issue you must follow these steps:

  1. Backup and delete /etc/kubernetes/kubelet.conf and /var/lib/kubelet/pki/kubelet-client* from the failed node.

  2. From a working control plane node in the cluster that has /etc/kubernetes/pki/ca.key execute kubeadm kubeconfig user --org system:nodes --client-name system:node:$NODE > kubelet.conf. $NODE must be set to the name of the existing failed node in the cluster. Modify the resulted kubelet.conf manually to adjust the cluster name and server endpoint, or pass kubeconfig user --config (it accepts InitConfiguration). If your cluster does not have the ca.key you must sign the embedded certificates in the kubelet.conf externally.

  3. Copy this resulted kubelet.conf to /etc/kubernetes/kubelet.conf on the failed node.

  4. Restart the kubelet (systemctl restart kubelet) on the failed node and wait for /var/lib/kubelet/pki/kubelet-client-current.pem to be recreated.

  5. Manually edit the kubelet.conf to point to the rotated kubelet client certificates, by replacing client-certificate-data and client-key-data with:

    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

  6. Restart the kubelet.

  7. Make sure the node becomes Ready.
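
The two file-editing steps in the list above (the backup in step 1 and the kubelet.conf edit in step 5) are easy to get wrong by hand, so here is a hedged sketch of how they might be scripted. The function names backup_kubelet_creds and use_rotated_client_cert and the backup directory are my own choices, not something the documentation prescribes, and every path is a parameter so the functions can be tried on copies first.

```shell
#!/bin/sh
# Step 1: move kubelet.conf and the kubelet-client* files into a backup
# directory. $1 = kubelet.conf path, $2 = pki directory, $3 = backup directory.
backup_kubelet_creds() {
    conf=$1; pki_dir=$2; backup_dir=$3
    mkdir -p "$backup_dir" || return 1
    mv "$conf" "$backup_dir"/ || return 1
    for f in "$pki_dir"/kubelet-client*; do
        # Skip an unmatched glob, but keep dangling symlinks such as
        # kubelet-client-current.pem once its target has been moved.
        [ -e "$f" ] || [ -L "$f" ] || continue
        mv "$f" "$backup_dir"/ || return 1
    done
}

# Step 5: replace the embedded client-certificate-data / client-key-data
# entries with paths to the rotated certificate file.
# $1 = kubeconfig to edit, $2 = pem path (defaults to the path in the docs).
use_rotated_client_cert() {
    conf=$1
    pem=${2:-/var/lib/kubelet/pki/kubelet-client-current.pem}
    tmp=$(mktemp) || return 1
    sed -e "s|^\( *\)client-certificate-data:.*|\1client-certificate: $pem|" \
        -e "s|^\( *\)client-key-data:.*|\1client-key: $pem|" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"   # cat keeps owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Steps 2-4 and 6-7 run against the real cluster, so they are only listed:
#   kubeadm kubeconfig user --org system:nodes --client-name "system:node:$NODE" > kubelet.conf
#   (copy the result to /etc/kubernetes/kubelet.conf on the failed node)
#   systemctl restart kubelet
#   kubectl get nodes
```

On older releases, such as the 1.16 cluster in this article, the kubeconfig subcommand may sit under kubeadm alpha instead, so it is worth checking kubeadm's help output before running step 2.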

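Several workarounds are also discussed on GitHub, though the following one has been criticized by some experienced users as too crude.

The workaround is to copy the values of the client-certificate-data and client-key-data keys from /etc/kubernetes/admin.conf and paste them under the same keys in /etc/kubernetes/kubelet.conf, then run service kubelet restart.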

    [root@master kubernetes]# cat admin.conf
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJd01ETXhPREE0TkRZeU4xb1hEVE13TURNeE5qQTRORFl5TjFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTGFaClRWNODRKWVBPM09yKzdVbS9KN29sRVFEa3RGT3RWWHg0NWhQU0MrVkhWVEZib1JvOWEKNnVHT05iTWNHWVJjcERBbUZSU2pycnFlaFhmbTNjVWJaRUxrdmpTNXFsaFVONGlYak9idFFVYnQ4cHREYU9QSgo1cDUybjRnczdKMU92bzhKRjYzYU83Vy91cHdJS05MOEovWlpUVTh0YlU1TklkUzZCMXE1cFRSQTFBVT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
        server: https://master:6443
      name: kubernetes
    contexts:
    - context:
        cluster: kubernetes
        user: kubernetes-admin
      name: kubernetes-admin@kubernetes
    current-context: kubernetes-admin@kubernetes
    kind: Config
    preferences: {}
    users:
    - name: kubernetes-admin
      user:
        client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lJUm91STNYU1ZTak13RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TURBek1UZ3dPRFEyTWpkYUZ3MHpNakF5TURjd09ETXhNVGxhTURReApGekFWQmdOVkJBb1REb9FUmJWenpRQndxZ1djMkMrbmVmRlNYK0FQMHdrL2VmdXJpdGRqUTAKeFhVNjgwNnF0b1hzM3VHaWtNQkc1WmQzT2srLzc5NlZGM29TZllObU5CaVAxY3FjVUJIcVFpOTdQNVZSL2RmawpaR0phMVJoNE5aRk9IaXVqRXFFOGQxUFVLOTg0SHNxOTcxN0dIelRaZGNDMW1EcFF3d3FUdktVRlZOa3hQdFljCjdDWkl1QUltZWFwcXlQVkFhdEp5Vk5kVy9NRlVya0ZjTHZFMnlRQ1pXd1NxL3RnSDFtMD0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
        client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBcXVmRUo2NG9wR2txM1Vzd21SNGFiOTRuS0RjTTFMSWRsYnBXVkIraDAzZGp5K0ZICnJsRVVSVUdESEtBZjIvN2EwbTNrS0xoSWVudC9GRVRxSm5Kd3RUUzdmUDlDVzVwUGR2OHdEQ3o3U1dzK1ZrczcKTTVjcXhMNFovem5ySU9LZ2FmQzIyaTVFdjgrRjBqdW85b1lES3VwMFQ0bmxON3dNeXdjN1dFS0dNcGtEZGNnTgpwem1kTGZDSzQvNXdWeFhVcDFvTDJ1OHowV0RLKzcyN3plaFVMcFpZN0lXRG1PRnd2YzFxcmp6RFBCYWNxd3MwCnJyMkx6RXllRWt6cUZpd3BkcXBmbE4rYkxTZkN3ekNlWFdTcEVQ5UEVnV0dEWFlaYUhGTzBRZVF0a2Vnd2xoeWdXeXNZOTBBZnArbQpOeVByZW8zRngzaTlBUG9QeWRuNHFtbVd2dmhiT2FhUGZyK1pBUmFOa0JCaXc1OUw3eW5IMVhLcExMMDBGZHlCClFRYS8rUUtCZ1FDYzFLaXV3Ui9ZWGY5aGtKeWVZRTZHUXhKeEc2OWl2MDNuZm1ldi9zeExKZDY3WmxBemRrbDgKc3Vtb29uK0dhc0V4SGFqQUhkVVlNZmplU2ZxUkNOR1FISWM4cGFNYjQxbFErRGowRlBydzRHeThjcTBNWEtleQpIelduazQrVmpXeW9URVJoTnpkSEVUdXFKUG51TFdqbFhSaFhLWCtIVmVZVUdwN3pRNHFXQWc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=

After the change, kubelet.conf looks like this:

    [root@master kubernetes]# cat kubelet.conf
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJd01ETXhPREE0TkRZeU4xb1hEVE13TURNeE5qQTRORFl5TjFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTGFaClRWNODRKWVBPM09yKzdVbS9KN29sRVFEa3RGT3RWWHg0NWhQU0MrVkhWVEZib1JvOWEKNnVHT05iTWNHWVJjcERBbUZSU2pycnFlaFhmbTNjVWJaRUxrdmpTNXFsaFVONGlYak9idFFVYnQ4cHREYU9QSgo1cDUybjRnczdKMU92bzhKRjYzYU83Vy91cHdJS05MOEovWlpUVTh0YlU1TklkUzZCMXE1cFRSQTFBVT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
        server: https://master:6443
      name: kubernetes
    contexts:
    - context:
        cluster: kubernetes
        user: system:node:master
      name: system:node:master@kubernetes
    current-context: system:node:master@kubernetes
    kind: Config
    preferences: {}
    users:
    - name: system:node:master
      user:
        client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lJUm91STNYU1ZTak13RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TURBek1UZ3dPRFEyTWpkYUZ3MHpNakF5TURjd09ETXhNVGxhTURReApGekFWQmdOVkJBb1REb9FUmJWenpRQndxZ1djMkMrbmVmRlNYK0FQMHdrL2VmdXJpdGRqUTAKeFhVNjgwNnF0b1hzM3VHaWtNQkc1WmQzT2srLzc5NlZGM29TZllObU5CaVAxY3FjVUJIcVFpOTdQNVZSL2RmawpaR0phMVJoNE5aRk9IaXVqRXFFOGQxUFVLOTg0SHNxOTcxN0dIelRaZGNDMW1EcFF3d3FUdktVRlZOa3hQdFljCjdDWkl1QUltZWFwcXlQVkFhdEp5Vk5kVy9NRlVya0ZjTHZFMnlRQ1pXd1NxL3RnSDFtMD0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
        client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBcXVmRUo2NG9wR2txM1Vzd21SNGFiOTRuS0RjTTFMSWRsYnBXVkIraDAzZGp5K0ZICnJsRVVSVUdESEtBZjIvN2EwbTNrS0xoSWVudC9GRVRxSm5Kd3RUUzdmUDlDVzVwUGR2OHdEQ3o3U1dzK1ZrczcKTTVjcXhMNFovem5ySU9LZ2FmQzIyaTVFdjgrRjBqdW85b1lES3VwMFQ0bmxON3dNeXdjN1dFS0dNcGtEZGNnTgpwem1kTGZDSzQvNXdWeFhVcDFvTDJ1OHowV0RLKzcyN3plaFVMcFpZN0lXRG1PRnd2YzFxcmp6RFBCYWNxd3MwCnJyMkx6RXllRWt6cUZpd3BkcXBmbE4rYkxTZkN3ekNlWFdTcEVQ5UEVnV0dEWFlaYUhGTzBRZVF0a2Vnd2xoeWdXeXNZOTBBZnArbQpOeVByZW8zRngzaTlBUG9QeWRuNHFtbVd2dmhiT2FhUGZyK1pBUmFOa0JCaXc1OUw3eW5IMVhLcExMMDBGZHlCClFRYS8rUUtCZ1FDYzFLaXV3Ui9ZWGY5aGtKeWVZRTZHUXhKeEc2OWl2MDNuZm1ldi9zeExKZDY3WmxBemRrbDgKc3Vtb29uK0dhc0V4SGFqQUhkVVlNZmplU2ZxUkNOR1FISWM4cGFNYjQxbFErRGowRlBydzRHeThjcTBNWEtleQpIelduazQrVmpXeW9URVJoTnpkSEVUdXFKUG51TFdqbFhSaFhLWCtIVmVZVUdwN3pRNHFXQWc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
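
For completeness, the copy-and-paste workaround above could be scripted roughly as follows. Treat it as an illustration only: copy_client_credentials and kubeconfig_value are names I made up, and the sketch assumes each file holds a single user entry, as the two listings here do. Note that after this change the kubelet authenticates with the admin certificate rather than a node certificate, which is presumably part of why the approach is considered crude.

```shell
#!/bin/sh
# Print the value of the first "key: value" line whose key matches $2.
# $1 = kubeconfig file, $2 = key name such as client-certificate-data
kubeconfig_value() {
    awk -v key="$2:" '$1 == key { print $2; exit }' "$1"
}

# Copy client-certificate-data and client-key-data from $1 into $2,
# leaving every other line of $2 (names, contexts, server) untouched.
copy_client_credentials() {
    src=$1; dst=$2
    cert=$(kubeconfig_value "$src" client-certificate-data)
    key=$(kubeconfig_value "$src" client-key-data)
    [ -n "$cert" ] && [ -n "$key" ] || return 1
    tmp=$(mktemp) || return 1
    awk -v cert="$cert" -v key="$key" '
        $1 == "client-certificate-data:" { sub(/client-certificate-data:.*/, "client-certificate-data: " cert) }
        $1 == "client-key-data:"         { sub(/client-key-data:.*/, "client-key-data: " key) }
        { print }
    ' "$dst" > "$tmp" && cat "$tmp" > "$dst"   # cat keeps owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Possible usage, after backing up kubelet.conf:
#   copy_client_credentials /etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf
#   service kubelet restart
```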

The conclusion is that renewing the Kubernetes certificates with kubeadm alpha certs renew all does not renew the certificate in kubelet.conf, which is discussed further and confirmed on GitHub.
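
One way to see this for yourself is to decode the certificate embedded in kubelet.conf and look at its dates before and after a renewal. The helper below is a small sketch under a few assumptions: embedded_client_cert is a hypothetical name, the kubeconfig is assumed to carry an inline client-certificate-data entry (it prints nothing if the file references a pem path instead), and base64 is assumed to accept -d.

```shell
#!/bin/sh
# Decode the first client-certificate-data entry of a kubeconfig to PEM text.
# $1 = kubeconfig file. Assumes a base64 implementation that accepts -d.
embedded_client_cert() {
    awk '$1 == "client-certificate-data:" { print $2; exit }' "$1" | base64 -d
}

# Possible usage, comparing the two files discussed in this article:
#   embedded_client_cert /etc/kubernetes/kubelet.conf | openssl x509 -noout -subject -enddate
#   embedded_client_cert /etc/kubernetes/admin.conf   | openssl x509 -noout -subject -enddate
```

If the notAfter date reported for kubelet.conf is unchanged after the renewal while the one for admin.conf has moved, that matches the behaviour described above.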

References

  • Kubelet can't running after renew certificates[3]

  • Handling the k8s kubelet.go node "master" not found problem[4]

Links

[1] Handling the k8s kubelet.go node "master" not found problem: https://www.linuxea.com/2580.html

[2] Kubelet client certificate rotation fails: https://www.linuxea.com/https

[3] Kubelet can't running after renew certificates: https://github.com/kubernetes/kubeadm/issues/2054

[4] Handling the k8s kubelet.go node "master" not found problem: https://www.linuxea.com/2580.html

Original article: https://www.linuxea.com/2626.html