Probes in a Linux Cluster
It’s hard to design a robust K8s cluster with multiple interdependent services. Often, if one core service crashes, every service that depends on it fails too. That is expected, but we should be able to pinpoint the service causing the failures and restart it or roll it back without manual effort. K8s liveness and readiness probes and the minReadySeconds setting come to the rescue.
Let’s go through these settings one by one and see why, how, and in which combination to use them:
K8s Liveness Probes
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running but unable to make progress. Restarting a container in such a state can help make the application more available despite the bug. Many applications that run for long periods eventually transition to broken states and cannot recover except by being restarted. Kubernetes provides liveness probes to detect and remedy such situations.
Liveness can be checked in three ways:
Liveness command (exec):
```yaml
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
```
Liveness HTTP GET request (httpGet):
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: Custom-Header
      value: Awesome
  initialDelaySeconds: 3
  periodSeconds: 3
```
Liveness TCP probe (tcpSocket), shown here together with a readiness probe on the same port:
```yaml
readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
```
Restart policy:
Note: Restarting a container (it refreshes everything on the container) in a Pod should not be confused with restarting a Pod. A Pod is not a process, but an environment for running a container. A Pod persists until it is deleted.
A PodSpec has a restartPolicy field with possible values Always, OnFailure, and Never. The default value is Always. restartPolicy applies to all containers in the Pod, and it only refers to restarts of the containers by the kubelet on the same node. Exited containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes; the delay is reset after ten minutes of successful execution. As discussed in the Pods document, once bound to a node, a Pod will never be rebound to another node.
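To see how restartPolicy and a liveness probe work together, here is a minimal sketch of a complete Pod manifest. The Pod name, container name, image tag, and shell command are illustrative assumptions, not taken from the original article; the command simply deletes the probed file after 30 seconds so that the probe starts failing.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-demo        # hypothetical name
spec:
  restartPolicy: Always           # the default; stated explicitly for clarity
  containers:
  - name: app                     # hypothetical container name
    image: busybox:1.36           # assumed image; any image with a shell works
    args:
    - /bin/sh
    - -c
    # Healthy for the first 30s, then the file disappears and `cat` fails.
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
```

With these assumed values, the kubelet should restart the container shortly after the file is removed, and the RESTARTS column of `kubectl get pod liveness-exec-demo` should grow over time while the Pod itself stays on the same node.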
Configure Probes in K8s
Probes have a number of fields that you can use to more precisely control the behavior of liveness and readiness checks:
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in the case of a liveness probe means restarting the container. In the case of a readiness probe, the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
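The fields above can be combined on a single probe. The following fragment is a sketch with arbitrarily chosen values (the path and port are assumptions reused from the earlier snippets), annotated with what each setting means in practice.

```yaml
livenessProbe:
  httpGet:
    path: /healthz            # assumed health endpoint
    port: 8080
  initialDelaySeconds: 10     # wait 10s after container start before the first probe
  periodSeconds: 5            # probe every 5s
  timeoutSeconds: 2           # a probe with no answer within 2s counts as a failure
  successThreshold: 1         # must be 1 for liveness probes
  failureThreshold: 3         # restart only after 3 consecutive failures
```

With these values, a container that stops responding is restarted after three consecutive failed probes, which is on the order of failureThreshold × periodSeconds = 15 seconds. Raising failureThreshold or periodSeconds trades slower detection for fewer restarts caused by short hiccups.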
HTTP probes have additional fields that can be set on httpGet:
host: Host name to connect to; defaults to the Pod IP. You probably want to set “Host” in httpHeaders instead.
scheme: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to HTTP.
path: Path to access on the HTTP server.
httpHeaders: Custom headers to set in the request. HTTP allows repeated headers.
port: Name or number of the port to access on the container. The number must be in the range 1 to 65535.
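As an illustration of these httpGet fields, the sketch below probes over HTTPS, refers to the port by name, and sets the Host header through httpHeaders rather than the host field. The container name, image, port name, port number, and host name are all hypothetical.

```yaml
containers:
- name: app                         # hypothetical container
  image: example/app:1.0            # hypothetical image
  ports:
  - name: web                       # the probe below refers to this name
    containerPort: 8443
  livenessProbe:
    httpGet:
      scheme: HTTPS                 # defaults to HTTP when omitted
      path: /healthz
      port: web                     # a port name instead of a number
      httpHeaders:
      - name: Host                  # preferred over the `host` field
        value: app.example.internal # hypothetical virtual host
    periodSeconds: 10
```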
For a TCP probe, the kubelet makes the probe connection at the node, not in the Pod, which means that you cannot use a service name in the host parameter since the kubelet is unable to resolve it.
K8s Readiness Probes
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from the Service load balancers.
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup or depend on external services after startup. In such cases, you don’t want to kill the application, but you don’t want to send it requests either.
Readiness probes are configured similarly to liveness probes. The only difference is that you use the readinessProbe field instead of the livenessProbe field.
Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and the containers are restarted when they fail.
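A sketch of both probes on one container might look like the following. The two endpoints are assumptions: /ready is imagined to report whether dependencies and data are loaded, while /healthz only reports that the process itself is responsive.

```yaml
containers:
- name: app                     # hypothetical container
  image: example/app:1.0        # hypothetical image
  readinessProbe:               # failing => Pod removed from Service endpoints
    httpGet:
      path: /ready              # assumed endpoint
      port: 8080
    periodSeconds: 5
    failureThreshold: 2
  livenessProbe:                # failing => container restarted by the kubelet
    httpGet:
      path: /healthz            # assumed endpoint
      port: 8080
    initialDelaySeconds: 15
    periodSeconds: 10
    failureThreshold: 3
```

Keeping the liveness check cheaper and more tolerant than the readiness check is one reasonable design choice: a temporarily unavailable dependency then takes the Pod out of rotation without triggering a restart.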
K8s Startup Probes
The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don’t interfere with the application startup. This makes it possible to use liveness checks on slow-starting containers without the kubelet killing them before they are up and running.
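A minimal sketch of a startup probe guarding a liveness probe is shown below. The endpoint and the thresholds are assumed values; the point is that the startup budget is failureThreshold × periodSeconds.

```yaml
startupProbe:
  httpGet:
    path: /healthz          # assumed endpoint, same as the liveness check
    port: 8080
  failureThreshold: 30      # up to 30 attempts ...
  periodSeconds: 10         # ... 10s apart = at most 300s to finish starting
livenessProbe:              # only begins once the startup probe has succeeded
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```

Under these assumptions the application gets up to five minutes to start, yet once it is running a hang is still detected within roughly 30 seconds.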
The minReadySeconds for Pod
The .spec.minReadySeconds field is optional. It specifies the minimum number of seconds for which a newly created Pod should be ready, without any of its containers crashing, for it to be considered available. It defaults to 0 (the Pod is considered available as soon as it is ready).
Once a Pod’s container has started, it is not considered ready (to accept traffic) until it passes its readinessProbe (if one is defined). At the moment all of the Pod’s containers are ready, the Pod is considered ready. But if .spec.minReadySeconds is defined, the Pod is still not considered available until it has stayed ready for that many seconds without any container crashing. Note that this field is set on the workload controller that creates the Pods, such as a Deployment, not in the Pod spec itself.
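As a sketch of where the setting lives, the Deployment below combines minReadySeconds with a readiness probe. The names, labels, image, and numbers are illustrative assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                   # hypothetical name
spec:
  replicas: 3
  minReadySeconds: 10         # a new Pod must stay ready for 10s to count as available
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25     # assumed image
        readinessProbe:
          httpGet:
            path: /
            port: 80
          periodSeconds: 5
```

During a rolling update, a new Pod that becomes ready and then crashes within the 10-second window is never counted as available, so the rollout does not proceed to replace further old Pods on the strength of it.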
Translated from: https://levelup.gitconnected.com/why-and-how-to-set-probes-in-kubernetes-d7da39e94e64