
CNCF Scheduling Study Notes


1.1. Division of Responsibilities

1.1.1. kube-apiserver

1.1.1.1. Receives requests from kubelet and forwards them (via webhooks) to controllers

1.1.1.2. Receives the controllers' results and persists them to etcd

1.1.2. controller

1.1.2.1. validate/Admit

1.1.3. Scheduler

1.1.3.1. Finds a suitable node and binds the pod to it

1.1.4. kubelet

1.1.4.1. Creates and evicts pods


1.2. Basic Capabilities

1.2.1. Resource Scheduling

1.2.1.1. Resources: CPU / Memory / Storage (including ephemeral storage) / GPU / FPGA

1.2.1.1.1. Extended resources: GPU / FPGA

  1.2.1.1.1.1. Can only be requested in integer amounts

1.2.1.2. QoS

1.2.1.2.1. Classification derived from a pod's resource requests/limits: Guaranteed / Burstable / BestEffort (see the sketch below)
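A minimal sketch of how requests and limits map to QoS classes; the pod name, image, and values are illustrative assumptions, not from the original notes.

# Guaranteed: every container sets limits equal to requests for both cpu and memory.
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo                 # illustrative name
spec:
  containers:
  - name: app
    image: nginx                 # illustrative image
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: 500m                # equal to the request -> Guaranteed
        memory: 256Mi
# Burstable: at least one request/limit is set, but not all limits equal requests.
# BestEffort: no requests or limits on any container.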

1.2.1.3. Resource Quota

1.2.1.3.1. Per-namespace totals; limits can also be scoped by QoS class

1.2.2. Relationship-based Scheduling

1.2.2.1. PodAffinity and PodAntiAffinity

1.2.2.2. NodeSelector/NodeAffinity

1.2.2.3. Taint/Tolerations


1.3. Resource Quota

1.3.1. Enabled when the apiserver's --enable-admission-plugins= flag includes ResourceQuota

1.3.2. Compute and storage resources can also be limited per type (e.g. GPU, storage class)

1.3.3. CPU/storage quota keys (see the sketch after this list)

1.3.3.1. limits.cpu Across all pods in a non-terminal state, the sum of CPU limits cannot exceed this value.

1.3.3.2. limits.memory Across all pods in a non-terminal state, the sum of memory limits cannot exceed this value.

1.3.3.3. requests.cpu Across all pods in a non-terminal state, the sum of CPU requests cannot exceed this value.

1.3.3.4. requests.memory Across all pods in a non-terminal state, the sum of memory requests cannot exceed this value.

1.3.3.5. requests.nvidia.com/gpu: 4

1.3.3.6. requests.storage Across all persistent volume claims, the sum of storage requests cannot exceed this value.

1.3.3.7. persistentvolumeclaims The total number of persistent volume claims that can exist in the namespace.

1.3.3.8. <storage-class-name>.storageclass.storage.k8s.io/requests.storage Storage-request limit scoped to the named StorageClass

1.3.3.9. <storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims PVC-count limit scoped to the named StorageClass

1.3.3.10. requests.ephemeral-storage Across all pods in the namespace, the sum of local ephemeral storage requests cannot exceed this value.

1.3.3.11. limits.ephemeral-storage Across all pods in the namespace, the sum of local ephemeral storage limits cannot exceed this value.
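A hedged sketch of a ResourceQuota combining the compute and storage keys above; the object name, namespace, StorageClass name ("gold"), and all numeric values are assumptions for illustration.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-and-storage     # illustrative name
  namespace: demo               # illustrative namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    requests.nvidia.com/gpu: "4"
    requests.storage: 100Gi
    persistentvolumeclaims: "10"
    # Per-StorageClass variants of the two keys above:
    gold.storageclass.storage.k8s.io/requests.storage: 50Gi
    gold.storageclass.storage.k8s.io/persistentvolumeclaims: "5"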

1.3.4. Object Count Quota

1.3.4.1. Two forms: the generic count/<resource>.<group> syntax, or a set of dedicated resource names

1.3.4.1.1. count/<resource>.<group>

1.3.4.1.2. With a count/* resource quota, an object is charged against the quota if it exists in server storage

1.3.4.2. Dedicated resource names (see the sketch after this list)

1.3.4.2.1. configmaps	The total number of config maps that can exist in the namespace.

1.3.4.2.2. persistentvolumeclaims	The total number of persistent volume claims that can exist in the namespace.

1.3.4.2.3. pods	The total number of pods in a non-terminal state that can exist in the namespace. A pod is in a terminal state if .status.phase in (Failed, Succeeded) is true.

1.3.4.2.4. replicationcontrollers	The total number of replication controllers that can exist in the namespace.

1.3.4.2.5. resourcequotas	The total number of resource quotas that can exist in the namespace.

1.3.4.2.6. services	The total number of services that can exist in the namespace.

1.3.4.2.7. services.loadbalancers	The total number of services of type load balancer that can exist in the namespace.

1.3.4.2.8. services.nodeports	The total number of services of type node port that can exist in the namespace.

1.3.4.2.9. secrets	The total number of secrets that can exist in the namespace.
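The two forms can be mixed in a single ResourceQuota; a sketch with arbitrary counts (the object name and numbers are assumptions):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts           # illustrative name
spec:
  hard:
    configmaps: "10"            # dedicated built-in name
    pods: "20"
    secrets: "30"
    services.loadbalancers: "2"
    count/deployments.apps: "5" # generic count/<resource>.<group> form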

1.3.5. Quota Scopes

1.3.5.1. Restrict which pods a quota applies to

1.3.5.2. Terminating Match pods where .spec.activeDeadlineSeconds >= 0

1.3.5.3. NotTerminating Match pods where .spec.activeDeadlineSeconds is nil

1.3.5.4. BestEffort Match pods that have best effort quality of service.

1.3.5.5. NotBestEffort Match pods that do not have best effort quality of service.

1.3.5.6. The Terminating, NotTerminating, and NotBestEffort scopes can only restrict the following resources (see the sketch after this list):

1.3.5.6.1. cpu, limits.cpu, limits.memory, memory, pods, requests.cpu, requests.memory
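A sketch of a scoped quota that only counts BestEffort pods (the name and count are assumptions):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: besteffort-pods         # illustrative name
spec:
  hard:
    pods: "5"
  scopes:
  - BestEffort                  # only BestEffort pods are charged against this quota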

1.3.6. Beta feature: classify pods and then apply quota per class

1.3.6.1. Resource Quota Per PriorityClass

1.3.6.2. using the scopeSelector field in the quota spec

1.3.6.3. scopeSelector supports the following values in the operator field: In, NotIn, Exists, DoesNotExist (see the sketch below)
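A sketch, modeled on the upstream example, of a quota that applies only to pods in a given PriorityClass; the class name "high" and the limits are assumptions.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: pods-high               # illustrative name
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high"]          # assumed PriorityClass name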

1.3.7. kubectl apply -f https://k8s.io/examples/admin/resource/quota-pod.yaml --namespace=quota-pod-example

1.3.8. Scheduling and resource binding

1.3.8.1. Does the scheduler schedule based on requests only? Are limits not used?

1.3.8.2. CPU shares (weights) are divided according to requests

1.3.8.2.1. With the kubelet flag --cpu-manager-policy=static, Guaranteed pods are pinned to exclusive cores (when the CPU request is an integer); see the sketch after the link below

1.3.8.2.2. https://kubernetes.io/zh/docs/tasks/administer-cluster/cpu-management-policies/
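A hedged sketch of a pod that would qualify for exclusive cores under --cpu-manager-policy=static: it is Guaranteed and its CPU request is an integer (name and image are illustrative).

apiVersion: v1
kind: Pod
metadata:
  name: pinned-cpu-demo         # illustrative name
spec:
  containers:
  - name: app
    image: nginx                # illustrative image
    resources:
      requests:
        cpu: "2"                # integer CPU request
        memory: 1Gi
      limits:
        cpu: "2"                # equal to the request -> Guaranteed
        memory: 1Gi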

1.3.8.3. Memory: OOM scores are assigned by QoS class

1.3.8.3.1. Guaranteed

  1.3.8.3.1.1. -998

1.3.8.3.2. Burstable

  1.3.8.3.2.1. 2-999

1.3.8.3.3. BestEffort

  1.3.8.3.3.1. 1000

1.3.8.3.4. The Linux kernel has a mechanism called the OOM killer (Out Of Memory killer). It watches for processes that consume too much memory, especially ones whose usage grows very quickly, and kills them to keep the system from running out of memory.

1.4. Use LimitRange to constrain the memory and CPU of individual pods (see the sketch below)

1.4.1. Minimum/maximum bounds

1.4.2. Default values
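A sketch of a LimitRange combining per-container bounds and defaults; all names and values are illustrative assumptions.

apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-mem-limit-range     # illustrative name
spec:
  limits:
  - type: Container
    min:                        # lower bound per container
      cpu: 100m
      memory: 64Mi
    max:                        # upper bound per container
      cpu: "2"
      memory: 1Gi
    defaultRequest:             # request applied when the container sets none
      cpu: 200m
      memory: 128Mi
    default:                    # limit applied when the container sets none
      cpu: 500m
      memory: 256Mi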


1.5. Affinity and anti-affinity

1.5.1. Node affinity

1.5.1.1. requiredDuringSchedulingIgnoredDuringExecution

1.5.1.2. preferredDuringSchedulingIgnoredDuringExecution

1.5.1.2.1. weight field

  1.5.1.2.1.1.  range 1-100

  1.5.1.2.1.2. Only meaningful when there are multiple preferredDuringSchedulingIgnoredDuringExecution terms?

1.5.1.3. operators: In, NotIn, Exists, DoesNotExist, Gt, Lt

1.5.1.4. nodeSelectorTerms

1.5.1.4.1. If any one of the nodeSelectorTerms is satisfied, the pod can be scheduled onto the node.

1.5.1.5. matchExpressions

1.5.1.5.1. The pod can be scheduled onto a node only if all matchExpressions within a term are satisfied

1.5.1.6. If both nodeSelector and nodeAffinity are specified, both must be satisfied for the pod to be scheduled onto a candidate node (see the sketch below)
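A sketch combining required and preferred node affinity; the label keys and values are assumptions for illustration.

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-demo      # illustrative name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values: ["az1", "az2"]   # assumed zone names
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1                    # range 1-100
        preference:
          matchExpressions:
          - key: disktype            # assumed node label
            operator: In
            values: ["ssd"]
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1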

1.5.2. Only takes effect at scheduling time (IgnoredDuringExecution)

1.5.3. podAffinity and podAntiAffinity

1.5.3.1. requiredDuringSchedulingIgnoredDuringExecution

1.5.3.2. preferredDuringSchedulingIgnoredDuringExecution

1.5.3.3. matchExpressions

1.5.3.3.1. The pod can be scheduled onto a node only if all matchExpressions are satisfied

1.5.3.4. podAffinityTerm

1.5.3.4.1. If any one of the podAffinityTerms is satisfied, the pod can be scheduled onto the node.

1.5.3.5. topologyKey (node affinity has no such field)

1.5.3.5.1. Schedules within the group of nodes sharing the same value for this label, e.g. the same zone: failure-domain.beta.kubernetes.io/zone

1.5.3.5.2. For pod affinity and for requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity, topologyKey must be specified

1.5.3.5.3. For requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity, the LimitPodHardAntiAffinityTopology admission controller restricts topologyKey to kubernetes.io/hostname

1.5.3.5.4. For preferredDuringSchedulingIgnoredDuringExecution pod anti-affinity, an empty topologyKey is interpreted as "all topologies"

  1.5.3.5.4.1. i.e. the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region

1.5.3.5.5. Apart from the cases above, topologyKey can be any legal label key

1.5.3.5.6. (e.g. failure-domain.beta.kubernetes.io/zone is a prepopulated Kubernetes label that the system uses to denote such a topology domain)

1.5.3.6. Valid operators are In, NotIn, Exists, DoesNotExist

1.5.3.6.1. Gt and Lt, available for node affinity, are not supported here (see the sketch below)
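A sketch of required pod affinity plus preferred pod anti-affinity; the "security" label and its values are assumptions in the spirit of the upstream example.

apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity-demo       # illustrative name
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security                      # assumed pod label
            operator: In
            values: ["S1"]
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values: ["S2"]
          topologyKey: kubernetes.io/hostname
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1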

1.6. A custom scheduler can be plugged in

1.6.1. https://kubernetes.io/blog/2017/03/advanced-scheduling-in-kubernetes/

1.6.2. Pod.spec.schedulerName: selects the custom scheduler (see the sketch at the end of this section)

1.6.3. https://www.qikqiak.com/post/custom-kube-scheduler/

1.6.4. https://kubernetes.io/zh/docs/tasks/administer-cluster/configure-multiple-schedulers/
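A minimal sketch of a pod that asks for a non-default scheduler; "my-scheduler" is an assumed name that must match the deployed custom scheduler.

apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod    # illustrative name
spec:
  schedulerName: my-scheduler   # assumed custom scheduler name
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1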


1.7. Pinning to a specific node

1.7.1. spec.nodeName: kube-01


1.8. pod.spec.topologySpreadConstraints

1.8.1. v1.16 alpha

1.8.1.1. The EvenPodsSpread feature gate must be enabled (it is disabled by default in 1.16)

1.8.1.2. EvenPodsSpread must be enabled on both the API server and the scheduler

1.8.2. topologySpreadConstraints fields

1.8.2.1. maxSkew describes the degree to which pods may be unevenly distributed. It is the maximum permitted difference between the numbers of matching pods in any two topology domains of a given topology type. It must be greater than zero.

1.8.2.2. topologyKey is the key of a node label. Nodes labeled with this key and identical values are considered to be in the same topology. The scheduler tries to place a balanced number of pods into each topology domain.

1.8.2.3. whenUnsatisfiable indicates how to handle a pod that does not satisfy the spread constraint:

1.8.2.3.1. DoNotSchedule (the default) tells the scheduler not to schedule it.

1.8.2.3.2. ScheduleAnyway tells the scheduler to schedule it anyway, while giving higher priority to nodes that minimize the skew.

1.8.2.4. labelSelector is used to find matching pods. Pods matching this selector are counted to determine the number of pods in the corresponding topology domain

1.8.2.5. Multiple constraints can be combined

1.8.2.5.1. Example with two constraints:

kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  - maxSkew: 1
    topologyKey: node
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1

1.8.2.6. Can be combined with affinity

1.8.2.6.1. With PodAffinity, any number of pods can be packed into the qualifying topology domains.

With PodAntiAffinity, only one pod can be scheduled into a single topology domain.

1.8.2.7. Known 1.16 issues

1.8.2.7.1. Scaling down a Deployment may result in an imbalanced pod distribution.

Pods matched on tainted nodes are still counted.


1.9. Pod Priority and Preemption

1.9.1. Usage steps

1.9.1.1. Add one or more PriorityClasses.

1.9.1.2. Create Pods with priorityClassName set to one of the added PriorityClasses.

1.9.2. Disabling preemption

1.9.2.1. Enable the NonPreemptingPriority feature gate and set preemptionPolicy: Never on the PriorityClass

1.9.2.2. Or set the kube-scheduler disablePreemption option to true

1.9.2.3. i.e. via a KubeSchedulerConfiguration object with disablePreemption: true

1.9.3. PriorityClass

1.9.3.1. The name cannot be prefixed with system-.

1.9.3.2. non-namespaced object

1.9.3.3. spec

1.9.3.3.1. value

  1.9.3.3.1.1. Must not be greater than 1,000,000,000 (1 billion) for user-defined classes

1.9.3.3.2. globalDefault

  1.9.3.3.2.1. Setting it to true does not change the priorities of existing Pods

1.9.3.3.3. description

1.9.3.3.4. preemptionPolicy

  1.9.3.3.4.1. Setting it to Never disables preemption for pods of this class (see the sketch below)
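A sketch of a non-preempting PriorityClass and a pod that uses it; the names and value are assumptions.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-nonpreempting      # illustrative name (must not start with system-)
value: 1000000                  # must be <= 1,000,000,000 for user-defined classes
preemptionPolicy: Never         # this class never preempts other pods
globalDefault: false
description: "High priority that does not preempt."
---
apiVersion: v1
kind: Pod
metadata:
  name: priority-demo           # illustrative name
spec:
  priorityClassName: high-nonpreempting
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1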

1.9.3.4. After a PriorityClass is deleted, existing pods that reference it are unaffected

1.9.3.5. Built-in priority levels

1.9.3.5.1. DefaultPriorityWhenNoDefaultClassExists=0

1.9.3.5.2. HighestUserDefinablePriority=1000000000 (1 billion)

1.9.3.5.3. SystemCriticalPriority=2000000000 (2 billion)

1.9.3.5.4. system-cluster-critical

1.9.3.5.5. system-node-critical


1.10. Taints and Tolerations

1.10.1. tolerations (see the sketch after this list)

1.10.1.1. spec

1.10.1.1.1. key

1.10.1.1.2. operator

1.10.1.1.3. value

1.10.1.1.4. tolerationSeconds

  1.10.1.1.4.1. Mainly used with NoExecute: the maximum time the pod may keep running after the taint appears


1.10.1.1.5. effect

  1.10.1.1.5.1. NoExecute

    1.10.1.1.5.1.1. Pods that are already running are evicted according to policy

  1.10.1.1.5.2. NoSchedule
  1.10.1.1.5.3. PreferNoSchedule
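A sketch of tolerations covering the fields above: one matches a key=value:NoSchedule taint, the other tolerates a NoExecute taint for a bounded time; the key, value, and duration are assumptions.

apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo         # illustrative name
spec:
  tolerations:
  - key: "key"                  # matches the taint added in 1.10.2.2 below
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300      # stay at most 5 minutes after the taint appears
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1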

1.10.2. taint

1.10.2.1. Applied to nodes

1.10.2.2. kubectl taint nodes node1 key=value:NoSchedule

1.10.2.2.1. Here the effect is NoSchedule

1.10.2.3. effect types

1.10.2.3.1. NoSchedule

1.10.2.3.2. NoExecute

  1.10.2.3.2.1. Pods that are already running are evicted according to policy

1.10.2.3.3. PreferNoSchedule

1.10.3. Use cases: dedicated nodes, or matching pods to nodes with special hardware

1.10.4. Taint-based evictions (beta feature)

1.10.4.1. If the pod does not tolerate a taint with effect NoExecute, it is evicted immediately

1.10.4.2. If the pod tolerates a taint with effect NoExecute but the toleration does not specify tolerationSeconds, the pod keeps running on the node indefinitely.

1.10.4.3. If the pod tolerates a taint with effect NoExecute and specifies tolerationSeconds, the pod may keep running on the node for that length of time

1.10.5. The node controller automatically adds a taint to a node when certain conditions are true

1.10.5.1. Built-in taints

1.10.5.1.1. node.kubernetes.io/not-ready: Node is not ready. This corresponds to the NodeCondition Ready being “False”.

1.10.5.1.2. node.kubernetes.io/unreachable: Node is unreachable from the node controller. This corresponds to the NodeCondition Ready being “Unknown”.

1.10.5.1.3. node.kubernetes.io/out-of-disk: Node becomes out of disk.

1.10.5.1.4. node.kubernetes.io/memory-pressure: Node has memory pressure.

1.10.5.1.5. node.kubernetes.io/disk-pressure: Node has disk pressure.

1.10.5.1.6. node.kubernetes.io/network-unavailable: Node’s network is unavailable.

1.10.5.1.7. node.kubernetes.io/unschedulable: Node is unschedulable.

1.10.5.1.8. node.cloudprovider.kubernetes.io/uninitialized: set when the kubelet starts with an external cloud provider, removed once the cloud-controller-manager has initialized the node.

1.10.5.2. In 1.13, the TaintBasedEvictions feature was promoted to beta and enabled by default

1.10.5.3. The DaemonSet controller adds tolerations for the following taints by default

1.10.5.3.1. node.kubernetes.io/memory-pressure

1.10.5.3.2. node.kubernetes.io/disk-pressure

1.10.5.3.3. node.kubernetes.io/out-of-disk (only for critical pods)

1.10.5.3.4. node.kubernetes.io/unschedulable (1.10 or later)

1.10.5.3.5. node.kubernetes.io/network-unavailable (host network only)

1.11. PDB

1.11.1. Pod Disruption Budget

1.11.2. kind: PodDisruptionBudget

1.11.2.1. A PDB object cannot be updated; you can only delete it and recreate it.

1.11.3. spec

1.11.3.1. .spec.selector

1.11.3.1.1. matchLabels:

1.11.3.2. .spec.minAvailable: the minimum number of pods that must remain available while pods are being evicted; it can be an absolute number or a percentage

1.11.3.3. spec.maxUnavailable

1.11.3.4. minAvailable and maxUnavailable cannot both be set (see the sketch below)
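A sketch of a PodDisruptionBudget that keeps at least two matching pods available; the object name and the app label are assumptions.

apiVersion: policy/v1beta1      # policy/v1 on newer clusters
kind: PodDisruptionBudget
metadata:
  name: app-pdb                 # illustrative name
spec:
  minAvailable: 2               # could also be a percentage such as "50%"
  selector:
    matchLabels:
      app: my-app               # assumed pod label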
