An introductory note written for myself; follow-up posts will keep expanding on the underlying principles.
There are several ways to create a k8s cluster; you can follow the official documentation at https://kubernetes.io/docs/setup/production-environment/tools/
For beginners (like me), I think it helps to understand the cluster creation process in the context of the k8s architecture.
(Image source: https://www.redhat.com/en/topics/containers/kubernetes-architecture)
The following need to be installed on each node: a container runtime (containerd in this example), plus kubeadm, kubelet, and kubectl.
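A minimal sketch of the install step, assuming containerd and the Kubernetes yum repository have already been configured on this RHEL host:
# yum install -y kubelet kubeadm kubectl
# systemctl enable --now kubelet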
Initialize the control-plane (master) node:
# kubeadm init
This step usually produces quite a few errors, and Google is the best place to look for solutions. After a successful initialization, you will see output like the following:
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.7.157.30:6443 --token 2rg4l1.n0rhvdp0uvxdrxjv \
        --discovery-token-ca-cert-hash sha256:fd7d661ec35868d036761e844597807a3d076daf3c8b71de6e1b55ee01e66a32
At this point you will find that the following Pods have been created; except for coredns, which is Pending, they are all Running:
# export KUBECONFIG=/etc/kubernetes/admin.conf
# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node1 Ready control-plane 2m50s v1.24.0 10.7.157.30 <none> Red Hat Enterprise Linux Server 7.7 (Maipo) 3.10.0-1062.el7.x86_64 containerd://1.6.4
# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6d4b75cb6d-752q4 0/1 Pending 0 35s
kube-system coredns-6d4b75cb6d-7h2g5 0/1 Pending 0 35s
kube-system etcd-node1 1/1 Running 5 47s
kube-system kube-apiserver-node1 1/1 Running 4 48s
kube-system kube-controller-manager-node1 1/1 Running 1 47s
kube-system kube-proxy-px447 1/1 Running 0 35s
kube-system kube-scheduler-node1 1/1 Running 4 48s
coredns stays Pending because no pod network (CNI) has been deployed yet. Before installing one, a few words about the k8s network model.
The core idea of k8s networking is that every pod gets its own unique IP. All containers in a Pod share that IP, and Pods can communicate with each other directly.
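Once the cluster is up, the per-pod IPs can be observed directly; for example (the pod name below is just a placeholder):
# kubectl get pods -A -o wide                                # the IP column shows each pod's address
# kubectl get pod <pod-name> -o jsonpath='{.status.podIP}'   # print a single pod's IP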
The pod subnet is usually set in kubeadm-config.yaml as a CIDR block, i.e. a range of IP addresses from which pod IPs are allocated:
#### in kubeadm-config.yaml ####
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.24.0
networking:
  podSubnet: 10.244.0.0/16
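The configuration above is passed to kubeadm at initialization time; a minimal sketch, assuming the file is saved as kubeadm-config.yaml:
# kubeadm init --config kubeadm-config.yaml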
Communication between Pods is usually implemented with veth (virtual Ethernet) pairs combined with a Linux bridge:
cni0 is essentially a Linux bridge; it can send ARP requests and parse ARP responses
eno1 is the network interface used for node-to-node communication; with IP forwarding enabled, packets it receives are forwarded to cni0 according to the route table (see the commands below)
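A few read-only commands for inspecting this on a node, assuming flannel's default bridge name cni0 and the 10.244.0.0/16 pod subnet configured above:
# ip -d link show cni0           # the Linux bridge that pod veth pairs attach to
# bridge link show               # veth interfaces enslaved to cni0
# ip route | grep 10.244         # per-node routes for the pod subnet
# sysctl net.ipv4.ip_forward     # should print 1 so packets can be forwarded between eno1 and cni0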
To bring up the k8s primary network, a primary-network CNI plugin must be installed.
There are several options, such as flannel, Calico, and WeaveNet. This example uses flannel, and the network interface flannel should use needs to be configured:
# yum install -y flannel
# vi /etc/sysconfig/flanneld ## add additional options:
FLANNEL_OPTIONS="-iface=eno1"
# cp /usr/bin/flanneld /opt/bin
# kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
At this point coredns will change to the Running state.
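A quick way to confirm, using the standard coredns label k8s-app=kube-dns:
# kubectl get pods -n kube-system -l k8s-app=kube-dns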
The primary network is typically used for basic Pod-to-Pod communication. Pods often also need a secondary network, which serves as a high-performance network for applications:
The following need to be deployed: the k8s-rdma-shared-dev-plugin (with its ConfigMap), the Multus CNI meta-plugin, the standard CNI plugin binaries (macvlan is used here), and the corresponding NetworkAttachmentDefinitions.
Multus CNI can be viewed as a meta-plugin that works together with other CNI plugins to attach multiple network interfaces to a Pod:
Create the ConfigMap:
# cat k8s-rdma-shared-dev-plugin-config-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
data:
  config.json: |
    {
      "periodicUpdateInterval": 300,
      "configList": [{
        "resourceName": "cx5_bond_shared_devices_a",
        "rdmaHcaMax": 1000,
        "selectors": {
          "vendors": ["15b3"],
          "deviceIDs": ["1017"]
        }
      },
      {
        "resourceName": "cx6dx_shared_devices_b",
        "rdmaHcaMax": 500,
        "selectors": {
          "vendors": ["15b3"],
          "deviceIDs": ["101d"]
        }
      }]
    }
# kubectl create -f k8s-rdma-shared-dev-plugin-config-map.yaml
configmap/rdma-devices created
Create the k8s-rdma-shared-dev-plugin daemonset:
# kubectl create -f https://raw.githubusercontent.com/Mellanox/k8s-rdma-shared-dev-plugin/master/images/k8s-rdma-shared-dev-plugin-ds.yaml
daemonset.apps/rdma-shared-dp-ds created
If the git repo hosting the above k8s-rdma-shared-dev-plugin-ds.yaml is not reachable, the following approach can be used instead:
# git clone https://github.com/Mellanox/k8s-rdma-shared-dev-plugin.git
# cd k8s-rdma-shared-dev-plugin/
# kubectl create -f deployment/k8s/base/daemonset.yaml
daemonset.apps/rdma-shared-dp-ds created
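Once the daemonset is running, the shared RDMA devices defined in the ConfigMap should appear as allocatable node resources; a quick check (assuming the node is named node1 as above):
# kubectl describe node node1 | grep -A 10 Allocatable
Next, deploy the Multus CNI daemonset: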
# kubectl create -f https://raw.githubusercontent.com/intel/multus-cni/master/images/multus-daemonset.yml
customresourcedefinition.apiextensions.k8s.io/network-attachment-definitions.k8s.cni.cncf.io created
clusterrole.rbac.authorization.k8s.io/multus created
clusterrolebinding.rbac.authorization.k8s.io/multus created
serviceaccount/multus created
configmap/multus-cni-config created
daemonset.apps/kube-multus-ds-amd64 created
daemonset.apps/kube-multus-ds-ppc64le created
If the git repo hosting the above multus-daemonset.yml is not reachable, the following approach can be used instead:
# git clone https://github.com/k8snetworkplumbingwg/multus-cni.git
# cd multus-cni/
# kubectl create -f deployments/multus-daemonset.yml
customresourcedefinition.apiextensions.k8s.io/network-attachment-definitions.k8s.cni.cncf.io created
clusterrole.rbac.authorization.k8s.io/multus created
clusterrolebinding.rbac.authorization.k8s.io/multus created
serviceaccount/multus created
configmap/multus-cni-config created
daemonset.apps/kube-multus-ds created
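After the Multus daemonset is up, it generates a master CNI configuration on each node; a quick sanity check (the 00-multus.conf file name is what current Multus releases generate, so treat it as an assumption):
# kubectl get pods -n kube-system | grep multus
# ls /etc/cni/net.d/
Then install the reference CNI plugin binaries (including macvlan, used below) under /opt/cni/bin: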
# mkdir -p /opt/cni/bin
# wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
# tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.1.1.tgz
Looking at /opt/cni/bin, several CNI plugins are now available:
# ls /opt/cni/bin
bandwidth bridge dhcp firewall host-device host-local ipvlan loopback macvlan portmap ptp sbr static tuning vlan vrf
This example will use the macvlan CNI.
Create two network attachments for the macvlan CNI; note that their IP address ranges must not overlap with the primary network's IP range:
# cat macvlan_cx6dx.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-cx6dx-conf
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "ens2f0",
    "ipam": {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "rangeStart": "10.56.217.171",
      "rangeEnd": "10.56.217.181",
      "routes": [
        { "dst": "0.0.0.0/0" }
      ],
      "gateway": "10.56.217.1"
    }
  }'
# cat macvlan_cx5_bond.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-cx5-bond-conf
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "bond0",
    "ipam": {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "rangeStart": "10.56.217.71",
      "rangeEnd": "10.56.217.81",
      "routes": [
        { "dst": "0.0.0.0/0" }
      ],
      "gateway": "10.56.217.1"
    }
  }'
# kubectl create -f macvlan_cx6dx.yaml
networkattachmentdefinition.k8s.cni.cncf.io/macvlan-cx6dx-conf created
# kubectl create -f macvlan_cx5_bond.yaml
networkattachmentdefinition.k8s.cni.cncf.io/macvlan-cx5-bond-conf created
This example only uses macvlan-cx5-bond-conf; to use macvlan-cx6dx-conf instead, specify the corresponding annotation and resources in test-xxx-pod.yaml:
# cat test-cx5-bond-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mofed-test-cx5-bond-pod1
  annotations:
    k8s.v1.cni.cncf.io/networks: default/macvlan-cx5-bond-conf
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/rping-test
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/cx5_bond_shared_devices_a: 1
      requests:
        rdma/cx5_bond_shared_devices_a: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/infiniband /sys/class/net
      sleep 1000000
# kubectl create -f test-cx5-bond-pod1.yaml
pod/mofed-test-cx5-bond-pod1 created
# cat test-cx5-bond-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mofed-test-cx5-bond-pod2
  annotations:
    k8s.v1.cni.cncf.io/networks: default/macvlan-cx5-bond-conf
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/rping-test
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/cx5_bond_shared_devices_a: 1
      requests:
        rdma/cx5_bond_shared_devices_a: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/infiniband /sys/class/net
      sleep 1000000
# kubectl create -f test-cx5-bond-pod2.yaml
pod/mofed-test-cx5-bond-pod2 created
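To confirm that Multus actually attached the secondary interface, the pod's network-status annotation can be inspected; a minimal check:
# kubectl describe pod mofed-test-cx5-bond-pod1 | grep -A 20 network-status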
Now RoCE traffic can be started between the pods over the secondary network (the net1 interface shown below):
# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default mofed-test-cx5-bond-pod1 1/1 Running 0 3m41s
default mofed-test-cx5-bond-pod2 1/1 Running 0 32s
default mofed-test-macvlan-pod 1/1 Running 0 4d9h
kube-system coredns-6d4b75cb6d-752q4 1/1 Running 0 5d3h
kube-system coredns-6d4b75cb6d-7h2g5 1/1 Running 0 5d3h
kube-system etcd-node1 1/1 Running 5 5d3h
kube-system kube-apiserver-node1 1/1 Running 4 5d3h
kube-system kube-controller-manager-node1 1/1 Running 1 5d3h
kube-system kube-flannel-ds-xwlr2 1/1 Running 0 5d3h
kube-system kube-multus-ds-kqhqn 1/1 Running 0 5d2h
kube-system kube-proxy-px447 1/1 Running 0 5d3h
kube-system kube-scheduler-node1 1/1 Running 4 5d3h
kube-system rdma-shared-dp-ds-vps6x 1/1 Running 0 21m
On mofed-test-cx5-bond-pod1 (server side):
# kubectl exec -it mofed-test-cx5-bond-pod1 bash
[root@mofed-test-cx5-bond-pod1 /]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.211  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::e45d:c4ff:fe4c:f3b3  prefixlen 64  scopeid 0x20<link>
        ether e6:5d:c4:4c:f3:b3  txqueuelen 0  (Ethernet)
        RX packets 12  bytes 1016 (1016.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 612 (612.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

net1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.56.217.71  netmask 255.255.255.0  broadcast 10.56.217.255
        ether fa:a4:6e:24:3e:ba  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[root@mofed-test-cx5-bond-pod1 /]# ib_write_bw -d mlx5_bond_0 -F --report_gbits

************************************
* Waiting for client to connect... *
************************************
On mofed-test-cx5-bond-pod2 (client side):
# kubectl exec -it mofed-test-cx5-bond-pod2 bash
[root@mofed-test-cx5-bond-pod2 /]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.212  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::20d6:7eff:fec0:4e39  prefixlen 64  scopeid 0x20<link>
        ether 22:d6:7e:c0:4e:39  txqueuelen 0  (Ethernet)
        RX packets 12  bytes 1016 (1016.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 612 (612.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

net1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.56.217.72  netmask 255.255.255.0  broadcast 10.56.217.255
        ether a6:46:b9:94:b0:31  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[root@mofed-test-cx5-bond-pod2 /]# ib_write_bw -d mlx5_bond_0 -F --report_gbits 10.56.217.71
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_bond_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 4
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x117c PSN 0xbfdcaf RKey 0x00511b VAddr 0x007fdf469fd000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:56:217:72
 remote address: LID 0000 QPN 0x117d PSN 0x75cbaa RKey 0x004407 VAddr 0x007f65e74dc000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:56:217:71
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      5000             82.62              82.55              0.157445
---------------------------------------------------------------------------------------
TBD