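Swarm is Docker's official cluster management tool. It abstracts a number of Docker hosts into a single unit and manages the Docker resources on those hosts through one entry point. Swarm is similar to Kubernetes but more lightweight, and it offers fewer features than Kubernetes.
Every Docker Engine in a swarm is a node. There are two types of node: manager and worker.
Deployment commands are run on a manager node. The manager node breaks the deployment down into tasks and assigns them to one or more worker nodes to carry out.
Manager nodes perform orchestration and cluster management, and keep the swarm in its desired state. If a swarm has several manager nodes, they automatically negotiate and elect one leader to carry out orchestration.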
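Swarm managers rely on Raft consensus (the Raft Status field in the docker node inspect output reflects this), so a leader can only be elected while a majority of managers is reachable. The arithmetic can be sketched as follows. The function names are illustrative and are not part of any Docker API.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """Number of managers that can be lost while a majority remains."""
    return managers - quorum(managers)


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 7):
        print(f"{n} managers: quorum={quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With three managers, as in the cluster used in this article, the swarm can still elect a leader after losing one manager. An even count such as two or four tolerates no more failures than one or three.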
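Worker nodes receive and execute the tasks dispatched by manager nodes. By default a manager node is also a worker node, but it can be configured as a manager-only node so that it is dedicated to orchestration and cluster management.
Worker nodes periodically report their own state, and the state of the tasks they are running, to the manager nodes, so that the managers can maintain the state of the whole cluster.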
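A service defines the tasks to be executed on worker nodes. The main orchestration job of swarm is to keep each service in its desired state.
An example of a service: start an http service in the swarm using the image httpd:latest with 3 replicas.
The manager node creates the service, works out that 3 httpd containers must be started, and assigns the container tasks according to the current state of the worker nodes, for example two containers on worker1 and one container on worker2.
After running for a while, worker2 suddenly goes down. The manager detects the failure and immediately starts a new httpd container on worker3.
This keeps the service in its desired state of three replicas.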
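The worker2 failure described above can be replayed with a toy reconciliation function. This is only a sketch of the idea: the function name, the slot numbering and the "fewest tasks first" placement rule are assumptions made for illustration, and the real Swarm scheduler also weighs resources, constraints and other factors.

```python
from __future__ import annotations

from collections import Counter


def reconcile(desired: int, placement: dict[int, str], healthy: list[str]) -> dict[int, str]:
    """Return a slot -> node mapping with `desired` replicas on healthy nodes.

    Replicas on failed nodes are dropped and their slots are refilled on the
    healthy node that currently runs the fewest replicas (toy spread rule).
    """
    if desired > 0 and not healthy:
        raise RuntimeError("no healthy node to schedule on")
    # Keep replicas that still run on a healthy node and are still wanted.
    kept = {slot: node for slot, node in placement.items()
            if node in healthy and slot <= desired}
    load = Counter(kept.values())
    # Refill every missing slot so that the desired count is reached again.
    for slot in range(1, desired + 1):
        if slot not in kept:
            target = min(healthy, key=lambda node: load[node])
            kept[slot] = target
            load[target] += 1
    return dict(sorted(kept.items()))


if __name__ == "__main__":
    before = {1: "worker1", 2: "worker1", 3: "worker2"}
    # worker2 has gone down; worker1 and worker3 are still healthy.
    print(reconcile(3, before, ["worker1", "worker3"]))
```

The two replicas on worker1 are left alone, and only the replica lost with worker2 is recreated, on worker3.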
[root@node191 docker]# docker swarm --help
Usage:  docker swarm COMMAND
Manage Swarm
Options:
Commands:
  ca          Display and rotate the root CA
  init        Initialize a swarm
  join        Join a swarm as a node and/or manager
  join-token  Manage join tokens
  leave       Leave the swarm
  unlock      Unlock swarm
  unlock-key  Manage the unlock key
  update      Update the swarm
Run 'docker swarm COMMAND --help' for more information on a command.
[root@node191 docker]# docker node --help
Usage:  docker node COMMAND
Manage Swarm nodes
Options:
Commands:
  demote      Demote one or more nodes from manager in the swarm
  inspect     Display detailed information on one or more nodes
  ls          List nodes in the swarm
  promote     Promote one or more nodes to manager in the swarm
  ps          List tasks running on one or more nodes, defaults to current node
  rm          Remove one or more nodes from the swarm
  update      Update a node
Run 'docker node COMMAND --help' for more information on a command.
[root@localhost ~]# docker swarm init --advertise-addr 172.16.1.146
Swarm initialized: current node (v2tjxinr9jxfg52evpswn4yb6) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-f01zejqjqfnry2tubl3cractn \
172.16.1.146:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
[root@localhost ~]# docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-f01zejqjqfnry2tubl3cractn \
172.16.1.146:2377
[root@localhost ~]# docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-8pm1wzhfqx5e7jvl8fg61an3w \
172.16.1.146:2377
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n494afsdjzs74q5y5vb4xlgd4    node136   Ready   Active
v2tjxinr9jxfg52evpswn4yb6 *  node146   Ready   Active        Leader
## Drain node136
[root@node146 ~]# docker node update --availability drain n494afsdjzs74q5y5vb4xlgd4
n494afsdjzs74q5y5vb4xlgd4
## Remove the node from the swarm (use --force if the node is not down)
docker node rm node136
docker node rm --force node136
## Restore a drained node so that it can be used normally again
docker node update --availability active n494afsdjzs74q5y5vb4xlgd4
## Force a node to leave the swarm
docker swarm leave --force
[root@node136 ~]# docker swarm leave
Node left the swarm.
## node136 is now shown as Down.
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n494afsdjzs74q5y5vb4xlgd4    node136   Down    Active
v2tjxinr9jxfg52evpswn4yb6 *  node146   Ready   Active        Leader
## On the manager, remove the stale node entry
[root@node146 ~]# docker node rm n494afsdjzs74q5y5vb4xlgd4
n494afsdjzs74q5y5vb4xlgd4
## Rejoin as a manager
[root@node136 ~]# docker swarm join \
> --token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-8pm1wzhfqx5e7jvl8fg61an3w \
> 172.16.1.146:2377
This node joined a swarm as a manager.
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active        Reachable
v2tjxinr9jxfg52evpswn4yb6 *  node146   Ready   Active        Leader
Demote a node from manager to worker
docker node demote v2tjxinr9jxfg52evpswn4yb6
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active        Leader
v2tjxinr9jxfg52evpswn4yb6    node146   Down    Active        Unreachable
yvjirlxwpgvjohi3iagtzzkh2 *  node146   Ready   Active        Reachable
[root@node146 ~]# docker node demote v2tjxinr9jxfg52evpswn4yb6
Manager v2tjxinr9jxfg52evpswn4yb6 demoted in the swarm.
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active        Leader
v2tjxinr9jxfg52evpswn4yb6    node146   Down    Active
yvjirlxwpgvjohi3iagtzzkh2 *  node146   Ready   Active        Reachable
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active
yvjirlxwpgvjohi3iagtzzkh2 *  node146   Ready   Active        Leader
[root@node146 ~]# docker node promote c9kynm13tvcf1vfrt0m6y7pbi
Node c9kynm13tvcf1vfrt0m6y7pbi promoted to a manager in the swarm.
[root@node146 ~]# docker node promote n8dgcax0vcqmsjtc0aosx9k2q
Node n8dgcax0vcqmsjtc0aosx9k2q promoted to a manager in the swarm.
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active        Reachable
yvjirlxwpgvjohi3iagtzzkh2 *  node146   Ready   Active        Leader
[root@node191 docker]# docker service --help
Usage:  docker service COMMAND
Manage services
Options:
Commands:
  create      Create a new service
  inspect     Display detailed information on one or more services
  logs        Fetch the logs of a service or task
  ls          List services
  ps          List the tasks of one or more services
  rm          Remove one or more services
  rollback    Revert changes to a service's configuration
  scale       Scale one or multiple replicated services
  update      Update a service
Run 'docker service COMMAND --help' for more information on a command.
https://docs.docker.com/engine/reference/commandline/service/
docker service create --name nginx-service --replicas=3 --publish 8080:80 nginx:latest
If the image lives in a private registry, remember to add the --with-registry-auth option; otherwise the other nodes cannot pull the image. For example:
docker login 172.16.1.146 -p ***** -u admin
docker service create --with-registry-auth --name tomcat-logs-test --replicas=2 --publish 10080:8080 172.16.1.146/wondertek/docker-test:1.0.0-2018091910
docker service ps docker-test
docker service scale docker-test=3
Node attribute | Matches | Example |
---|---|---|
node.id | Node ID | node.id == 2ivku8v2gvtg4 |
node.hostname | Node hostname | node.hostname != node-2 |
node.role | Node role (manager or worker) | node.role == manager |
node.labels | User-defined node labels | node.labels.security == high |
engine.labels | Docker Engine labels | engine.labels.operatingsystem == ubuntu 14.04 |
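engine.labels matches the labels of the Docker Engine, such as the operating system and drivers. Cluster administrators add node.labels with the docker node update command so that nodes can be targeted more precisely.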
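How such expressions select nodes can be sketched with a small evaluator. This is a simplified model, not Swarm's implementation: the function names and the flat attribute dictionaries are assumptions, and the comparison here is an exact string match, whereas the real scheduler may differ in details such as case handling.

```python
import re

# attribute, operator (== or !=), expected value
_CONSTRAINT = re.compile(r"^\s*([\w.\-]+)\s*(==|!=)\s*(.+?)\s*$")


def matches(constraint: str, attrs: dict) -> bool:
    """Evaluate one 'key == value' or 'key != value' expression for a node."""
    m = _CONSTRAINT.match(constraint)
    if m is None:
        raise ValueError(f"invalid constraint: {constraint!r}")
    key, op, expected = m.groups()
    actual = attrs.get(key)  # assumption: a missing attribute never equals a value
    return (actual == expected) if op == "==" else (actual != expected)


def eligible_nodes(constraints: list, nodes: dict) -> list:
    """Names of the nodes that satisfy every constraint (logical AND)."""
    return [name for name, attrs in nodes.items()
            if all(matches(c, attrs) for c in constraints)]


if __name__ == "__main__":
    # Hypothetical attribute sets modelled on the three nodes in this article.
    nodes = {
        "node135": {"node.hostname": "node135", "node.labels.env": "test"},
        "node136": {"node.hostname": "node136", "node.labels.env": "prod"},
        "node146": {"node.hostname": "node146"},
    }
    print(eligible_nodes(["node.labels.env==test"], nodes))
```

When several constraints are given, a node must satisfy all of them to be eligible.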
Add a label
docker node update --label-add type=manager node146
[root@node146 ~]# docker node inspect node146 --pretty
ID:                     v2tjxinr9jxfg52evpswn4yb6
Labels:
 - type = manager
Hostname:               node146
Joined at:              2018-07-16 06:26:49.516457267 +0000 utc
Status:
 State:                 Ready
 Availability:          Active
 Address:               127.0.0.1
Manager Status:
 Address:               172.16.1.146:2377
 Raft Status:           Reachable
 Leader:                Yes
Platform:
 Operating System:      linux
 Architecture:          x86_64
Resources:
 CPUs:                  8
 Memory:                9.765 GiB
Plugins:
  Network:              bridge, host, macvlan, null, overlay
  Volume:               local
Engine Version:         1.13.1
docker node update --label-rm type node146
docker service rm my_web
docker node update --label-add env=test node135
docker node update --label-add env=prod node136
docker service create \
--constraint node.labels.env==test \
--replicas 3 \
--name my_web2 \
--publish 8080:80 \
httpd
[root@node146 ~]# docker service ps my_web2
ID            NAME       IMAGE         NODE     DESIRED STATE  CURRENT STATE          ERROR  PORTS
lzle9hto7mk0  my_web2.1  httpd:latest  node135  Running        Running 4 seconds ago
j9ujd6mcs2ex  my_web2.2  httpd:latest  node135  Running        Running 5 seconds ago
lqc4apjhonen  my_web2.3  httpd:latest  node135  Running        Running 3 seconds ago
[root@node146 ~]# docker service inspect my_web2 --pretty
ID:             m7s5ura6bmjg1nd60lfwn8voa
Name:           my_web2
Service Mode:   Replicated
 Replicas:      3
Placement:Contraints:   [node.labels.env==test]
UpdateConfig:
 Parallelism:   1
 On failure:    pause
 Max failure ratio: 0
ContainerSpec:
 Image:         httpd:latest@sha256:2edbf09d0dbdf2a3e21e4cb52f3385ad916c01dc2528868bc3499111cc54e937
Resources:
Endpoint Mode:  vip
Ports:
 PublishedPort 8080
  Protocol = tcp
  TargetPort = 80
docker service rm docker-test
[root@node135 ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
4888eb34115b bridge bridge local
5dda44146214 docker_gwbridge bridge local
4dda8692018b host host local
mumblsrh5oe4 ingress overlay swarm
1fcd0ef0748f none null local
docker network create --driver overlay --subnet 10.22.1.0/24 swarm_net
docker service create --name my_web --replicas=3 --network swarm_net httpd
docker service create --name util --network swarm_net busybox sleep 10000000
docker exec util.1.muu3o4906mihbp1v8r3ejh80p nslookup tasks.my_web
docker exec util.1.muu3o4906mihbp1v8r3ejh80p ping -c 3 my_web
docker service update --image httpd:2.2.32 my_web
## global mode
docker service create \
--mode global \
--name logspout \
--mount type=bind,source=/var/run/docker.sock,destination=/var/run/docker.sock \
gliderlabs/logspout
docker service update --replicas 6 --update-parallelism 2 --update-delay 1m30s my_web
## Specify a new image
docker service update --image httpd:2.2.32 --replicas 6 --update-parallelism 2 --update-delay 1m30s my_web
[root@node146 ~]# docker service ps my_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ku14zmzkpo9a my_web.1 httpd:2.2.32 node135 Running Running about a minute ago
qh6pzjb6syt0 \_ my_web.1 httpd:latest node135 Shutdown Shutdown about a minute ago
0muer26mxx1d my_web.2 httpd:latest node136 Running Running 22 hours ago
k8ybfbc6j20y my_web.3 httpd:2.2.32 node146 Running Running about a minute ago
xr0adp42t7tm \_ my_web.3 httpd:latest node146 Shutdown Shutdown about a minute ago
acd06qrmmnrr my_web.4 httpd:2.2.32 node135 Running Running about a minute ago
jae5i5lhlnb2 my_web.5 httpd:2.2.32 node146 Running Running about a minute ago
3zk4i1drb1nk my_web.6 httpd:2.2.32 node136 Running Running about a minute ago
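The effect of --update-parallelism and --update-delay can be sketched as batch arithmetic. This is a rough model only: updating slots in numeric order is an assumption (the orchestrator picks the actual order), and the time each task needs to start is ignored, so the result is a lower bound. The function names are illustrative.

```python
def rolling_batches(replicas: int, parallelism: int) -> list:
    """Group task slots 1..replicas into batches of at most `parallelism`."""
    # A parallelism of 0 means "update all tasks at once".
    step = parallelism if parallelism > 0 else max(replicas, 1)
    slots = list(range(1, replicas + 1))
    return [slots[i:i + step] for i in range(0, len(slots), step)]


def minimum_delay(replicas: int, parallelism: int, delay_seconds: int) -> int:
    """Total waiting time: one delay between each pair of consecutive batches."""
    batches = rolling_batches(replicas, parallelism)
    return max(len(batches) - 1, 0) * delay_seconds


if __name__ == "__main__":
    # Values from the command above: 6 replicas, 2 at a time, 1m30s apart.
    print(rolling_batches(6, 2))
    print(minimum_delay(6, 2, 90), "seconds of delay at minimum")
```

For the values used above this gives three batches of two tasks and at least 180 seconds of waiting between batches, which is consistent with my_web.2 still running httpd:latest in the listing while other tasks have already switched to httpd:2.2.32.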
docker service update --constraint-rm node.labels.env==test my_web2
docker service update --constraint-add node.labels.env==prod my_web2
docker service update --rollback my_web
docker service create --name my_web3 \
--health-cmd "curl --fail http://localhost:8091 || exit 1" \
httpd
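--health-cmd sets the health check command. Several related options are available:
1. --health-timeout: the maximum time the command may run; default 30s.
2. --health-interval: the time between runs of the command; default 30s.
3. --health-retries: the number of consecutive failures allowed; default 3. If the command fails 3 times in a row, the container is marked unhealthy. swarm destroys and recreates unhealthy replicas.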
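The retry rule can be sketched as a small state replay over health check exit codes. This is a simplified model with an illustrative function name; it ignores timeouts and any start-up grace period. The streak it tracks corresponds to the FailingStreak field that docker inspect reports.

```python
def health_status(exit_codes, retries: int = 3):
    """Replay health-check exit codes; return (status, failing_streak)."""
    status = "starting"  # state before any decisive result
    failing_streak = 0
    for code in exit_codes:
        if code == 0:
            # One success resets the streak and marks the container healthy.
            failing_streak = 0
            status = "healthy"
        else:
            failing_streak += 1
            # Only `retries` consecutive failures flip the state to unhealthy.
            if failing_streak >= retries:
                status = "unhealthy"
    return status, failing_streak


if __name__ == "__main__":
    print(health_status([1, 1, 1]))     # three failures in a row
    print(health_status([1, 1, 0, 1]))  # a success in between resets the streak
```

Three consecutive failures yield an unhealthy status with a streak of 3, while a single success in between resets the count.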
View the health check information
docker inspect b671e3100133
"Health": {
    "Status": "unhealthy",
    "FailingStreak": 3,
    "Log": [
        {
            "Start": "2018-07-18T14:40:18.941056152+08:00",
            "End": "2018-07-18T14:40:19.027466281+08:00",
            "ExitCode": 1,
            "Output": "/bin/sh: 1: curl: not found\n"
        },
        {
            "Start": "2018-07-18T14:40:49.027620925+08:00",
            "End": "2018-07-18T14:40:49.076160261+08:00",
            "ExitCode": 1,
            "Output": "/bin/sh: 1: curl: not found\n"
        },
        {
            "Start": "2018-07-18T14:41:19.076291897+08:00",
            "End": "2018-07-18T14:41:19.124894642+08:00",
            "ExitCode": 1,
            "Output": "/bin/sh: 1: curl: not found\n"
        }
    ]
}