Graphical file management - Upload, download, move and delete files and folders through the web browser.
File editor - Edit and save files without the need to launch a shell session.
Shell Access - Pop into a command line shell straight from the web portal.
Queue Management - View up-to-date details of jobs pending or running on the cluster.
Job submission templates - Submit jobs from the web console using preset templates or customize your own. (Includes capability to edit job scripts and parameters on the fly).
Full Linux desktop streaming via web - Run a full, low-latency XFCE Linux desktop on the compute nodes for GUI-heavy jobs such as Matlab, Mathematica, etc. Graphical jobs continue to run while disconnected from the compute host.
No need to install a local X server in order to run graphical jobs, as all rendering is performed on the compute nodes.
Apache is the server front end. It runs as the apache user, accepts all requests from users, and serves four main functions.
Users interact with their HPC resources through a web browser using OnDemand.
The front-end proxy is the only component shared by all clients. The front-end proxy creates a per-user Nginx (PUN) process for each user.
A user initiates a request through the browser; the figure below illustrates how that request propagates through the system to a specific application (including the Dashboard).
sudo yum install centos-release-scl lsof sudo git
Many of OnDemand's system-level dependencies are available from
https://yum.osc.edu/ondemand/$ONDEMAND_RELEASE/web/$ENTERPRISE_LINUX_VERSION/x86_64/
and need to be installed on the node that will become the OnDemand web server.
To install the dependencies for OnDemand 1.7.x on CentOS 7:
sudo yum install \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/cjose-0.6.1-1.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/cjose-devel-0.6.1-1.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/httpd24-mod_auth_openidc-2.4.1-1.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-1.7.14-1.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-apache-1.7-8.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-build-1.7-8.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-gems-1.7.14-1.7.14-1.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-nginx-1.17.3-6.p6.0.4.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-nodejs-1.7-8.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-passenger-6.0.4-6.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-passenger-devel-6.0.4-6.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-python-1.7-8.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-ruby-1.7-8.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-rubygem-bundler-1.17.3-1.el7.noarch.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-runtime-1.7-8.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-scldevel-1.7-8.el7.x86_64.rpm \
    https://yum.osc.edu/ondemand/1.7/web/el7/x86_64/ondemand-selinux-1.7.14-1.el7.x86_64.rpm
OnDemand's core infrastructure is stored under /opt/ood.
OnDemand's core applications are stored under /var/www/ood/apps/sys/$APP.
Each application has its own dependencies (from NPM or RubyGems) that need to be installed by running:
cd /var/www/ood/apps/sys/$APP
# We have both Node and Rails applications, let's cover both in a single command
sudo NODE_ENV=production RAILS_ENV=production scl enable ondemand -- bin/setup
sudo sed -i 's/^HTTPD24_HTTPD_SCLS_ENABLED=.*/HTTPD24_HTTPD_SCLS_ENABLED="httpd24 rh-ruby25"/' \
/opt/rh/httpd24/service-environment
sudo tee /etc/sudoers.d/ood << EOF
Defaults:apache !requiretty, !authenticate
apache ALL=(ALL) NOPASSWD: /opt/ood/nginx_stage/sbin/nginx_stage
EOF
touch /var/lib/ondemand-nginx/config/apps/sys/activejobs.conf
touch /var/lib/ondemand-nginx/config/apps/sys/dashboard.conf
touch /var/lib/ondemand-nginx/config/apps/sys/file-editor.conf
touch /var/lib/ondemand-nginx/config/apps/sys/files.conf
touch /var/lib/ondemand-nginx/config/apps/sys/myjobs.conf
touch /var/lib/ondemand-nginx/config/apps/sys/shell.conf
/opt/ood/nginx_stage/sbin/update_nginx_stage &>/dev/null || :
Remove inactive PUNs every 2 hours.
sudo tee /etc/cron.d/ood << EOF
#!/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
0 */2 * * * root [ -f /opt/ood/nginx_stage/sbin/nginx_stage ] && /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean 2>&1 | logger -t nginx_clean
EOF
At this point, visiting our web node still will not show the OnDemand page, because the ood-portal configuration has not been generated yet. Generate a generic one now:
sudo /opt/ood/ood-portal-generator/sbin/update_ood_portal
This is the basic OnDemand portal configuration.
Open ports 80 (http) and 443 (https) in the firewall, typically with firewalld or iptables.
firewalld example:
$ sudo firewall-cmd --zone=public --add-port=80/tcp --permanent
$ sudo firewall-cmd --zone=public --add-port=443/tcp --permanent
$ sudo firewall-cmd --reload
iptables example:
$ sudo iptables -I INPUT -p tcp -m tcp --dport 80 -j ACCEPT
$ sudo iptables -I INPUT -p tcp -m tcp --dport 443 -j ACCEPT
$ sudo iptables-save > /etc/sysconfig/iptables
Start the Apache server:
sudo systemctl start httpd24-httpd
Add an account to the password file used by Apache:
sudo scl enable ondemand -- htpasswd -c /opt/rh/httpd24/root/etc/httpd/.htpasswd $USER
# New password:
# Re-type new password:
# Adding password for user .......
LDAP support allows users to log in with their local username and password. It also removes the need for the system administrator to keep updating the .htpasswd file.
Edit the Open OnDemand portal configuration file
/etc/ood/config/ood_portal.yml:
# /etc/ood/config/ood_portal.yml
---
# ...
auth:
- 'AuthType Basic'
- 'AuthName "private"'
- 'AuthBasicProvider ldap'
- 'AuthLDAPURL "ldaps://openldap.my_center.edu:636/ou=People,ou=hpc,o=my_center?uid"'
- 'AuthLDAPGroupAttribute memberUid'
- 'AuthLDAPGroupAttributeIsDN off'
- 'RequestHeader unset Authorization'
- 'Require valid-user'
Build/install the updated Apache configuration file:
sudo /opt/ood/ood-portal-generator/sbin/update_ood_portal
Restart the Apache server for the changes to take effect:
sudo systemctl try-restart httpd24-httpd.service httpd24-htcacheclean.service
Users can now log in with their local username and password.
The cluster configuration files describe each cluster users can submit jobs to, as well as the login hosts they can SSH to.
Apps that require a properly configured cluster include:
Create the default directory where cluster configuration files live:
sudo mkdir -p /etc/ood/config/clusters.d
Create a cluster YAML configuration file for each HPC cluster you want to provide access to. They must have the *.yml extension.
The simplest cluster configuration file, for an HPC cluster with only a login node and no resource manager, looks like this:
# /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    title: "My Cluster"
  login:
    host: "my_cluster.my_center.edu"
Our cluster's name is linux. A full-featured example configuration (OSC's Owens cluster) looks like this:
---
v2:
  metadata:
    title: "Owens"
    url: "https://www.osc.edu/supercomputing/computing/owens"
    hidden: false
  login:
    host: "owens.osc.edu"
  job:
    adapter: "torque"
    host: "owens-batch.ten.osc.edu"
    lib: "/opt/torque/lib64"
    bin: "/opt/torque/bin"
    version: "6.0.1"
  acls:
    - adapter: "group"
      groups:
        - "cluster_users"
        - "other_users_of_the_cluster"
      type: "whitelist"
  custom:
    grafana:
      host: "https://grafana.osc.edu"
      orgId: 3
      dashboard:
        name: "ondemand-clusters"
        uid: "aaba6Ahbauquag"
        panels:
          cpu: 20
          memory: 24
      labels:
        cluster: "cluster"
        host: "host"
        jobid: "jobid"
  batch_connect:
    basic:
      script_wrapper: "module restore\n%s"
    vnc:
      script_wrapper: "module restore\nmodule load ondemand-vnc\n%s"
v2:
Version 2 is the current schema; v2 is the top-level mapping of the cluster configuration.
---
v2:
metadata:
The metadata mapping describes how the cluster is displayed to the user.
metadata:
  # title: is the display label that will be used anywhere the cluster is referenced
  title: "Owens"
  # url: provides the ability to show a link to information about the cluster
  url: "https://www.osc.edu/supercomputing/computing/owens"
  # hidden: setting this to true causes OnDemand to not show this cluster to the user; the cluster is still available for use by other applications
  hidden: false
login:
login controls the host used when a user opens an SSH session through the Shell app. It is used by the Dashboard and the Job Composer (MyJobs).
login:
  host: "owens.osc.edu"
job:
The job mapping is specific to the cluster's resource manager.
job:
  adapter: "torque"
  host: "owens-batch.ten.osc.edu"
  lib: "/opt/torque/lib64"
  bin: "/opt/torque/bin"
  version: "6.0.1"
bin_overrides:
# An example in Slurm
job:
  adapter: "slurm"
  bin: "/opt/slurm/bin"
  conf: "/opt/slurm/etc/slurm.conf"
  bin_overrides:
    squeue: "/usr/local/slurm/bin/squeue_wrapper"
    # Override just what you want/need to
    # scontrol: "/usr/local/slurm/bin/scontrol_wrapper"
    sbatch: "/usr/local/slurm/bin/sbatch_wrapper"
    # Will be ignored because bsub is not a command used by the Slurm adapter
    bsub: "/opt/lsf/bin/bsub"
acls:
Access control lists provide a way to restrict cluster access by group membership. ACLs are implicitly whitelists, but can be explicitly set to either a whitelist or a blacklist.
acls:
  - adapter: "group"
    groups:
      - "cluster_users"
      - "other_users_of_the_cluster"
    type: "whitelist" # optional, one of "whitelist" or "blacklist"
To look up group membership, ood_core uses the ood_support library: "id -G USERNAME" retrieves the list of groups the user belongs to, and "getgrgid" looks up each group's name.
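The same lookup can be reproduced from a shell when debugging an ACL (a sketch only, not ood_support itself; it assumes the standard id and getent tools are available):

```shell
# Resolve the current user's numeric group IDs to group names --
# mirroring the "id -G" plus getgrgid lookups described above.
for gid in $(id -G); do
    getent group "$gid" | cut -d: -f1
done
```

If a user reports being denied by a group ACL, comparing this output against the groups listed in the cluster config is a quick first check.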
batch_connect:
batch_connect controls the default settings for interactive apps such as Jupyter or interactive desktops.
batch_connect:
  basic:
    script_wrapper: "module restore\n%s"
  vnc:
    script_wrapper: "module restore\nmodule load ondemand-vnc\n%s"
A YAML cluster configuration file for an HPC cluster with the Slurm resource manager looks like this:
# /etc/ood/config/clusters.d/linux.yml
---
v2:
  metadata:
    title: "Linux Cluster"
  login:
    host: "173.0.20.110"
  job:
    adapter: "slurm"
    cluster: "linux"
    bin: "/usr/sw-slurm/slurm-16.05.3/bin"
    conf: "/usr/sw-slurm/slurm-16.05.3/etc/slurm.conf"
    bin_overrides:
      squeue: "/usr/sw-slurm/slurm-16.05.3/bin/squeue"
  batch_connect:
    basic:
      script_wrapper: |
        module purge
        %s
    vnc:
      script_wrapper: |
        module purge
        export PATH="/opt/TurboVNC/bin:$PATH"
        export WEBSOCKIFY_CMD="/root/workspace/websockify-master/run"
        %s
It has the following configuration options:
adapter:
Set to "slurm"
cluster:
The Slurm cluster name
bin:
Path to the Slurm client installation binaries
conf:
Path to the Slurm configuration file
bin_overrides:
Replacements/wrappers for Slurm’s job submission and control clients.
Supports the following clients:
For all rake tasks, we need to be in the root directory of the Dashboard app:
cd /var/www/ood/apps/sys/dashboard
List all the available tasks we can run:
scl enable ondemand -- bin/rake -T test:jobs
The list is dynamically generated from all the available cluster configuration files residing under /etc/ood/config/clusters.d/*.yml.
My cluster's name is linux, so I use linux.yml here.
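Since the task names mirror the config file basenames, the mapping can be sketched like this (an illustration, not part of OnDemand; the directory argument defaults to the standard location):

```shell
# Print the test:jobs rake targets implied by the cluster config files
# found in a directory.
list_cluster_tests() {
    dir=${1:-/etc/ood/config/clusters.d}
    for f in "$dir"/*.yml; do
        [ -e "$f" ] || continue
        echo "test:jobs:$(basename "$f" .yml)"
    done
}
list_cluster_tests
```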
Test the cluster:
sudo su $USER -c 'scl enable ondemand -- bin/rake test:jobs:$CLUSTER_NAME RAILS_ENV=production'
Rails Error: Unable to access log file. Please ensure that /var/www/ood/apps/sys/dashboard/log/production.log exists and is writable (ie, make it writable for user and group: chmod 0664 /var/www/ood/apps/sys/dashboard/log/production.log). The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
mkdir -p /home/export/base/systest/jiangyt/test_jobs
Testing cluster 'linux'...
Submitting job...
[2020-06-15 16:50:42 +0800 ] INFO "execve = [{\"SLURM_CONF\"=>\"/usr/sw-slurm/slurm-16.05.3/etc/slurm.conf\"}, \"/usr/sw-slurm/slurm-16.05.3/bin/sbatch\", \"-D\", \"/home/export/base/systest/jiangyt/test_jobs\", \"-J\", \"test_jobs_linux\", \"-o\", \"/home/export/base/systest/jiangyt/test_jobs/output_linux_2020_06_15t16_50_42_08_00_log\", \"-t\", \"00:01:00\", \"--parsable\", \"-M\", \"linux\"]"
Got job id '9273109'
[2020-06-15 16:50:43 +0800 ] INFO "execve = [{\"SLURM_CONF\"=>\"/usr/sw-slurm/slurm-16.05.3/etc/slurm.conf\"}, \"/usr/sw-slurm/slurm-16.05.3/bin/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%A\\u001F%i\\u001F%t\", \"-j\", \"9273109\", \"-M\", \"linux\"]"
Job has status of queued
[2020-06-15 16:50:48 +0800 ] INFO "execve = [{\"SLURM_CONF\"=>\"/usr/sw-slurm/slurm-16.05.3/etc/slurm.conf\"}, \"/usr/sw-slurm/slurm-16.05.3/bin/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%A\\u001F%i\\u001F%t\", \"-j\", \"9273109\", \"-M\", \"linux\"]"
Job has status of completed
Test for 'linux' PASSED!
Finished testing cluster 'linux'
The test succeeded.
The configuration root directory is /etc/ood. Public assets are located at /var/www/ood/public.
/etc/ood/profile
/etc/ood/config/nginx_stage.yml
/etc/ood/config/apps/$APP/env
/etc/ood/config/apps/$APP/initializers/ood.rb
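As one concrete example of these override files, the per-app env file sets environment variables the app reads at startup. A hypothetical Dashboard override (OOD_DASHBOARD_TITLE is a variable the Dashboard supports; the value here is illustrative):

```shell
# /etc/ood/config/apps/dashboard/env  (hypothetical example)
OOD_DASHBOARD_TITLE="My Center HPC OnDemand"
```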
Interactive apps require a VNC server installed on the compute nodes, not on the OnDemand node.
For VNC server support:
vim /etc/ood/config/clusters.d/linux.yml
---
v2:
  metadata:
    title: "Linux Cluster"
  login:
    host: "173.0.20.110"
  job:
    adapter: "slurm"
    cluster: "linux"
    bin: "/usr/sw-slurm/slurm-16.05.3/bin"
    conf: "/usr/sw-slurm/slurm-16.05.3/etc/slurm.conf"
    bin_overrides:
      squeue: "/usr/sw-slurm/slurm-16.05.3/bin/squeue"
  batch_connect:
    basic:
      script_wrapper: |
        module purge
        %s
    vnc:
      script_wrapper: |
        module purge
        export PATH="/opt/TurboVNC/bin:$PATH"
        export WEBSOCKIFY_CMD="/root/workspace/websockify-master/run"
        %s
Modify the ood-portal-generator YAML configuration file:
# /etc/ood/config/ood_portal.yml
host_regex: '(ab|pn|xy)\d+'
node_uri: '/node'
rnode_uri: '/rnode'
sudo /opt/ood/ood-portal-generator/sbin/update_ood_portal
sudo systemctl try-restart httpd24-httpd.service httpd24-htcacheclean.service
To test the reverse proxy, SSH to a compute node and start a listener:
ssh cn060587
nc -l 5432
Then visit:
http://ondemand.my_center.edu/node/<host>/<port>/...
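A hypothetical illustration of how the proxy interprets such a URI: /rnode/<host>/<port>/... is forwarded to http://<host>:<port>/..., with <host> validated against host_regex. The parsing can be sketched as:

```shell
# Split a reverse-proxy URI into its backend host and port components.
uri="/rnode/cn060587/5432/index.html"
host=$(echo "$uri" | cut -d/ -f3)
port=$(echo "$uri" | cut -d/ -f4)
echo "proxies to: http://${host}:${port}/index.html"
```

Note that the host must match host_regex in ood_portal.yml, or Apache will refuse to proxy the request.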
The Interactive Desktop requires a desktop environment installed on the nodes where the batch jobs will run (not on the OnDemand node).
The following desktop environments are currently supported:
Create the working directory:
mkdir -p /etc/ood/config/apps/bc_desktop
For each cluster that should be able to launch an Interactive Desktop, create a corresponding YAML configuration file in the working directory:
# /etc/ood/config/apps/bc_desktop/linux.yml
---
title: "Nsccwx Node Cluster Desktop"
cluster: "linux"
Navigate to your OnDemand site's Dashboard app; you should see the new desktop in the dropdown menu at the top.
After selecting "Nsccwx Node Cluster Desktop" from the menu, a form is displayed for launching a desktop session on the given cluster.
Before we start modifying configuration file attributes, let's first look at the default form definition located in the source file
/var/www/ood/apps/sys/bc_desktop/form.yml (do not modify it):
# /var/www/ood/apps/sys/bc_desktop/form.yml
---
attributes:
  desktop: "mate"
  bc_vnc_idle: 0
  bc_vnc_resolution:
    required: true
  node_type: null
form:
  - bc_vnc_idle
  - desktop
  - bc_num_hours
  - bc_num_slots
  - node_type
  - bc_account
  - bc_queue
  - bc_vnc_resolution
  - bc_email_on_started
bc_num_hours
Default: “1”
A Ruby String containing the number of hours a user requested for the Desktop batch job to run.
bc_num_slots
Default: “1”
A Ruby String containing either the number of nodes or processors (depending on the type of resource manager the cluster uses) a user requested.
bc_account
Default: “”
A Ruby String that holds the account the user supplied to charge the job against.
bc_queue
Default: “”
A Ruby String that holds the queue the user requested for the job to run on.
bc_email_on_started
Default: “0”
A Ruby String that can either be “0” (do not send the user an email when the job starts) or “1” (send an email to the user when the job starts).
node_type
Default: “”
A Ruby String that can be used for more advanced job submission. This is an advanced option that is disabled by default and does nothing if you do enable it, unless you add it to a custom job submission configuration file.
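As a sketch of what enabling node_type might look like, the attribute can be turned into a dropdown in the custom config (the field labels and values here are hypothetical, and the submit script must consume node_type for it to have any effect):

```yaml
# /etc/ood/config/apps/bc_desktop/linux.yml (hypothetical sketch)
attributes:
  node_type:
    widget: "select"
    label: "Node type"
    options:
      - ["Standard", ""]
      - ["GPU", "gpu"]
```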
The desktop attribute is hard-coded to the value "mate". To change this to use "xfce", make the following edit to your custom YAML configuration file:
# /etc/ood/config/apps/bc_desktop/linux.yml
---
title: "Nsccwx Node Cluster Desktop"
cluster: "linux"
attributes:
  desktop: "xfce"
To remove the "Partition" form field corresponding to the bc_queue attribute, make the following edit to your custom YAML configuration file:
# /etc/ood/config/apps/bc_desktop/linux.yml
---
title: "Nsccwx Node Cluster Desktop"
cluster: "linux"
attributes:
  desktop: "xfce"
  bc_queue: null
After refreshing the form in the browser, you will no longer see the "Partition" field.
If you do not want users to submit jobs with more than 1 node, make the following edit to your custom YAML configuration file:
# /etc/ood/config/apps/bc_desktop/linux.yml
---
title: "Nsccwx Node Cluster Desktop"
cluster: "linux"
attributes:
  desktop: "xfce"
  bc_num_slots: 1
The "Number of nodes" form field will no longer be displayed, users cannot change this value, and the bc_num_slots attribute will always return "1".
To customize the label of the bc_account form field:
# /etc/ood/config/apps/bc_desktop/linux.yml
---
title: "Nsccwx Node Cluster Desktop"
cluster: "linux"
attributes:
  desktop: "xfce"
  bc_account:
    label: "Account"
# /etc/ood/config/apps/bc_desktop/linux.yml
---
title: "Nsccwx Node Cluster Desktop"
cluster: "linux"
attributes:
  desktop: "xfce"
  bc_account:
    label: "Account"
    help: "Account information is centrally managed through the LDAP account system"
To customize job submission, first edit the YAML configuration file:
# /etc/ood/config/apps/bc_desktop/linux.yml
---
title: "Nsccwx Node Cluster Desktop"
cluster: "linux"
submit: "submit/submit_test.yml.erb"
attributes:
  desktop: "xfce"
submit points to a custom job submission YAML configuration file. The path can be absolute or relative to the /etc/ood/config/apps/bc_desktop/ directory.
/etc/ood/config/apps/bc_desktop/submit/submit_test.yml.erb
For Slurm, you will need to modify how bc_num_slots (the number of nodes) is submitted to the batch server.
# /etc/ood/config/apps/bc_desktop/submit/submit_test.yml.erb
---
script:
  native:
    resources:
      nodes: "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>:ppn=28"
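The ERB ternary simply falls back to 1 when the form field is blank. The same defaulting expressed in shell (illustration only; the real substitution happens in Ruby/ERB at submit time):

```shell
# Emulate the blank-check fallback from the ERB template above.
bc_num_slots=""               # what an untouched form field yields
nodes=${bc_num_slots:-1}      # blank -> default to 1 node
echo "resources: nodes=${nodes}:ppn=28"
```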
Apps can be shared in several ways, including:
An administrator can install an app on the system by copying its directory to /var/www/ood/apps/sys.
The default directory permissions (755) allow every user with access to OnDemand to view and run the app. Access can be restricted by changing the permissions on an individual app's directory.
If a site does not want licensed software to be visible to users without a license, it can do the following:
# Given:
# - an app named $NEW_APP
# - a group named $NEW_APP_GROUP
# - and a user named $NEW_APP_USER
sudo cp -r "/path/to/$NEW_APP" /var/www/ood/apps/sys
sudo chmod 750 "/var/www/ood/apps/sys/$NEW_APP"
sudo chgrp "$NEW_APP_GROUP" "/var/www/ood/apps/sys/$NEW_APP"
sudo usermod -a -G "$NEW_APP_GROUP" "$NEW_APP_USER"
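A quick way to confirm the result of the chmod/chgrp steps above (a sketch; stat -c assumes GNU coreutils):

```shell
# Verify an app directory ended up group-restricted (mode 750).
check_app_perms() {
    mode=$(stat -c '%a' "$1")
    if [ "$mode" = "750" ]; then
        echo "ok: $1 is 750"
    else
        echo "warn: $1 is $mode"
    fi
}
# Example invocation (path is a placeholder):
# check_app_perms "/var/www/ood/apps/sys/my_app"
```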