A quick intro to what this is? Honestly, no intro is really needed — anyone who clicks in already has some idea; I'm just writing down the process.
Dynamometer is a tool built by LinkedIn. The Dynamometer introduction article published by its authors explains what the tool is:
Scale testing is expensive—the only way to ensure that something will run on a multi-thousand node cluster is to run it on a cluster with thousands of nodes.
Standing up a large-scale cluster is expensive, and Dynamometer makes this kind of scale testing possible through simulation.
What you need and how to do it is only briefly covered in the Hadoop Dynamometer usage docs on the Hadoop website, which is exactly why there are so many pitfalls; some of them are discussed in issues on LinkedIn's GitHub. (ps: why is there no hands-on step-by-step tutorial? The time cost goes through the roof; not everyone needs to read the source code... fine, anyone using this tool probably ends up in the source anyway — my bad, my bad.)
1. A Hadoop cluster A (used to launch the simulated cluster B)
2. A Hadoop cluster B that you want to test
If you only need to start the simulated cluster, all you need is Hadoop B's configuration + an fsimage file + the Hadoop tarball for that version.
If you want to run operations against the simulated cluster, you can replay audit logs (in short, every operation against HDFS gets recorded in the audit log), i.e. the above plus an hdfs-audit.log; how to enable it is covered in the audit log configuration docs online.
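For reference, a minimal sketch of turning the audit log on, assuming the stock log4j.properties that ships with Hadoop (it routes FSNamesystem.audit through ${hdfs.audit.logger} into hdfs-audit.log); the appender name and log location may differ in your distribution:
# On Hadoop B, in etc/hadoop/hadoop-env.sh (then restart the NameNode)
export HDFS_AUDIT_LOGGER=INFO,RFAAUDIT
# entries will show up in ${HADOOP_LOG_DIR}/hdfs-audit.log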
The audit log format looks like this:
2021-12-07 16:47:35,231 INFO FSNamesystem.audit: allowed=true ugi=<user> (auth:SIMPLE) ip=/<ip> cmd=<operation> src=<source path> dst=null perm=null proto=rpc
The simulated cluster is launched as containers on the YARN scheduler of an existing Hadoop cluster; by replaying the audit log, the earlier operations are run again against the simulated cluster and the results are written to a file.
The result file format is as follows:
user,type,operation,numops,cumulativelatency
i.e. the user, the operation type (read or write), the specific operation, how many times it ran, and the cumulative latency.
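Once the replay job has finished you can peek at the output straight off HDFS; a hedged example, assuming the hdfs:///dyno/results/ output path used later in this post and the usual MapReduce part-file naming:
# the replay is a MapReduce job, so results land in part files under the output path
hdfs dfs -cat /dyno/results/part-* | head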
Note: all of these files live on Hadoop cluster A's HDFS (or locally); the operations in the audit log were performed against Hadoop B's HDFS.
Dynamometer could be used standalone around 2.7.2 or 2.8 (for the exact versions, search the Hadoop website; if a version's docs include a usage page for this tool, it's supported). It was later merged into Hadoop 3.3.0 (HDFS-12345), so earlier 3.x releases don't ship it: you either apply the patch or download a newer Hadoop release and copy the Dynamometer-related content under tools (the sh scripts / jar packages) into the right place.
I'm on 3.2.2, and this version has its own pitfalls: some directory paths are wrong.
For Dynamometer, I put everything it needs onto Hadoop A's HDFS under the /dyno directory rather than keeping it locally (even though physically it still ends up on DN disks).
Generate the XML file for the fsimage, then put fsimage_TXID, fsimage_TXID.xml, together with the fsimage_TXID.md5 and VERSION files from the same folder, into a single directory on the cluster.
# On Hadoop B
# Go to the NameNode image directory
cd /xxxx/namenode/current
# Pick an fsimage version
hdfs oiv -i fsimage_0000000000022252900 -o fsimage_0000000000022252900.xml -p XML
# Copy all fsimage_TXID* files and VERSION out, and put them onto Hadoop A's HDFS
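A hedged sketch of that upload, matching the /dyno/fsimage layout on Hadoop A that the -fs_image_dir option points at later:
# run with a client configured against Hadoop A
hdfs dfs -mkdir -p /dyno/fsimage
hdfs dfs -put fsimage_0000000000022252900 fsimage_0000000000022252900.xml fsimage_0000000000022252900.md5 VERSION /dyno/fsimage/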
Slim down the binary tarball with the script that ships with Dynamometer
# On Hadoop B
# Go to the relevant directory
cd ./xxx/share/hadoop/tools/dynamometer/dynamometer-infra/bin
# Slim the tarball down
./create-slim-hadoop-tar.sh hadoop-3.2.2.tar.gz
# You end up with a hadoop-3.2.2.tar.gz of around 180 MB; put this tarball onto your Hadoop A cluster
If you don't want to waste time on configuration, read pitfall #5 first.
If you don't want to waste time on configuration, read pitfall #5 first.
If you don't want to waste time on configuration, read pitfall #5 first.
# On Hadoop B
# Go to etc/hadoop under the Hadoop installation directory
cd /xxx/etc/hadoop/
zip -q -r conf.zip *
# Put this conf.zip onto your Hadoop A cluster
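A hedged sketch of getting both archives onto Hadoop A's HDFS, using the paths that the start command below expects:
# run with a client configured against Hadoop A
hdfs dfs -put hadoop-3.2.2.tar.gz /dyno/
hdfs dfs -put conf.zip /dyno/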
Use fsimage_xxx.xml to generate the Dyno-DNs of the Dyno-HDFS cluster
they do not store any actual data, and do not persist anything to disk; they maintain all metadata in memory
Each of these files should contain the list of blocks that the corresponding DataNode should hold, as (block ID, generation stamp, block size) triplets (nothing but these triplets),
e.g. xxxx,xxxxx,xxxx
# Create /xxx/share/hadoop/tools/dynamometer/dynamometer-blockgen/lib/
# Create /xxx/share/hadoop/tools/dynamometer/dynamometer-infra/lib/
# Create /xxx/share/hadoop/tools/dynamometer/dynamometer-workload/lib/
# Copy /xxx/share/hadoop/tools/lib/hadoop-dynamometer-aaa-3.2.2.jar into the matching directory above, one jar per module
Generate the simulated DN block files
cd /xxx/share/hadoop/tools/dynamometer/
./dynamometer-blockgen/bin/generate-block-lists.sh -fsimage_input_path hdfs:///dyno/fsimage/fsimage_0000000000022252900.xml -block_image_output_dir hdfs:///dyno/blocks -num_reducers 8 -num_datanodes 8
The block_image_output_dir parameter sets where the generated block files are written.
The num_reducers parameter sets how many reducers the block-generation job uses.
The num_datanodes parameter sets how many DataNodes will be simulated.
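To sanity-check the result you can simply list the output directory (a hedged aside; exact file names depend on the job, but there should be roughly one block list per simulated DataNode plus MapReduce bookkeeping files):
# list the generated block lists on Hadoop A
hdfs dfs -ls /dyno/blocks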
That's the prep work done; now we can actually use Dynamometer.
The launch script accepts the following arguments:
-appname <arg> Application Name. (default 'DynamometerTest')
-block_list_path <arg> Location on HDFS of the files containing the DN block lists.
-conf_path <arg> Location of the directory or archive containing the Hadoop configuration. If this is already on a remote FS, will save the copy step, but must be an archive file. This must have the standard Hadoop conf layout containing e.g. etc/hadoop/*-site.xml
-datanode_args <arg> Additional arguments to add when starting the DataNodes.
-datanode_launch_delay <arg> The period over which to launch the DataNodes; this will be used as the maximum delay and each DataNode container will be launched with some random delay less than this value. Accepts human-readable time durations (e.g. 10s, 1m) (default 0s)
-datanode_memory_mb <arg> Amount of memory in MB to be requested to run the DNs (default 2048)
-datanode_nodelabel <arg> The node label to specify for the container to use to run the DataNode.
-datanode_vcores <arg> Amount of virtual cores to be requested to run the DNs (default 1)
-datanodes_per_cluster <arg> How many simulated DataNodes to run within each YARN container (default 1)
-fs_image_dir <arg> Location of the directory containing, at minimum, the VERSION file for the namenode. If running the namenode within YARN (namenode_info_path is not specified), this must also include the fsimage file and its md5 hash with names conforming to: `fsimage_XXXXXXXX[.md5]`.
-hadoop_binary_path <arg> Location of Hadoop binary to be deployed (archive). One of this or hadoop_version is required.
-hadoop_version <arg> Version of Hadoop (like '2.7.4' or '3.0.0-beta1') for which to download a binary. If this is specified, a Hadoop tarball will be downloaded from an Apache mirror. By default the Berkeley OCF mirror is used; specify dyno.apache-mirror as a configuration or system property to change which mirror is used. The tarball will be downloaded to the working directory. One of this or hadoop_binary_path is required.
-help Print usage
-master_memory_mb <arg> Amount of memory in MB to be requested to run the application master (default 2048)
-master_vcores <arg> Amount of virtual cores to be requested to run the application master (default 1)
-namenode_args <arg> Additional arguments to add when starting the NameNode. Ignored unless the NameNode is run within YARN.
-namenode_edits_dir <arg> The directory to use for the NameNode's edits directory. If not specified, a location within the container's working directory will be used.
-namenode_memory_mb <arg> Amount of memory in MB to be requested to run the NN (default 2048). Ignored unless the NameNode is run within YARN.
-namenode_metrics_period <arg> The period in seconds for the NameNode's metrics to be emitted to file; if <=0, disables this functionality. Otherwise, a metrics file will be stored in the container logs for the NameNode (default 60).
-namenode_name_dir <arg> The directory to use for the NameNode's name data directory. If not specified, a location within the container's working directory will be used.
-namenode_nodelabel <arg> The node label to specify for the container to use to run the NameNode.
-namenode_servicerpc_addr <arg> Specify this option to run the NameNode external to YARN. This is the service RPC address of the NameNode, e.g. localhost:9020.
-namenode_vcores <arg> Amount of virtual cores to be requested to run the NN (default 1). Ignored unless the NameNode is run within YARN.
-queue <arg> RM Queue in which this application is to be submitted (default 'default')
-shell_env <arg> Environment for shell script. Specified as env_key=env_val pairs
-timeout <arg> Application timeout in milliseconds (default -1 = unlimited)
-token_file_location <arg> If specified, this file will be used as the delegation token(s) for the launched containers. Otherwise, the delegation token(s) for the default FileSystem will be used.
-workload_config <arg> Additional configurations to pass only to the workload job. This can be used multiple times and should be specified as a key=value pair, e.g. '-workload_config conf.one=val1 -workload_config conf.two=val2'
-workload_input_path <arg> Location of the audit traces to replay (Required for workload)
-workload_output_path <arg> Location of the metrics output (Required for workload)
-workload_rate_factor <arg> Rate factor (multiplicative speed factor) to apply to workload replay (Default 1.0)
-workload_replay_enable If specified, this client will additionally launch the workload replay job to replay audit logs against the HDFS cluster which is started.
-workload_start_delay <arg> Delay between launching the Workload MR job and starting the audit logic replay; this is used in an attempt to allow all mappers to be launched before any of them start replaying. Workloads with more mappers may need a longer delay to get all of the containers allocated. Human-readable units accepted (e.g. 30s, 10m). (default 1m)
-workload_threads_per_mapper <arg> Number of threads per mapper to use to replay the workload. (default 1)
I didn't point it at an external NN; I let it start one itself (ps: I gave both the NN and the DNs generous amounts of memory — my personal rule of thumb is that the NN should get no less memory than the fsimage size, and a DN no less than the size of its single block-list file, since the metadata all lives in memory).
./dynamometer-infra/bin/start-dynamometer-cluster.sh -hadoop_binary_path hdfs:///dyno/hadoop-3.2.2.tar.gz -conf_path hdfs:///dyno/conf.zip -fs_image_dir hdfs:///dyno/fsimage -block_list_path hdfs:///dyno/blocks
If the launch succeeds, the job starts the NN/DNs, reports things like missing blocks in the shell output, and then just blocks there, never exiting.
If it exits smoothly, your launch failed. The NN web UI is at <ip of the server hosting the NN container>:50077 and the logs are at <ip>:8042/xxxxxxx — the exact addresses are printed in the job's INFO output. For any other problem, check the NN log as well.
21/12/06 14:58:47 INFO dynamometer.Client: NameNode can be reached via HDFS at: hdfs://hostname:9002/
21/12/06 14:58:47 INFO dynamometer.Client: NameNode web UI available at: http://hostname:50077/
21/12/06 14:58:47 INFO dynamometer.Client: NameNode can be tracked at: http://hostname:8042/node/containerlogs/container_e127_1638434040871_0015_01_000002/deploy/
21/12/06 14:58:47 INFO dynamometer.Client: Waiting for NameNode to finish starting up...
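Besides the URLs printed above, once the application has finished (or died) you can also pull all of its container logs in one go; a hedged example, using the application id visible in the container name above:
# aggregate logs for the whole Dynamometer application
yarn logs -applicationId application_1638434040871_0015 | less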
In the NN log trail I found this:
ERROR: Cannot find configuration directory "/xxxxxx/appcache/application_1638434040871_0015/container_e127_1638434040871_0015_01_000002/conf/etc/hadoop"
Configuration not found? That can't be right — I definitely uploaded the configuration.
I logged into the machine hosting the NN container and looked under /xxxxxx/conf, only to find that the simulated NN's conf actually lived at /xxxxxx/conf/xxxx/etc/hadoop?????
Great, it had dragged along the directory structure from my earlier zip. I went straight to the source: the NN is still started by a script, located at hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-infra/src/main/resources/start-component.sh
This script starts the NameNode:
if ! "${HADOOP_HOME}/sbin/hadoop-daemon.sh" start namenode "${namenodeConfigs[@]}" $NN_ADDITIONAL_ARGS; then
echo "Unable to launch NameNode; exiting."
exit 1
fi
So I just changed its conf variable directly:
confDir="$(pwd)/conf/xxxxx"
Then I re-ran mvn and replaced the corresponding jar on Hadoop A.
That put this problem to rest.
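For reference, a hedged sketch of that rebuild-and-replace step (the module path comes from the script location above; the jar name and target path are assumptions, adjust to your layout):
# rebuild just the dynamometer-infra module (start-component.sh is packaged inside its jar)
cd /path/to/hadoop-source
mvn package -pl hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-infra -am -DskipTests
# swap the jar on Hadoop A, in the lib directory prepared earlier
cp hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-infra/target/hadoop-dynamometer-infra-3.2.2.jar /xxx/share/hadoop/tools/dynamometer/dynamometer-infra/lib/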
You won't necessarily hit this one; in my case it was because the Hadoop B cluster uses Ranger.
Just remove the related properties from hdfs-site.xml and you're done.
So how do I actually test anything? It's started — now what? Right, the audit log. The audit log from earlier finally comes into play: how to enable it can be found online; once it's on, just use Hadoop B normally, and when there are enough records, copy the log onto Hadoop A's HDFS.
Onward: run the audit log replay job.
cd /xxx/share/hadoop/tools/dynamometer/
./dynamometer-workload/bin/start-workload.sh -Dauditreplay.input-path=hdfs:///dyno/audit_logs -Dauditreplay.output-path=hdfs:///dyno/results/ -Dauditreplay.num-threads=1 -Dauditreplay.log-start-time.ms=1638895655231 -nn_uri hdfs://<simulated-nn-hostname>:9002 -mapper_class_name AuditReplayMapper
1638895655231 is the timestamp of the first record in the audit log.
…the logs just kept saying sleep…
It slept until the end of time, so I grudgingly read the source (ok, I had guessed it was a timezone issue and lazily just added 8 hours to the timestamp at first; I only read the source afterwards)... It really is just an MR job: the map method reads each line and replays the operation on a thread — and it defaults to UTC...
The relevant setting is auditreplay.log-date.time-zone; I didn't find it documented anywhere (ps: maybe I just missed it)...
./dynamometer-workload/bin/start-workload.sh -Dauditreplay.input-path=hdfs:///dyno/audit_logs -Dauditreplay.output-path=hdfs:///dyno/results/ -Dauditreplay.num-threads=1 -Dauditreplay.log-start-time.ms=1638895655231 -Dauditreplay.log-date.time-zone=UTC+8 -nn_uri hdfs://<simulated-nn-hostname>:9002 -mapper_class_name AuditReplayMapper
The logs then showed that every single operation was invalid:
2021-12-07 21:43:40,992 INFO [main] org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditReplayMapper: Percentage of invalid ops: 100.0
After spraying log statements all over the source, I found the exception being thrown inside the thread's method, caused by proxy-user (impersonation) settings:
IOException: User: xxx is not allowed to impersonate xxx
Modify core-site.xml in Hadoop B's conf and add the following:
# xxx is the user that runs the workload job
<property>
  <name>hadoop.proxyuser.xxx.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.xxx.groups</name>
  <value>*</value>
</property>
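The simulated NN takes its configuration from the conf.zip prepared earlier (my reading of the setup; hedged), so the change only takes effect after re-packaging the conf and restarting the Dyno cluster. A sketch:
# re-zip Hadoop B's conf with the proxyuser properties added
cd /xxx/etc/hadoop/
zip -q -r conf.zip *
# overwrite the copy on Hadoop A's HDFS, then re-run start-dynamometer-cluster.sh
hdfs dfs -put -f conf.zip /dyno/conf.zip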
File problem: to save time, my audit log only had a few lines, so the replay complained that the objects being operated on didn't exist; mocking up the files myself solved it.
Permission problem: looking at the logs, some operations from the audit log threw exceptions during workload replay because of permissions:
Permission denied: user=aaa, access=EXECUTE, inode="xxxxx":bbb:ccc:drwx------
Strictly speaking the exceptions don't matter — those ops simply aren't counted in the final results. Still, to get rid of them, here are a few approaches (personally I'd go with the third):
1. Turn off HDFS permission checking
Set dfs.permissions.enabled to false in hdfs-site.xml
(This works on a normal HDFS cluster, but I didn't see it take effect on the simulated one — possibly my own mistake. At the time I changed both the config of the HDFS cluster that launches the simulation and the simulated cluster's own config, and found that after turning the check off in the launcher cluster's HDFS config, the simulated cluster became very slow to start, mostly in the block-related part.)
2. chmod -R 777 and be done with it (very slow when the simulated HDFS has a huge number of inodes; consider parallelizing it with a script)
3. Modify the workload source, hadoop-dynamometer-workload-3.2.2.jar
In AuditReplayThread.java, a FileSystem is created per role/user taken from the audit log and cached in a ConcurrentMap<String, FileSystem> fsCache; that logic sits in the replayLog method (since this is only about permissions, I didn't bother changing the fsCache key to the hdfs user as well):
private boolean replayLog(final AuditReplayCommand command) {
  final String src = command.getSrc();
  final String dst = command.getDest();
  FileSystem proxyFs = fsCache.get(command.getSimpleUgi());
  // changed the proxy user below from command.getSimpleUgi() to "hdfs" to avoid permission failures
  if (proxyFs == null) {
    UserGroupInformation ugi = UserGroupInformation
        .createProxyUser("hdfs", loginUser);
    proxyFs = ugi.doAs((PrivilegedAction<FileSystem>) () -> {
      try {
        FileSystem fs = new DistributedFileSystem();
        fs.initialize(namenodeUri, mapperConf);
        return fs;
      } catch (IOException ioe) {
        throw new RuntimeException(ioe);
      }
    });
    fsCache.put(command.getSimpleUgi(), proxyFs);
  }
  ...
}
Which should come first, the fsimage or hdfs-audit.log?
In the end I went with an fsimage whose point in time precedes the audit log, ignoring the small fraction of failures and relying on the sheer volume of operations to dilute their impact.
Check the logs and pin down the underlying issue (yours may well differ; this is just one way to approach it):
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.hdfs.server.datanode.StorageLocation.getUri()Ljava/net/URI; from class org.apache.hadoop.tools.dynamometer.SimulatedDataNodes
at org.apache.hadoop.tools.dynamometer.SimulatedDataNodes.run(SimulatedDataNodes.java:118)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.dynamometer.SimulatedDataNodes.main(SimulatedDataNodes.java:91)
./start-component.sh: line 334: kill: (51028) - No such process
Because of the above, the launch exited very quickly, so I added an indefinite sleep before the relevant code so I could look at the other logs.
System.setProperty(MiniDFSCluster.PROP_TEST_BUILD_DATA,
DataNode.getStorageLocations(getConf()).get(0).getUri().getPath());
My suspicion is that this piece of code was changed in the lower version, creating an incompatibility/conflict.
Took a look and found it was test-related, so I just deleted it — I'm still feeling my way around anyway; if that's wrong, so be it.
After that the DNs started normally.
Tried deleting a file on the simulated HDFS — works.
Tried the workload replay — works.
The auditreplay.log-date.time-zone time zone setting had no effect:
2021-12-10 20:11:40,137 INFO [main] org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditLogDirectParser: m.group(timestamp)2021-12-10 14:33:56,704
2021-12-10 20:11:40,137 INFO [main] org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditLogDirectParser: relativeTimestamp=====28800000
2021-12-10 20:11:40,137 INFO [main] org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditLogDirectParser: startTimestamp=====1639118036704
2021-12-10 20:11:40,141 INFO [main] org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditReplayCommand: absoluteTimestamp======1639167144072
2021-12-10 20:11:40,141 INFO [main] org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditReplayMapper: ======= delay28843931
Brute-force fix: I shifted the audit log start timestamp by 8 hours (by adjusting auditreplay.log-start-time.ms), which made the problem go away.
They are kept in memory — just make sure the containers running the mock DNs have enough memory to hold the file metadata.
One approach: take one fsimage and parse it (hdfs oiv into a CSV-like file) to get every path, then tell directories from files by whether the permission string starts with d or - (you could also use quotas; there are plenty of ways). You could build this with your own MR job; watch out for permission issues during workload replay.
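A hedged sketch of that parsing step, assuming the Delimited processor of hdfs oiv (the column order varies a bit between versions, so check the header line it prints before trusting the field index below):
# dump the fsimage as tab-separated text; the first line is a header naming the columns
hdfs oiv -p Delimited -i fsimage_0000000000022252900 -o fsimage.tsv
# directories have a permission string starting with 'd', files with '-';
# adjust $10 to wherever the Permission column sits in your header
awk -F'\t' 'NR > 1 && $10 ~ /^d/ {print $1}' fsimage.tsv > dirs.txt
awk -F'\t' 'NR > 1 && $10 ~ /^-/ {print $1}' fsimage.tsv > files.txt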