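The first replica is written to the local node, the second to a node on a different rack, and the third to a different node on that same remote rack.
Goal: tolerate as many failures as possible, surviving not only the loss of a single machine but also the failure of an entire rack, while keeping writes fast (writing locally is faster).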
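The placement rule above can be sketched in a few lines. This is a simplified illustration, not HDFS's BlockPlacementPolicyDefault: the topology (`TOPOLOGY`) and the function name `choose_targets` are hypothetical, the writer is assumed to run on a DataNode, and disk space, load, and liveness checks are ignored.

```python
import random

# Hypothetical topology: DataNode name -> rack path (illustrative only).
TOPOLOGY = {
    "dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack1",
    "dn4": "/rack2", "dn5": "/rack2",
    "dn6": "/rack3", "dn7": "/rack3",
}


def choose_targets(writer, topology, rng=random):
    """Pick 3 replica targets: local node, remote rack, same remote rack.

    Sketch only: assumes the writer is a DataNode in `topology` and that
    every remote rack has at least two nodes.
    """
    # 1st replica: the writer's own node.
    first = writer
    local_rack = topology[first]

    # 2nd replica: a random node on a different rack.
    remote = [n for n, r in topology.items() if r != local_rack]
    second = rng.choice(remote)
    remote_rack = topology[second]

    # 3rd replica: a different node on the same remote rack as the 2nd.
    same_remote = [n for n, r in topology.items()
                   if r == remote_rack and n != second]
    third = rng.choice(same_remote)
    return [first, second, third]
```

For example, `choose_targets("dn2", TOPOLOGY, random.Random(0))` always puts the first replica on dn2 and the other two on two distinct nodes of one other rack, so losing either rack still leaves a copy.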
dfs.namenode.block-placement-policy.default.prefer-local-node
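Defaults to true. When a put is run on a DataNode host, that local node is preferred for the first replica, so over time that DataNode's storage utilization ends up higher than the others'.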
Controls how the default block placement policy places the first replica of a block. When true, it will prefer the node where the client is running. When false, it will prefer a node in the same rack as the client. Setting to false avoids situations where entire copies of large files end up on a single node, thus creating hotspots.
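To see the hotspot effect, here is a toy model of where the first replica lands when one client keeps writing from the same host. It is an illustrative assumption, not HDFS code: `first_replica`, `first_replica_counts`, and the single-rack `TOPOLOGY` are hypothetical, and "a node in the same rack" is modeled as a uniform random pick among that rack's nodes.

```python
import random
from collections import Counter

# Hypothetical single rack: DataNode name -> rack path.
TOPOLOGY = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack1", "dn4": "/rack1"}


def first_replica(client, topology, prefer_local_node, rng):
    """Simplified model of where the 1st replica goes."""
    if prefer_local_node:
        # true (default): always the client's own node.
        return client
    # false: some node in the client's rack (uniform random pick here).
    rack = topology[client]
    return rng.choice(sorted(n for n, r in topology.items() if r == rack))


def first_replica_counts(prefer_local_node, blocks=1000, seed=0):
    """Count first-replica locations for `blocks` writes from dn1."""
    rng = random.Random(seed)
    return Counter(first_replica("dn1", TOPOLOGY, prefer_local_node, rng)
                   for _ in range(blocks))


print(first_replica_counts(True))   # → Counter({'dn1': 1000})
print(first_replica_counts(False))  # spread across dn1..dn4
```

With the flag on, every block of a large file keeps a copy on the writer's node; with it off, first replicas are spread across the rack.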
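Config option: dfs.datanode.fsdataset.volume.choosing.policy
This decides which disk (volume) on a DataNode receives a new block. The default, RoundRobinVolumeChoosingPolicy, is essentially round robin.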
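A minimal sketch of round-robin volume choice, assuming volumes are modeled as a list of free-byte counts and that a volume without enough room for the block is skipped. The class and exception names (`RoundRobinChooser`, `DiskOutOfSpace`) are hypothetical, not the Hadoop API.

```python
class DiskOutOfSpace(Exception):
    """Raised when no volume can hold the block (illustrative name)."""


class RoundRobinChooser:
    """Round-robin over volumes, skipping ones that lack free space."""

    def __init__(self):
        self._next = 0  # index where the next search starts

    def choose(self, available, block_size):
        """Return the index of the chosen volume in `available`."""
        if not available:
            raise DiskOutOfSpace("no volumes")
        start = self._next % len(available)
        idx = start
        while True:
            if available[idx] >= block_size:
                # Resume after the chosen volume on the next call.
                self._next = (idx + 1) % len(available)
                return idx
            idx = (idx + 1) % len(available)
            if idx == start:
                # Went all the way around without finding space.
                raise DiskOutOfSpace(f"no volume has {block_size} bytes free")


chooser = RoundRobinChooser()
print([chooser.choose([500, 100, 500], 200) for _ in range(4)])  # → [0, 2, 0, 2]
```

Because the rotation ignores how full each disk is, disks of different sizes fill at different rates.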
# hdfs balancer --help
Usage: hdfs balancer
[-policy <policy>] the balancing policy: datanode or blockpool
[-threshold <threshold>] Percentage of disk capacity
[-exclude [-f <hosts-file> | <comma-separated list of hosts>]] Excludes the specified datanodes.
[-include [-f <hosts-file> | <comma-separated list of hosts>]] Includes only the specified datanodes.
[-source [-f <hosts-file> | <comma-separated list of hosts>]] Pick only the specified datanodes as source nodes.
[-idleiterations <idleiterations>] Number of consecutive idle iterations (-1 for Infinite) before exit.
[-runDuringUpgrade] Whether to run the balancer during an ongoing HDFS upgrade.This is usually not desired since it will not affect used space on over-utilized machines.
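The balancer computes each DataNode's disk utilization and, using the cluster average utilization v1 and the threshold option, sorts DataNodes into four levels: over-utilized, above-average, below-average, and under-utilized.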
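The four-way split can be sketched as a function of a node's utilization, the cluster average, and the threshold (all in percent; the balancer's threshold defaults to 10). The function name `classify` is hypothetical and the exact boundary handling (`>` vs `>=`) is an assumption of this sketch.

```python
def classify(utilization, average, threshold=10.0):
    """Return the level of one DataNode; all values are percentages."""
    diff = utilization - average
    if diff > 0:
        # Above the cluster average: beyond the threshold or within it.
        return "over-utilized" if diff > threshold else "above-average"
    # At or below the cluster average.
    return "under-utilized" if -diff > threshold else "below-average"


for u in (65, 55, 45, 30):
    print(u, classify(u, average=50))
```

With an average of 50% and threshold 10, nodes at 65%, 55%, 45%, and 30% fall into the four levels in that order. The balancer then moves blocks from the first two groups toward the last two until every node is within the threshold of the average.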
Average utilization of the HDFS cluster (v1) = sum(DFS Used) * 100 / sum(Capacity)
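A worked example of the formula, assuming each DataNode is given as a hypothetical `(dfs_used, capacity)` pair in the same unit. Note that it is a capacity-weighted average, not the plain mean of per-node percentages.

```python
def cluster_average(nodes):
    """nodes: list of (dfs_used, capacity) pairs, one per DataNode."""
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(cap for _, cap in nodes)
    return total_used * 100 / total_capacity


# Three nodes (in TB): utilizations 30%, 80%, 45%.
nodes = [(30, 100), (80, 100), (90, 200)]
print(cluster_average(nodes))  # → 50.0
```

Here sum(DFS Used) = 200 and sum(Capacity) = 400, giving 50%, while averaging the three per-node percentages would give about 51.7%: the larger node counts for more.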
Related parameters
Copyright © 2003-2013 www.wpsshop.cn. All rights reserved.