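An ES index is made up of a number of shards. The number of shards and the number of replicas are specified when the index is created; if they are not specified, the defaults are 5 shards and 1 replica. The shards of an index are spread evenly across the nodes by the routing algorithm. The question this article looks at is the following: suppose a shard has been assigned to a node, and that node's data path spans several disks, that is, elasticsearch.yml contains this setting: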
path.data: /disk1/data/elasticsearch,/disk2/data/elasticsearch,/disk3/data/elasticsearch
When ES picks a path for the shard, how does it decide which of these paths the shard lands on?
ES allocates shards in two situations. Whichever one triggers the allocation, the following shard path selection method is called:
    public static ShardPath selectNewPathForShard(NodeEnvironment env, ShardId shardId, IndexSettings indexSettings,
                                                  long avgShardSizeInBytes, Map<Path,Integer> dataPathToShardCount) throws IOException {

        final Path dataPath;
        final Path statePath;

        if (indexSettings.hasCustomDataPath()) {
            dataPath = env.resolveCustomLocation(indexSettings, shardId);
            statePath = env.nodePaths()[0].resolve(shardId);
        } else {
            BigInteger totFreeSpace = BigInteger.ZERO;
            for (NodeEnvironment.NodePath nodePath : env.nodePaths()) {
                totFreeSpace = totFreeSpace.add(BigInteger.valueOf(nodePath.fileStore.getUsableSpace()));
            }

            // TODO: this is a hack!! We should instead keep track of incoming (relocated) shards since we know
            // how large they will be once they're done copying, instead of a silly guess for such cases:

            // Very rough heuristic of how much disk space we expect the shard will use over its lifetime, the max of current average
            // shard size across the cluster and 5% of the total available free space on this node:
            BigInteger estShardSizeInBytes = BigInteger.valueOf(avgShardSizeInBytes).max(totFreeSpace.divide(BigInteger.valueOf(20)));

            // TODO - do we need something more extensible? Yet, this does the job for now...
            final NodeEnvironment.NodePath[] paths = env.nodePaths();
            NodeEnvironment.NodePath bestPath = null;
            BigInteger maxUsableBytes = BigInteger.valueOf(Long.MIN_VALUE);
            for (NodeEnvironment.NodePath nodePath : paths) {
                FileStore fileStore = nodePath.fileStore;

                BigInteger usableBytes = BigInteger.valueOf(fileStore.getUsableSpace());
                assert usableBytes.compareTo(BigInteger.ZERO) >= 0;

                // Deduct estimated reserved bytes from usable space:
                Integer count = dataPathToShardCount.get(nodePath.path);
                if (count != null) {
                    usableBytes = usableBytes.subtract(estShardSizeInBytes.multiply(BigInteger.valueOf(count)));
                }
                if (bestPath == null || usableBytes.compareTo(maxUsableBytes) > 0) {
                    maxUsableBytes = usableBytes;
                    bestPath = nodePath;
                }
            }

            statePath = bestPath.resolve(shardId);
            dataPath = statePath;
        }

        return new ShardPath(indexSettings.hasCustomDataPath(), dataPath, statePath, shardId);
    }
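The analysis below follows the code above (the branch without a custom data path):
1. Get the average size of the index's existing shards (avgShardSizeInBytes).
2. Compute 5% of the total usable space across all data paths listed in path.data.
3. Take the larger of the values from steps 1 and 2 as the estimated shard size, estShardSizeInBytes.
4. For each path, compute usableBytes = usableBytes - (number of this index's shards already on the path) * estShardSizeInBytes.
5. Compare the usableBytes values of the paths; the path with the largest value receives the shard. On a tie, the path listed first wins, because the comparison is strict.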
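The steps above can be condensed into a small standalone sketch. This is not the ES implementation: the class PathSelectionSketch and the methods estimateShardSize and selectPath are hypothetical names, paths are plain strings, and usable space is passed in as a map instead of being read from a FileStore. The map is assumed to iterate in path.data order (for example a LinkedHashMap), since ties go to the first path seen.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the heuristic; not ES code.
public class PathSelectionSketch {

    // Estimated shard size: max(average shard size, 5% of total usable space).
    static BigInteger estimateShardSize(long avgShardSizeInBytes, Map<String, Long> usableBytesByPath) {
        BigInteger totalFree = BigInteger.ZERO;
        for (long usable : usableBytesByPath.values()) {
            totalFree = totalFree.add(BigInteger.valueOf(usable));
        }
        return BigInteger.valueOf(avgShardSizeInBytes).max(totalFree.divide(BigInteger.valueOf(20)));
    }

    // Deduct one estimated shard size per shard of this index already on a path,
    // then return the path with the largest remaining value.
    static String selectPath(long avgShardSizeInBytes,
                             Map<String, Long> usableBytesByPath,
                             Map<String, Integer> shardCountByPath) {
        BigInteger est = estimateShardSize(avgShardSizeInBytes, usableBytesByPath);
        String bestPath = null;
        BigInteger maxUsable = null;
        for (Map.Entry<String, Long> entry : usableBytesByPath.entrySet()) {
            BigInteger usable = BigInteger.valueOf(entry.getValue());
            Integer count = shardCountByPath.get(entry.getKey());
            if (count != null) {
                usable = usable.subtract(est.multiply(BigInteger.valueOf(count)));
            }
            // Strict comparison: on a tie the earlier path is kept.
            if (bestPath == null || usable.compareTo(maxUsable) > 0) {
                maxUsable = usable;
                bestPath = entry.getKey();
            }
        }
        return bestPath;
    }

    public static void main(String[] args) {
        Map<String, Long> usable = new LinkedHashMap<>();
        usable.put("/a", 100L);
        usable.put("/b", 90L);
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("/a", 2); // two shards of this index already live on /a
        System.out.println(selectPath(0L, usable, counts));
    }
}
```

Note that the deduction only counts shards of the index being allocated, so shards of other indices on the same path influence the choice only through the free space they have already consumed.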
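Because this estimate of shard size is only a rough guess, the ES shard placement strategy cannot guarantee that data is balanced across multiple disks.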
Suppose the data paths have the following free space and total capacity (columns: path, free space, total size):
/disk1/data/elasticsearch 10G 20G
/disk2/data/elasticsearch 9.5G 20G
/disk3/data/elasticsearch 9.5G 20G
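Two indices, people1 and people2, are created one after the other. Each has a single shard.
people1 is created first. It has no existing shards, so the average shard size is 0 and, by the algorithm above, the estimated shard size is (10 + 9.5 + 9.5) * 5% = 1.45G.
Because the index has no shards on any path yet, nothing is deducted, and the computed usable space of each path is: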
/disk1/data/elasticsearch 10G
/disk2/data/elasticsearch 9.5G
/disk3/data/elasticsearch 9.5G
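/disk1/data/elasticsearch has the most usable space, so the only shard of people1 is placed on /disk1/data/elasticsearch. Since people1 holds no data yet, the free space of /disk1/data/elasticsearch does not change. After people1 is created, the free space is therefore: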
/disk1/data/elasticsearch 10G 20G
/disk2/data/elasticsearch 9.5G 20G
/disk3/data/elasticsearch 9.5G 20G
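Following the same steps as for people1, the shard of people2 is also assigned to /disk1/data/elasticsearch, because people2 has no shards of its own on any path and so nothing is deducted for it either. The data of both indices therefore ends up under that one path, and once data is loaded into the two indices the disks become skewed.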
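The example can be replayed with a short simulation. This is a sketch under stated assumptions, not ES code: DiskSkewExample, pick and exampleDisks are hypothetical names, sizes are expressed in MB (10G = 10240, 9.5G = 9728), and plain long arithmetic replaces BigInteger.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical replay of the people1/people2 example; not ES code.
public class DiskSkewExample {

    // Simplified version of the heuristic: sizes in MB, ties go to the first path.
    static String pick(Map<String, Long> freeMb, Map<String, Integer> indexShardsPerPath, long avgShardMb) {
        long totalFree = 0;
        for (long free : freeMb.values()) {
            totalFree += free;
        }
        long est = Math.max(avgShardMb, totalFree / 20); // 5% of total free space
        String best = null;
        long bestAdjusted = Long.MIN_VALUE;
        for (Map.Entry<String, Long> entry : freeMb.entrySet()) {
            long adjusted = entry.getValue() - est * indexShardsPerPath.getOrDefault(entry.getKey(), 0);
            if (best == null || adjusted > bestAdjusted) {
                bestAdjusted = adjusted;
                best = entry.getKey();
            }
        }
        return best;
    }

    // The three data paths from the example, in path.data order.
    static Map<String, Long> exampleDisks() {
        Map<String, Long> freeMb = new LinkedHashMap<>();
        freeMb.put("/disk1/data/elasticsearch", 10240L); // 10G
        freeMb.put("/disk2/data/elasticsearch", 9728L);  // 9.5G
        freeMb.put("/disk3/data/elasticsearch", 9728L);  // 9.5G
        return freeMb;
    }

    public static void main(String[] args) {
        Map<String, Long> freeMb = exampleDisks();
        // Each index is new and empty: it has no shards on any path, and
        // creating its empty shard leaves the free space unchanged.
        for (String index : new String[] {"people1", "people2"}) {
            String chosen = pick(freeMb, new HashMap<>(), 0L);
            System.out.println(index + " -> " + chosen);
        }
    }
}
```

Both iterations print /disk1/data/elasticsearch. For contrast, if the second shard belonged to the same index as the first, the per-index count for /disk1/data/elasticsearch would be 1, its adjusted value would drop to 10240 - 1484 = 8756, and /disk2/data/elasticsearch would be chosen instead. The skew in the example arises because the deduction is tracked per index.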