繁依Fanyi0

这个屌丝很懒，什么也没留下！

热门标签

【C++进阶：哈希--unordered系列的容器及封装】

作者：繁依Fanyi0 | 2024-05-13 12:22:06

踩

本课涉及到的所有代码都见以下链接，欢迎参考指正！

practice: 课程代码练习 - Gitee.comhttps://gitee.com/ace-zhe/practice/tree/master/Hash

unordered系列关联式容器

在C++98中，STL提供了底层为红黑树结构的一系列关联式容器，在查询时效率可达到log_2N，即最差情况下需要比较红黑树的高度次，当树中的节点非常多时，查询效率也不理想。最好的查询是，进行很少的比较次数就能够将元素找到，因此在C++11中，STL又提供了4个unordered系列的关联式容器，这四个容器与红黑树结构的关联式容器使用方式基本类似，只是其底层结构不同，这里中只对unordered_map和unordered_set进行介绍，unordered_multimap和unordered_multiset可自行查看文档介绍。

unordered系列mpa、set和普通map、set区别：

unordered_mpa、unordered_set的简单测试：

使用时与map、set的基本用法几乎一样，观察发现，unordered系列的关联容器只能完成key值的去重，而不能完成排序，那为什么C++11要搞出这个系列呢？原因是综合考虑后unordered系列的容器基于哈希表底层的特性会使整体操作效率更高，下面我们从几个方面来测试set和unordered_set的性能：

性能测试：

性能测试代码如下（这里我们主要测试插入大量不同形式的随机数，二者性能，足够说明问题）：


void performance_test()
{
	const size_t N = 100000;
 
	unordered_set<int> us;
	set<int> s;
 
	vector<int> v;
	v.reserve(N);
	srand(time(0));
	for (size_t i = 0; i < N; ++i)
	{
		//v.push_back(rand());
		//v.push_back(rand()+i);
		v.push_back(i);
	}
 
	size_t begin1 = clock();
	for (auto e : v)
	{
		s.insert(e);
	}
	size_t end1 = clock();
	cout << "set insert:" << end1 - begin1 << endl;
 
	size_t begin2 = clock();
	for (auto e : v)
	{
		us.insert(e);
	}
	size_t end2 = clock();
	cout << "unordered_set insert:" << end2 - begin2 << endl << endl;
 
 
	size_t begin3 = clock();
	for (auto e : v)
	{
		s.find(e);
	}
	size_t end3 = clock();
	cout << "set find:" << end3 - begin3 << endl;
 
	size_t begin4 = clock();
	for (auto e : v)
	{
		us.find(e);
	}
	size_t end4 = clock();
	cout << "unordered_set find:" << end4 - begin4 << endl << endl;
 
	cout << s.size() << endl;
	cout << us.size() << endl << endl;;
 
	size_t begin5 = clock();
	for (auto e : v)
	{
		s.erase(e);
	}
	size_t end5 = clock();
	cout << "set erase:" << end5 - begin5 << endl;
 
	size_t begin6 = clock();
	for (auto e : v)
	{
		us.erase(e);
	}
	size_t end6 = clock();
	cout << "unordered_set erase:" << end6 - begin6 << endl << endl;
}

测试结果和分析如下：

由此，我们总结，实际当中大部分情况下我们还是更推荐使用unordered系列的关联式容器，前面提到它的高效率源自于其底层的哈希结构，因此接下来的重点我们就是要学习哈希结构的模拟实现。

unordered系列容器的底层结构

哈希（又称散列）概念：

顺序结构以及平衡树中，元素关键码与其存储位置之间没有对应的关系，因此在查找一个元素 时，必须要经过关键码的多次比较。顺序查找时间复杂度为O(N)，平衡树中为树的高度，即 O(log_2 N)，搜索的效率取决于搜索过程中元素的比较次数。

理想的搜索方法：可以不经过任何比较，一次直接从表中得到要搜索的元素。

如果构造一种存储结构，通过某种函数(hashFunc)使元素的存储位置与它的关键码之间能够建立 一一映射的关系，那么在查找时通过该函数可以很快找到该元素。

当向该结构中：

插入元素

根据待插入元素的关键码，以此函数计算出该元素的存储位置并按此位置进行存放

搜索元素

对元素的关键码进行同样的计算，把求得的函数值当做元素的存储位置，在结构中按此位置取元素比较，若关键码相等，则搜索成功

该方式即为哈希(散列)方法，哈希方法中使用的转换函数称为哈希(散列)函数，构造出来的结构称 为哈希表(Hash Table)(或者称散列表)。

哈希函数：

哈希方法中使用的转换函数称为哈希(散列)函数，引起哈希冲突的一个原因可能是：哈希函数设计不够合理。

哈希函数设计原则：

1.哈希函数的定义域必须包括需要存储的全部关键码，而如果散列表允许有m个地址时，其值域必2.须在0到m-1之间

3.哈希函数计算出来的地址能均匀分布在整个空间中

4.哈希函数应该比较简单

常见哈希函数：

常用的有两种：直接定制法和除留余数法，分别通过下图来演示分析

哈希冲突的解决：

上述我们了解到，除留余数法是比较好的哈希函数设计方法，但存在哈希冲突，那我们就要着手解决哈希冲突，解决哈希冲突常用的有两种方法：闭散列和开散列，下面我们分别介绍并实现。

闭散列：

也叫开放定址法，当发生哈希冲突时，如果哈希表未被装满，说明在哈希表中必然还有空位置，那么可以把key存放到冲突位置中的“下一个” 空位置中去。那如何寻找下一个空位置呢？

1. 线性探测概念

线性探测：从发生冲突的位置开始，依次向后探测，直到寻找到下一个空位置为止 。

插入：

1.通过哈希函数获取待插入元素在哈希表中的位置

2.如果该位置中没有元素则直接插入新元素，如果该位置中有元素发生哈希冲突，使用线性探测找到下一个空位置，插入新元素

删除： 采用闭散列处理哈希冲突时，不能随便物理删除哈希表中已有的元素，若直接删除元素会影响其他元素的搜索。比如删除元素12，如果直接删除掉，22查找起来可能会受影响。因此线性探测采用标记的伪删除法来删除一个元素。

2.线性探测的模拟实现

这里顺便就尝试搭建哈希表的整体框架，本处我们主要以学习哈希思想及结构为主，因此暂时不需要考虑封装unordered_map和unordered_set时的问题，其中重点将以注释的形式在代码中标注。

整体结构代码如下：


#pragma once
#include<iostream>
#include<vector>
namespace wz
{
	//定义枚举结构表示哈希表每一个地址空间的状态
	enum State
	{
		EEMPTY,
		EXIST,
		DELETE
	};
	template<class K, class V>
	//定义哈希表中每个地址空间的数据类型，包括两部分：数据+状态
	class HashData
	{
		pair<K, V> _kv;
		State _s = EMPTY;
	};
	template<class K,class V>
	class HashTable
	{
	public:
		//...
		//此处实现其各个成员函数，insert等
		//...
	private:
		vector<HashData<K,V>> _table;//用vector来存储HastData类型的数据来表示哈希表结构
		size_t _n=0//表示存储的数据
	};
 
}

insert()代码如下：

实现的思路：

1、实现核心的线性探测部分，我们会发现只有这部分代码会报错：


//线性探测
size_t hashi = kv.first % _table.size();
size_t index = hashi;
int i = 1;
while (_table[index]._s == EXIST)
{
	index = hashi + i;
	index %= _table.size();
	++i;
 }
_table[index]._kv = kv;
_table[index]._s = EXIST;
_n++;

原因是刚开始并没有给_table开辟空间，因此扩容发生在刚开始插入的时候，另外当哈希表中数据量占整体空间过大时，哈希冲突会增加，因此在现有空间即将存满的时候也会进行扩容，我们怎么衡量哈希表即将存满呢，这时候就要提出一个概念叫负载因子，负载因子=_n/_table.size()，这里扩容我们采用的是重新构造一个新容量的哈希表，将现有哈希表中的数据入新表，新表旧表数据互换，运行并测试如下：

观察上述代码我们发现在扩容后旧表数据入新表时，也需要每插入一次，线性探测一次，与后续插入新元素时的线性探测代码几乎一样，考虑C++代码的高复用性，我们需对整体代码加以改造，通过分析得到扩容后对新表的操作也可以看作是插入操作，因此这里其实直接复用insert函数即可，具体实现如下：


//扩容
if (_table.size() == 0 || _n * 10 / _table.size() > 7)
{
	size_t newsize = _table.size() == 0 ? 10 : _table.size() * 2;
	HashTable<K,V> newht;
	newht._table.resize(newsize);
	
	for (auto& data : _table)
	{
		if (data._s== EXIST)
		{
			newht.insert(data._kv);
		}
	}
	_table.swap(newht._table);
}

另外重要的一点是不能忘记还要考虑去重问题，去重就需要查找，因此我们先来实现查找函数,查找的思路也是以线性探测为核心

find()代码如下：


HashData<K, V>* find(const K& key)
{
//如果当前哈希表没开空间，当然不会找到，返回空指针
if (_table.size() == 0)
{
	return nullptr;
}
 
  // 线性探测
  size_t hashi = key% _table.size();
  size_t i = 1;
  size_t index = hashi;
    //如果当前哈希表中数据状态不为空，则可能有与要插入的值相等的值
    while (_table[index]._s != EMPTY)
	{
        //存在且相等，表示该值已经插入到表中了，返回该数据所在表中的地址
		if (_table[index]._s == EXIST
			&& _table[index]._kv.first == key)
		  {
			return &_table[index];
		  }
 
			index = hashi + i;
			index %= _table.size();
			++i;
 
			// 如果已经查找一圈，那么说明全是存在+删除
			if (index == hashi)
			{
				break;
			}
	  }
//到这里就说明没找到，返回空指针即可
	return nullptr;
}

完善insert（）代码如下：


bool insert(const pair<K, V>& kv)
{
	//判断表里原来有没有
	if (find(kv.first))
	{
		return false;
	}
	//扩容
	if (_table.size() == 0 || _n * 10 / _table.size() > 7)
	{
		size_t newsize = _table.size() == 0 ? 10 : _table.size() * 2;
		HashTable<K,V> newht;
	    newht._table.resize(newsize);
		
		for (auto& data : _table)
		{
			if (data._s== EXIST)
			{
				newht.insert(data._kv);
			}
		}
		_table.swap(newht._table);
	}
	//线性探测
	size_t hashi = kv.first % _table.size();
	size_t index = hashi;
	int i = 1;
	while (_table[index]._s == EXIST)
	{
		index = hashi + i;
		index %= _table.size();
		++i;
	 }
	_table[index]._kv = kv;
	_table[index]._s = EXIST;
	_n++;
return true;
}

测试结果如下：

erase()代码如下：

删除部分我们采取直接用状态标识的假删除，因此实现起来比较简单，如下：


bool erase(const K& key)
{
	HashData<K, V>* ret = find(key);
	if (ret)
	{
		ret->_s = DELETE;
		--_n;
		return true;
	}
	else
	{
		return false;
	}

测试结果如下：

综合测试：

3.二次探测概念

上面我们用代码实现了线性探测一次来解决哈希碰撞的问题，但解决方式不止一种，二次探测也是常用的方法

线性探测的缺陷是产生冲突的数据堆积在一块，容易形成踩踏这与其找下一个空位置有关系，因为找空位置的方式就是挨着往后逐个去找，因此二次探测为了避免该问题，找下一个空位置的方法为：hashi = (H_0 + i^2 )% m, 或者：hashi = (H_0 - i^2 )% m。其中：i = 1,2,3…， H_0是通过散列函数Hash(x)对元素的关键码 key 进行计算得到的位置，m是表的大小。

研究表明：当表的长度为质数且表装载因子a不超过0.5时，新的表项一定能够插入，而且任何一个位置都不会被探查两次。因此只要表中有一半的空位置，就不会存在表满的问题。在搜索时可以不考虑表装满的情况，但在插入时必须确保表的装载因子a不超过0.5，如果超出必须考虑增容。

因此：闭散列最大的缺陷就是空间利用率比较低，这也是哈希的缺陷，我们实际上不常以闭散列的方式实现哈希表，接下来介绍常用的开散列实现方式。

开散列：

开散列法又叫链地址法(开链法)，首先对关键码集合用散列函数计算散列地址，具有相同地址的关键码归于同一子集合，每一个子集合称为一个桶，各个桶中的元素通过一个单链表链接起来，各链表的头结点存储在哈希表中。

本质可以认为哈希表中存的是一个个的结点指针，以这些结点指针为头指针后面还链接这其它结点指针，从上图可以看出，开散列中每个桶中放的都是发生哈希冲突的元素。

开散列的模拟实现

这里仍然不考虑封装的问题，先实现功能，首先哈希桶整体结构如下：


namespace HashBucket
{
template<class K,class V>
struct HashNode
{
	HashNode<K,V>* _next;
	pair<K, V> _kv;
 
	HashNode(const pair<K, V>& kv)
		:_next(nullptr)
		, _kv(kv)
	{}
};
 
template<class K,class V>
class HashTable
{
	typedef HashNode<K,V> Node;
//...
//成员函数部分
//...
 
private:
	vector<Node*> _table;
	size_t _n=0;
};
 
}

析构函数：

这里与闭散列不同，需要自己实现析构函数，原因是闭散列的时候，存在表里的是一个个数据本身，而开散列存的则是一个个结点指针，关键是头结点指针后面可能还链接着其它结点指针，闭散列在程序结束后，默认的析构函数就够用了，而开散列调用默认析构函数指针释放各个桶的头结点指针，每个桶后面挂着的结点指针还在，造成内存泄露，代码如下：


~HashTable()
	{
		for (auto cur : _table)
		{
			while (cur)
			{
				Node* next = cur->next;
				delete cur;
				cur = next;
			}
			cur = nullptr;
		}
	}

Find（）查找函数：

无论是删除还是插入都需要用到查找函数，因此我们先来实现它，代码如下：


	Node* Find(const K& key)
	{
		if (_table.size() == 0)
			return nullptr;
 
		size_t hashi = key% _table.size();
		Node* cur = _table[hashi];
		while (cur)
		{
			if (cur->_kv.first == key)
			{
				return cur;
			}
 
			cur = cur->_next;
		}
 
		return nullptr;
	}

Insert()插入函数：

插入函数的的思想其实和闭散列的实现相类似，都是先判断表中有没有，再判断是否需要扩容，最后在插入到合适的位置，只是在扩容部分有些区别，开散列的扩容是可以在负载因子为1时扩容的，其次在闭散列时，我们推荐的是复用思想，即构造一个新的哈希表，就哈希表中元素依次插入新哈希表后，新旧表内容交换，但这里不推荐这种方法，原因是消耗太大，假若一共有N个数据在哈希表中，那么构造一个新的哈希桶就要构造再N个结点，析构N个结点，完全没必要，因此我们选择定义一个newtable,直接将_table中的结点指针按照规则插入链接即可，如下：


bool Insert(const pair<K, V>& kv)
	{
		if (find(kv.first))
		{
			return false;
		}
 
		if (_table.size() == _n)
		{
			size_t newsize = _table.size() == 0 ? 10 : _table.size() * 2;
			vector<Node*> newtable(newsize, nullptr);
			//for (Node*& cur : _table)
			for (auto& cur : _table)
			{
				while (cur)
				{
					Node* next = cur->_next;
					size_t hashi = cur->_kv.first% newtable.size();
 
					// 头插到新表
					cur->_next = newtable[hashi];
					newtable[hashi] = cur;
 
					cur = next;
				}
			}
			_table.swap(newtable);
		}
 
      size_t hashi = kv.first % _table.size();
	  Node* newnode = new Node(kv);
		newnode->_next = _table[hashi];
		_table[hashi] = newnode;
		_n++;
	return true;
	}

Erase（）删除函数：

开散列删除函数的实现就不想闭散列那么简单了，它没有状态的概念，只能实打实的删除，并且设计到指针的链接修改问题，实现如下，需要注意的点都以注释的形式标注了：


bool erase(const K& key)
{
	int ret = Finf(key);
	if (ret)//ret不为空，找到了，开始删除
	{
		size_t hashi = key % _table.size();
		Node* cur = _table[hashi];
		Node* prev = nullptr;
		while (cur)//从头结点开始往下找，知道碰到nullptr这个桶就走完了
		{
		  if (cur->_kv.first == key)
			{
			  if (prev == nullptr)//说明该节点为头结点，头结点直接更新为cur的下一个即可
				{
				  _table[hashi] = cur->_next;
				}
			  else//不是头结点，让prev即cur的前一个节点的_next指向cur的_next
				{
				  prev->_next = cur->_next;
				}
			delete cur;
			return true;
		    }
		  else
			{
			 prev = cur;
			 cur = cur->_next;
			}
	    }
   }
	return false;
}

测试结果如下：

综上，我们自己以开散列形式实现的哈希表基本功能已经实现，接下来就是要考虑细节。

代码完善

1.key值类型不确定，如何解决

我们现在实现的代码，只适用于key值为整型的数据类型，因为在定位hashi中都涉及到取模的运算，并不是所有类型都能隐式转换为整型再进行取模运算，比如说字符串类型，字符数组是无法自动转换为整型的，这个时候，我们采用的是设置仿函数，如下：


template<class K>
struct Hashfunc
{
	size_t operator()(const K& key)
	{
		return key;
	}
};
template<class K,class V,class Hash= Hashfunc<K>>
class HashTable
{
//..//
};

使用过程中，只需要在要进行取模运算之前定义一个Hash仿函数对象，使用时按照函数的方式使用即可，以Find()函数中的应用为例，如下：


Node* Find(const K& key)
	{
		if (_table.size() == 0)
			return nullptr;
		Hash hash;
		size_t hashi = hash(key)% _table.size();
		Node* cur = _table[hashi];
		while (cur)
		{
			if (cur->_kv.first == key)
			{
				return cur;
			}
 
			cur = cur->_next;
		}
 
		return nullptr;
	}

如果类似于字符串这样不能自动转为整型的类型就手动实现一个，使用时传自己实现的即可

注意：对于将字符串转换为整型存在以下问题

1.如果转换方式选择返回字符串首字符对应的ASCII值，那就会存在，首字符相同的值会被认为是相同的值，如："student"和"string";

2.如果转换方式选择返回字符串所有字符对应的ASCII值相加，那就会存在，字母组成相同只有顺序不同的会被认为是相同的值，如：“ate”和“eat”；

要想解决这个问题，我们就要找到尽可能转换结果不会重复的转换方式，为此专门有人研究为我们提供了字符串相关的哈希算法，实现方式很多，具体可以参考以下链接，我们这里只以其中一种常用的为例：

https://www.cnblogs.com/-clq/archive/2012/05/31/2528153.htmlhttps://www.cnblogs.com/-clq/archive/2012/05/31/2528153.html我们选择的方式是每加一个字符，就给结果*31，实现如下：
struct HashStr
	{
		// BKDR，该算法的名称，是发明的两名作者名字首字母的缩写组合
		size_t operator()(const string& s)
		{
			size_t hash = 0;
			for (auto ch : s)
			{
				hash += ch;
				hash *= 31;
			}
 
			return hash;
		}
	};
测试结果如下：

由于字符串类型经常作为key值被使用，每次用的时候都要传一次未免过于麻烦，而且之前测试使用unordered_map时发现也是不用自己传仿函数的，这是因为库里针对key值为字符串类型的情况对仿函数模板进行了特化，我们也来特化一下：


	template<class K>
	struct Hashfunc
	{
		size_t operator()(const K& key)
		{
			return key;
		}
	};
	// 特化
	template<>
	struct Hashfunc<string>
	{
		// BKDR
		size_t operator()(const string& s)
		{
			size_t hash = 0;
			for (auto ch : s)
			{
				hash += ch;
				hash *= 31;
			}
 
			return hash;
		}
	};

这样我们使用的时候，以字符串类型做key值类型的时候就不用自己传了，编译器会自动识别匹配。

2.除留余数法，最好模一个素数，如何每次快速取一个类似两倍关系的素数？

具体原因其实没有找到特别权威准确的描述，但有人提出来，可能是做了测试，认为素数造成的碰撞会少一些，我们就参考SGI库里的理解一下即可，代码如下：


size_t GetNextPrime(size_t prime)
{
	// SGI
	static const int __stl_num_primes = 28;
	static const unsigned long __stl_prime_list[__stl_num_primes] =
	{
		53, 97, 193, 389, 769,
	    1543, 3079, 6151, 12289, 24593,
		49157, 98317, 196613, 393241, 786433,
		1572869, 3145739, 6291469, 12582917, 25165843,
		50331653, 100663319, 201326611, 402653189, 805306457,
		1610612741, 3221225473, 4294967291
	};
 
	size_t i = 0;
	for (; i < __stl_num_primes; ++i)
	{
		if (__stl_prime_list[i] > prime)
			return __stl_prime_list[i];
	}
 
	return __stl_prime_list[i];
}
 
//使用方法： size_t newsize = GetNextPrime(_tables.size());

总的来说就是初始容量为53开始，每次扩容都是上一次容量2倍附近的一个值，这些数都是无符号长整型类型的数，直至最大无符号长整型为止【事实上根本不可能到这么大，从内存角度理解】。

3.性能优化

理论上，最坏情况下，哈希表的查找效率是O（N），这种情况就是所有的数据都挂在一个桶上，但实际上这种最坏的情况几乎不会发生，因为我们有上述负载因子的控制，哈希表每个桶的长度不会过长，以一组随机数为例来测试：


size_t MaxBucketSize()
{
	size_t max = 0;
	for (size_t i = 0; i < _table.size(); ++i)
	{
		auto cur = _table[i];
		size_t size = 0;
		while (cur)
		{
			++size;
			cur = cur->_next;
		}
 
		//printf("[%d]->%d\n", i, size);
		if (size > max)
		{
			max = size;
		}
	}
	return max;
}

测试结果如下：

插入9万个数最多数据的桶才2，足可见，效率其实可以几乎达到O(1)

即使几乎不可能达到最坏的情况，但不排除会有人故意难为你，问你如何解决单个桶元素过多的情况，这里提供一种思想，了解一下就好，不需要去实现，我们可以给设置一个标准值，当一个桶链接的节点个数超过标准值是，就不在一个个的按照原来的方式往下链接，而是转换为红黑树插入，这样就能够解决每个桶链接数据量过大导致效率低下的问题。

模拟实现

基本框架的搭建：

首先要按照封装map和set的方式改造哈希表结构，先搭架子。


//HashTable.h
 
#pragma once
#include<vector>
#include<string>
template<class T>
struct HashNode
{
	HashNode<T>* _next;
	T _data;
 
	HashNode(const T& data)
		:_next(nullptr)
		, _data(data)
	{}
};
 
template<class K>
struct Hashfunc
{
	size_t operator()(const K& key)
	{
		return key;
	}
};
// 特化
template<>
struct Hashfunc<string>
{
	// BKDR
	size_t operator()(const string& s)
	{
		size_t hash = 0;
		for (auto ch : s)
		{
			hash += ch;
			hash *= 31;
		}
 
		return hash;
	}
};
namespace HashBucket
{
	
	template<class K, class T,class KeyOfT, class Hash = Hashfunc<K>>
	class HashTable
	{
		typedef HashNode<T> Node;
	public:
		~HashTable()
		{
			for (auto cur : _table)
			{
				while (cur)
				{
					Node* next = cur->_next;
					delete cur;
					cur = next;
				}
				cur = nullptr;
			}
		}
		Node* Find(const K& key)
		{
			if (_table.size() == 0)
				return nullptr;
			KeyOfT kot;
			Hash hash;
			size_t hashi = hash(key) % _table.size();
			Node* cur = _table[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					return cur;
				}
 
				cur = cur->_next;
			}
 
			return nullptr;
		}
		size_t GetNextPrime(size_t prime)
		{
			// SGI
			static const int __stl_num_primes = 28;
			static const unsigned long __stl_prime_list[__stl_num_primes] =
			{
				53, 97, 193, 389, 769,
				1543, 3079, 6151, 12289, 24593,
				49157, 98317, 196613, 393241, 786433,
				1572869, 3145739, 6291469, 12582917, 25165843,
				50331653, 100663319, 201326611, 402653189, 805306457,
				1610612741, 3221225473, 4294967291
			};
 
			size_t i = 0;
			for (; i < __stl_num_primes; ++i)
			{
				if (__stl_prime_list[i] > prime)
					return __stl_prime_list[i];
			}
 
			return __stl_prime_list[i];
		}
 
		bool Insert(const T& data)
		{
			KeyOfT kot;
		//	if (Find(kot(data)){return false;}
 
			if (_table.size() == _n)
			{
				//size_t newsize = _table.size() == 0 ? 10 : _table.size() * 2;
				size_t newsize = GetNextPrime(_table.size());
				vector<Node*> newtable(newsize, nullptr);
				//for (Node*& cur : _table)
				for (auto& cur : _table)
				{
					while (cur)
					{
						Node* next = cur->_next;
						Hash hash;
						KeyOfT kot;
						size_t hashi = hash(kot(cur->_data)) % newtable.size();
 
						// 头插到新表
						cur->_next = newtable[hashi];
						newtable[hashi] = cur;
 
						cur = next;
					}
				}
				_table.swap(newtable);
			}
			Hash hash;
			size_t hashi = hash(kot(data)) % _table.size();
			Node* newnode = new Node(data);
			// 头插到新表
			newnode->_next = _table[hashi];
			_table[hashi] = newnode;
			_n++;
			return true;
 
		}
		
		bool erase(const K& key)
		{
			auto ret = Find(key);
			if (ret)//ret不为空，找到了，开始删除
			{
				Hash hash;
				KeyOfT kot;
				size_t hashi = hash(key) % _table.size();
				Node* cur = _table[hashi];
				Node* prev = nullptr;
				while (cur)//从头结点开始往下找，知道碰到nullptr这个桶就走完了
				{
					if (kot(cur->_data) == key)
					{
						if (prev == nullptr)//说明该节点为头结点，头结点直接更新为cur的下一个即可
						{
							_table[hashi] = cur->_next;
						}
						else//不是头结点，让prev即cur的前一个节点的_next指向cur的_next
						{
							prev->_next = cur->_next;
						}
						delete cur;
						_n--;
						return true;
					}
					else
					{
						prev = cur;
						cur = cur->_next;
					}
				}
			}
			return false;
		}
		size_t MaxBucketSize()
		{
			size_t max = 0;
			for (size_t i = 0; i < _table.size(); ++i)
			{
				auto cur = _table[i];
				size_t size = 0;
				while (cur)
				{
					++size;
					cur = cur->_next;
				}
 
				//printf("[%d]->%d\n", i, size);
				if (size > max)
				{
					max = size;
				}
			}
 
			return max;
		}
	private:
		vector<Node*> _table;
		size_t _n = 0;
	};
 
 
}


//unordered_map
 
 
#pragma once
#include "HashTable.h"
namespace wz
{
	template<class K,class V,class Hash= Hashfunc<K>>
	class unordered_map
	{
	public:
		struct MapKeyofT
		{
			const K& operator()(const pair<K, V>& kv)
			{
				return kv.first;
			}
		};
		bool Insert(const pair<const K,V>& kv)
		{
			return _ht.Insert(kv);
		}
		bool Find(const K& key)
		{
			return _ht.Find(key);
		}
		bool Erase(const K& key)
		{
			return _ht.erase(key);
		}
	private:
		HashBucket::HashTable<K, pair<const K, V>, MapKeyofT, Hash> _ht;
	};
	void test_unordered_map1()
	{
		unordered_map<int, int> m;
		m.Insert(make_pair(1, 1));
		m.Insert(make_pair(2, 2));
		m.Insert(make_pair(3, 3));
 
	}
}
 
 
//unordered_set.h
 
 
#pragma once
#include "HashTable.h"
namespace wz
{
	template<class K, class Hash = Hashfunc<K>>
	class unordered_set
	{
	public:
		struct SetKeyofT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
		bool Insert(const K& key)
		{
			return _ht.Insert(key);
		}
		bool Find(const K& key)
		{
			return _ht.Find(key);
		}
		bool Erase(const K& key)
		{
			return _ht.erase(key);
		}
	private:
		HashBucket::HashTable<K, K, SetKeyofT, Hash> _ht;
	};
	void test_unordered_set1()
	{
		unordered_set<int> s;
		s.Insert(1);
		s.Insert(2);
		s.Insert(3);
 
 
	}
}

以上完成了哈希表基本结构的改造和unordered_map和unordered_set基本框架的编写及测试，测试结果都显示是没有什么问题的，实现方式和用红黑树封装map和set大同小异，接下来的重点依然是如何实现哈希表的迭代器进而如何封装出unordered_map和unordered_set的迭代器。

迭代器的实现：

哈希表的迭代器实现如下：


//因为要在里面定义哈希表，因此最好做前置声明
	template<class K, class T, class KeyOfT, class Hash>
	class HashTable;
 
	//下面正式开始实现迭代器
	template<class K,class T,class Ref,class Ptr,class KeyOfT,class Hash>
	struct __HashIterator
	{
		typedef HashNode<T> Node;
		typedef HashTable<K, T, KeyOfT, Hash> HT;
		typedef __HashIterator<K, T, Ref, Ptr, KeyOfT, Hash> Self;
		typedef __HashIterator<K, T, T&, T*, KeyOfT, Hash> Iterator;
 
		//模拟遍历哈希表的行为，思考作为哈希表的迭代器需要的成员变量
		Node* _node;//当前结点指针
		const HT* _ht;//当前所在哈希表
		__HashIterator(Node* node, const HT* ht)
			:_node(node)
			, _ht(ht)
		{}
 
		__HashIterator(const Iterator& it)
			:_node(it._node)
			, _ht(it._ht)
		{}
		Ref operator*()
		{
			return _node->_data;
		}
 
		Ptr operator->()
		{
			return &_node->_data;
		}
 
		bool operator!=(const Self& s)
		{
			return _node != s._node;
		}
		Self& operator++()
		{
			if (_node->_next != nullptr)
			{
				_node = _node->_next;
			}
			else
			{
				// 找下一个不为空的桶
				KeyOfT kot;
				Hash hash;
				// 算出我当前的桶位置
				size_t hashi = hash(kot(_node->_data)) % _ht->_table.size();
				++hashi;
				while (hashi < _ht->_table.size())
				{
					if (_ht->_table[hashi])
					{
						_node = _ht->_table[hashi];
						break;
					}
					else
					{
						++hashi;
					}
				}
 
				// 没有找到不为空的桶
				if (hashi == _ht->_table.size())
				{
					_node = nullptr;
				}
			}
 
			return *this;
		}
	};

在HashTable中定义begin()、end()、const_begin()、const_end(),如下：


//友元声明，因为需要访问到迭代器的成员
template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>
friend struct __HashIterator;
 
typedef __HashIterator<K, T, T&, T*, KeyOfT, Hash> iterator;
typedef __HashIterator<K, T, const T&, const T*, KeyOfT, Hash> const_iterator;
		iterator begin()
		{
			Node* cur = nullptr;
			for (size_t i = 0; i < _table.size(); ++i)
			{
				cur = _table[i];
				if (cur)
				{
					break;
				}
			}
 
			return iterator(cur, this);
		}
 
		iterator end()
		{
			return iterator(nullptr, this);
		}
 
		const_iterator begin() const
		{
			Node* cur = nullptr;
			for (size_t i = 0; i < _table.size(); ++i)
			{
				cur = _table[i];
				if (cur)
				{
					break;
				}
			}
 
			return const_iterator(cur, this);
		}
 
		const_iterator end() const
		{
			return const_iterator(nullptr, this);
		}

unordered_map和unordered_set的迭代器封装哈希表中的迭代器如下：


//unordered_map.h
 
 
typedef typename HashBucket::HashTable<K, pair<const K, V>, MapKeyofT, Hash>::iterator iterator;
 
typedef typename HashBucket::HashTable<K, pair<const K, V>, MapKeyofT, Hash>::const_iterator const_iterator;
 
		iterator begin()
		{
			return _ht.begin();
		}
 
		iterator end()
		{
			return _ht.end();
		}
 
		const_iterator begin() const
		{
			return _ht.begin();
		}
 
		const_iterator end() const
		{
			return _ht.end();
}
 
 
//unordered_set.h
 
 
 
typedef typename HashBucket::HashTable<K, K, SetKeyofT, Hash>::const_iterator iterator;
typedef typename HashBucket::HashTable<K, K, SetKeyofT, Hash>::const_iterator const_iterator;
 
 
		iterator begin()
		{
			return _ht.begin();
		}
 
		iterator end()
		{
			return _ht.end();
		}
 
		const_iterator begin() const
		{
			return _ht.begin();
		}
 
		const_iterator end() const
		{
			return _ht.end();
		}

测试基本功能如下，可见目前我们的实现没有问题：

进一步完善：

到这一步，我们对unordered_set的封装就已经差不多了，unordered_map还差一个重要的部分就是实现[ ],类比map和set的[ ]分析与实现，我们同样可以快速完成，与此同时要做一件事情就是将所有Insert()的返回值类型都修改为pair<iterator,bool>类型


//unordered_map.h
 
 
V& operator[](const K& key)
{
	pair<iterator, bool> ret = Insert(make_pair(key, V()));
	return ret.first->second;
}
 
pair<iterator, bool> Insert(const pair<const K,V>& kv)
{
	return _ht.Insert(kv);
}
 
 
//unordered_set.h
 
 
pair<iterator, bool> Insert(const K& key)
{
	return _ht.Insert(key);
}
 
 
//HashTable.h
 
 
pair<iterator, bool> Insert(const T& data)
{
	KeyOfT kot;
	iterator it = Find(kot(data));
	if (it != end())
	{
		return make_pair(it, false);
	}
 
 
	if (_table.size() == _n)
	{
		//size_t newsize = _table.size() == 0 ? 10 : _table.size() * 2;
		size_t newsize = GetNextPrime(_table.size());
		vector<Node*> newtable(newsize, nullptr);
		//for (Node*& cur : _table)
		for (auto& cur : _table)
		{
			while (cur)
			{
				Node* next = cur->_next;
				Hash hash;
				KeyOfT kot;
				size_t hashi = hash(kot(cur->_data)) % newtable.size();
 
				// 头插到新表
				cur->_next = newtable[hashi];
				newtable[hashi] = cur;
					cur = next;
			}
		}
		_table.swap(newtable);
	}
	Hash hash;
	size_t hashi = hash(kot(data)) % _table.size();
	Node* newnode = new Node(data);
	// 头插到新表
	newnode->_next = _table[hashi];
	_table[hashi] = newnode;
	_n++;
	return make_pair(iterator(newnode, this), true);
 
}

修改Find()返回值类型为iterator,记得所有需要用到Find返回值的都需要修改：


//unordered_map.h
 
 
iterator Find(const K& key)
{
	return _ht.Find(key);
}
 
 
//unordered_map.h
 
 
iterator Find(const K& key)
{
	return _ht.Find(key);
}
 
 
//HashTable.h
 
 
iterator Find(const K& key)
{	
    if (_table.size() == 0)
		return end();
	KeyOfT kot;
	Hash hash;
	size_t hashi = hash(key) % _table.size();
	Node* cur = _table[hashi];
	while (cur)
	{
		if (kot(cur->_data) == key)
		{
			return iterator(cur, this);;
		}
			cur = cur->_next;
	}
	return end();
}

综上，我们的代码基本完善，我来来用自定义类型作为key的方式来测试一下，这里就用我们之前实现过的日期类的基本代码即可，日期类如下：


	class Date
	{
		friend struct HashDate;
	public:
		Date(int year = 1900, int month = 1, int day = 1)
			: _year(year)
			, _month(month)
			, _day(day)
		{}
 
		bool operator<(const Date& d)const
		{
			return (_year < d._year) ||
				(_year == d._year && _month < d._month) ||
				(_year == d._year && _month == d._month && _day < d._day);
		}
 
		bool operator>(const Date& d)const
		{
			return (_year > d._year) ||
				(_year == d._year && _month > d._month) ||
				(_year == d._year && _month == d._month && _day > d._day);
		}
 
		bool operator==(const Date& d) const
		{
			return _year == d._year
				&& _month == d._month
				&& _day == d._day;
		}
 
		friend ostream& operator<<(ostream& _cout, const Date& d);
	private:
		int _year;
		int _month;
		int _day;
	};
 
	ostream& operator<<(ostream& _cout, const Date& d)
	{
		_cout << d._year << "-" << d._month << "-" << d._day;
		return _cout;
	}

作为key值的话必须能支持转换成整型，才能参与哈希底层的取模运算，日期类不支持自行转换，因此我们自己实现HashDate，实现如下，需要注意的是，在这过程中，由于需要访问日期类内部私有成员变量，因此可以将其在日期类中声明为友元类，HashDate实现代码如下：


struct HashDate
	{
		size_t operator()(const Date& d)
		{
			//这里要访问私有成员，因此将其在日期类里声明为友元类
			size_t hash = 0;
			hash += d._year;
			hash *= 31;
 
			hash += d._month;
			hash *= 31;
 
			hash += d._day;
			hash *= 31;
 
			return hash;
		}
	};

做一个简单的测试，测试代码和结果如下，说明我们封装的unordered_map基本上没什么问题：

综上，我关于以哈希为底层的unordered_set和unordered_map总结的就差不多了，内容很多，细节也不少，多看几遍，多敲代码，以便加深理解！

本课涉及到的所有代码都见以下链接，欢迎参考指正！

practice: 课程代码练习 - Gitee.comhttps://gitee.com/ace-zhe/practice/tree/master/Hash

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/繁依Fanyi0/article/detail/563605