文章目录

unordered 系列容器
哈希表的概念
哈希函数
- 哈希函数设计原则
- 常用哈希函数
哈希冲突
实现：除留余数法，闭散列
- 框架
- 查找
- 插入与扩容
- 删除
- 仿函数
- 完整代码
实现：除留余数法、开散列
封装成 unordered_map、unordered_set
unordered_map其他底层接口

unordered 系列容器

C++ 中，哈希表实现的容器为 unordered_map 和 unordered_set

和 set、map 的区别：

无序
只有单向迭代器
效率高，查找的时间复杂度为 $O (1)$

之所以 set 和 map 的查找速度没有这么快，是因为它们的底层是红黑树，查找元素时需要从根结点开始比较，每比较一次可以排除一半的元素，时间复杂度是 $O(\log n)$ ，当然这已经比遍历查找 $O (n)$ 快很多了。那么，有没有一种方法，可以一下子找到要查找的元素呢？

当然是有的，最简单的就是数组的下标查找，比如开辟一个大小 26 的数组，让 26 个英文字母对应 0~25 的下标，这样我们就可以对一段英文序列统计每个字母的出现次数。最后通过下标查找，我们就可以快速获得一个字母的出现次数。

像这样，字母(key)，存储位置(下标)，数组里存的数值(vlaue)，这三者之间就存在了一种一一对应（映射）关系。

最主要的是字母和存储位置之间建立的映射关系，在这里很合适，因为字母有 ASCII 码，并且是连续的

而在下列情景下可能就不行了：

数据范围分布很广，不集中，那么你的数组就要开得很大，浪费的空间也很多。
key 的类型不是整数，而是字符串，自定义类型等等。

哈希表的概念

为了解决这些问题，我们需要某种函数（hashFunc）使 key 和存储位置建立映射关系，那么在查找时通过该函数可以很快找到该元素。

当向结构中：

插入元素

根据待插入元素的 key ，以哈希函数计算该元素的存储位置并存放。
搜索元素

对元素的 key 进行同样的计算，把求得的函数值当做元素的存储位置，在结构中按此位置取元素比较，若 key 相等，则搜索成功。

该方式即为哈希(散列)方法，哈希方法中使用的转换函数称为哈希(散列)函数，构造出来的结构称为哈希表(Hash Table)(或者称散列表)

哈希函数

哈希的关键就是 key 与地址的映射

哈希函数设计原则

哈希函数的定义域必须包括需要存储的全部关键码，而如果散列表允许有m个地址时，其值域必须在0到m-1之间
哈希函数计算出来的地址能均匀分布在整个空间中
哈希函数应该比较简单

常用哈希函数

直接定址法

取关键字的某个线性函数为散列地址：$\rm Hash(Key)= A\times key + B $
除留余数法

设散列表中允许的地址数为 m，取一个不大于 m，但最接近或者等于 m 的质数 p 作为除数，按照哈希函数： $\rm Hash(key) = key\ \%\ p(p<=m)$ ，将关键码转换成哈希地址。使用质数可以减少哈希冲突，库里面打了一个质数表来实现。

哈希冲突

概念

不同 key 通过相同哈希函数计算出相同的哈希地址，该种现象称为哈希冲突或哈希碰撞

具有不同 key 而具有相同哈希地址的数据元素称为 同义词

例：

设置哈希函数为：hash(key) = key % capacity，capacity 即为结构的容量。

假设哈希表容量为 10 ，我们将 5 8 99999 20 100 这些数存进去：
$\begin{array}{} \begin{array}{} \ 0\ &1&2&3&4&5&6&7&8&\ \ \ \ 9\ \ \ \ \end{array} \\ \begin{array}{|c|c|c|c|c|c|c|c|c|c|} \hline \color{red}20&\ \ &\ \ &\ \ &\ \ &\color{red}5&\ \ &\ \ &\color{red}8&\color{red}99999 \\ \hline \end{array} \end{array}$
hash(20) = 0，hash(100) = 0，这两个元素要存的位置一样，而我们总不可能一个位置存放两个元素，那这个问题怎么解决呢？

解决方式

哈希冲突总归是要影响效率的，我们只能尽可能减小这个影响。

闭散列

闭散列又叫开放定址法，当发生哈希冲突时，如果哈希表未被装满，那么就把 key 存放到冲突位置的下一个空位置中去。

那么如何寻找下一个空位置呢？

线性探测

还是上面那个数组，如果还要插入10 30 50，虽然它们的哈希地址都是 0，但是我们可以直接在后面找空位置存放

即新元素存在 $\rm{(hash(key)+i)\;\%\;N,(i=1,2,3,\cdots,N为结构容量})$ 的空位置
$\begin{array}{} \begin{array}{} \ 0\ &\ 1\ &\ 2\ &\ 3\ &4&5&6&7&8&\ \ \ \ 9\ \ \ \ \end{array} \\ \begin{array}{|c|c|c|c|c|c|c|c|c|c|} \hline \color{red}20&\color{blue}10&\color{blue}30&\color{blue}50&\ \ &\color{red}5&\ \ &\ \ &\color{red}8&\color{red}99999 \\ \hline \end{array} \end{array}$
缺点：拥堵

二次探测

发生哈希冲突时，新元素存在 $\rm{(hash(key)+i^2)\;\%\;N,(i=1,2,3,\cdots,N为结构容量})$ 的空位置

10 就存在 $0+1^2=1$ 的位置，30 存在 $0+2^2=4$ 的位置，由于 $0+3^2=9$ 已经被占用了，50 只能存在 $0+4^2)\;\%\;10=6$ 的位置。
$\begin{array}{} \begin{array}{} \ 0\ &\ 1\ &2&3&\ 4\ &5&\ 6\ &7&8&\ \ \ \ 9\ \ \ \ \end{array} \\ \begin{array}{|c|c|c|c|c|c|c|c|c|c|} \hline \color{red}20&\color{blue}10&\ \ &\ \ &\color{blue}30&\color{red}5&\color{blue}50&\ \ &\color{red}8&\color{red}99999 \\ \hline \end{array} \end{array}$
这是对线性探测的优化，相对来说没那么拥堵，但是没有改变占别人位置的本质。

开散列

开散列又叫链地址法/开链法/哈希桶

哈希表中存指针，指针指向一个链表，把哈希冲突的数据存在同一串链表中。
$\begin{array}{} \begin{array}{} 0\\1\\2\\3\\4\\5\\6\\7\\8\\9 \end{array} & \begin{array}{|c|} \hline \rm{NULL}\\ \rm{-->}\\ \rm{NULL}\\ \rm{NULL}\\ \rm{-->}\\ \rm{-->}\\ \rm{-->}\\ \rm{-->}\\ \rm{NULL}\\ \rm{-->}\\ \hline \end{array} & \begin{array}{l} \\ \rm{1}-->NULL\\ \\ \\ \rm{44}-->4-->NULL\\ \rm{5}-->NULL\\ \rm{6}-->NULL\\ \rm{7}-->NULL\\ \\ \rm{9}-->NULL\\ \end{array} \end{array}$

实现：除留余数法，闭散列

框架

用顺序表实现一个简易的哈希表，这里我们直接用 vector 容器

enum State
{
	EMPTY,
	EXITS,
	DELETE
};

template<class K, class V>
struct HashData
{
	pair<K, V> _kv;
	State _state = EMPTY;
};

template<class K, class V>
class HashTable
{
    typedef HashData<K, V> Data;
public:
    
private:
	vector<Data> _tables;
    size_t _n = 0; // 有效数据的个数
};

$\begin{array}{} \begin{array}{} \ 0\ &\ 1\ &\ 2\ &\ 3\ &4&5&6&7&8&\ \ \ \ 9\ \ \ \ \end{array} \\ \begin{array}{|c|c|c|c|c|c|c|c|c|c|} \hline \color{red}20&\color{blue}10&\color{blue}30&\color{blue}50&\ \ &\color{red}5&\ \ &\ \ &\color{red}8&\color{red}99999 \\ \hline \end{array} \end{array}$

如何查找一个元素？

一个办法：先算出元素的哈希地址，然后依次往后找，直到找到或者遇到空位置为止
- 如要查找 50，先算出它的哈希地址为0，然后往后找，直到找到下标 3 位置的 50
- 如要查找 60，先算出它的哈希地址为0，然后往后找，直到空位置都没有找到，所以不存在60

这个方法可行吗？

如果先删除10，再找50，那么找到下标1位置就结束了，最后没找到，显然不行。

所以我们给了一个 HashData 类模板，用来表示哈希表里的数据类型，其中包含了一个 pair 和一个 State， State 是枚举类型，用来记录位置的状态。删除元素时把这个位置设为 DELETE 状态，查找时遇到 DELETE 状态就继续找，这个问题就解决了。

查找

Data* Find(const K& key)
{
    if (_tables.size() == 0)
    {
        return nullptr;
    }

    size_t starti = key;
    starti %= _tables.size();
    size_t hashi = starti;
    // 线性探测/二次探测
    size_t i = 1;
    while (_tables[hashi]._state != EMPTY)
    {
        if (_tables[hashi]._state != DELETE && _tables[hashi]._kv.first == key)
        {
            return &_tables[hashi];
        }
        hashi = (starti + i) % _tables.size();
        ++i;
    }
    return nullptr;
}

细节：

表为空，应该直接 return nullptr 否则下面会出现除数为0的错误
如果只比较 key ，而不判断状态是否是 DELETE，那么用户很可能会把已经删除的元素查找出来。

插入与扩容

插入之前想一想，哈希表什么情况下进行扩容，如何扩容？

载荷因子： $\alpha=填入表中的元素个数/哈希表长度$
$\alpha$ 表示哈希表的装满程度， $\alpha$ 越大，产生冲突的可能性越大。
对于闭散列，载荷因子是特别重要的因素，应严格限制在 0.7~0.8 以下，超过 0.8，查表时的 CPU 缓存不命中按照指数曲线上升。

我们设定负载因子到 0.7 以上就扩容

bool Insert(const pair<K, V>& kv)
{
    if (Find(kv.first))
    {
        return false;
    }
    // 载荷因子到0.7以上，扩容
    if (_tables.size() == 0 || _n * 10 / _tables.size() >= 7)
    {
        size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
        // 映射关系变了，需要重新映射
        // 创建新表，开好新空间，将旧表的元素插入，最后swap
        HashTable<K, V> newHT;
        newHT._tables.resize(newSize);

        for (auto& e : _tables)
        {
            if (e._state == EXITS)
            {
                newHT.Insert(e._kv);
            }
        }
        newHT._tables.swap(_tables);
    }
    size_t starti = kv.first;
    starti %= _tables.size();
    size_t hashi = starti;
    // 线性探测/二次探测
    size_t i = 1;
    while (_tables[hashi]._state == EXITS)
    {
        hashi = (starti + i) % _tables.size();
        ++i;
    }
    _tables[hashi]._kv = kv;
    _tables[hashi]._state = EXITS;
    ++_n;

    return true;
}

扩容后，映射关系变了，需要重新映射，这里采用了类似深拷贝现代写法的写法，创建新的哈希表，重新插入一遍，然后旧表与新表交换。

删除

找到要删除的元素，状态该成 DELETE

bool Erase(const K& key)
{
    Data* ret = Find(key);
    if (ret)
    {
        ret->_state = DELETE;
        --_n;
        return true;
    }
    else
    {
        return false;
    }
}

仿函数

现在解决了数据范围过大，不集中的问题。还有一个问题，非整数类型，如何映射哈希地址？

为此，我们需要一个仿函数 HashFunc，添加一个模板参数 HashFunc，在需要将 key 转成整数的地方使用仿函数。

以下是仿函数的实现：

对于能直接转换为整型的类型，调用 DefaultHash 仿函数，
对于 string 类型进行模板特化

要求 string 类型的哈希值，如果只是简单的把每个字符的 ASCII 码相加，那么造成的哈希冲突较多，比如相同字符组合，不同排列的字符串算出的哈希值是相同的。

为了解决这个问题，这里使用一个字符串哈希算法——BKDR Hash：

在求和的基础上，每次加一个字符前先乘 131，也可以乘1313,13131,131313…

template<class K>
struct DefaultHash
{
	size_t operator()(const K& key)
	{
		return key;
	}
};

template<>
struct DefaultHash<string>
{
	size_t operator()(const string& key)
	{
		size_t hash = 0;
		for (auto ch : key)
		{
			hash = hash * 131 + ch;
		}
		return hash;
	}
};

如果是自定义类型，那就需要我们自己写额外的仿函数哈希算法了，比如日期类可以把年月日加起来作为哈希值。

完整代码

#pragma once
#include <iostream>
#include <vector>
using namespace std;

enum State
{
	EMPTY,
	EXITS,
	DELETE
};

template<class K, class V>
struct HashData
{
	pair<K, V> _kv;
	State _state = EMPTY;
};

template<class K>
struct DefaultHash
{
	size_t operator()(const K& key)
	{
		return key;
	}
};

template<>
struct DefaultHash<string>
{
	size_t operator()(const string& key)
	{
		size_t hash = 0;
		for (auto ch : key)
		{
			hash = hash * 131 + ch;
		}
		return hash;
	}
};

template<class K, class V, class HashFunc = DefaultHash<K>>
class HashTable
{
	typedef HashData<K, V> Data;
public:
	bool Insert(const pair<K, V>& kv)
	{
		if (Find(kv.first))
		{
			return false;
		}
		// 载荷因子到0.7以上，扩容
		if (_tables.size() == 0 || _n * 10 / _tables.size() >= 7)
		{
			size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
			// 映射关系变了，需要重新映射
			// 创建新表，开好新空间，将旧表的元素插入，最后swap
			HashTable<K, V, HashFunc> newHT;
			newHT._tables.resize(newSize);

			for (auto& e : _tables)
			{
				if (e._state == EXITS)
				{
					newHT.Insert(e._kv);
				}
			}
			newHT._tables.swap(_tables);
		}

		HashFunc hf;
		size_t starti = hf(kv.first);
		starti %= _tables.size();
		size_t hashi = starti;
		// 线性探测/二次探测
		size_t i = 1;
		while (_tables[hashi]._state == EXITS)
		{
			hashi = (starti + i) % _tables.size();
			++i;
		}
		_tables[hashi]._kv = kv;
		_tables[hashi]._state = EXITS;
		++_n;

		return true;
	}
	
	Data* Find(const K& key)
	{
		if (_tables.size() == 0)
		{
			return nullptr;
		}

		HashFunc hf;
		size_t starti = hf(key);
		starti %= _tables.size();
		size_t hashi = starti;
		// 线性探测/二次探测
		size_t i = 1;
		while (_tables[hashi]._state != EMPTY)
		{
			if (_tables[hashi]._state != DELETE && _tables[hashi]._kv.first == key)
			{
				return &_tables[hashi];
			}
			hashi = (starti + i) % _tables.size();
			++i;
		}
		return nullptr;
	}

	bool Erase(const K& key)
	{
		Data* ret = Find(key);
		if (ret)
		{
			ret->_state = DELETE;
			--_n;
			return true;
		}
		else
		{
			return false;
		}
	}

private:
	vector<Data> _tables;
	size_t _n = 0; // 有效数据的个数
};

实现：除留余数法、开散列

框架

template<class K, class V>
struct HashNode
{
	pair<K, V> _kv;
	HashNode<K, V>* _next;

	HashNode(const pair<K, V>& kv)
		: _kv(kv)
		, _next(nullptr)
	{}
};

template<class K, class V, class HashFunc = DefaultHash<K>>
class HashTable
{
	typedef HashNode<K, V> Node;
public:

private:
	// 指针数组
	vector<Node*> _tables;
	size_t _n = 0;
};

插入与扩容

开散列对载荷因子的要求没有那么高，我们这里设为载荷因子达到 1 以上时扩容，扩容的方式和闭散列的差不多（将旧表所以元素遍历出来插入开好空间的新表，最后新表旧表互换）。

bool Insert(const pair<K, V>& kv)
{
    if (Find(kv.first))
    {
        return false;
    }
    HashFunc hf;
    // 载荷因子 == 1，扩容，重新映射
    if (_tables.size() == _n)
    {
        size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
        HashTable<K, V> newHT;
        newHT._tables.resize(newSize);
        for (size_t i = 0; i < _tables.size(); ++i)
        {
            Node* cur = _tables[i];
            while (cur)
            {
                newHT.Insert(cur->_kv);
                cur = cur->_next;
            }
        }
        newHT._tables.swap(_tables);
    }

    size_t hashi = hf(kv.first);
    hashi %= _tables.size();
    //头插，然后把头结点指针塞进桶里面
    Node* newNode = new Node(kv);
    newNode->_next = _tables[hashi];
    _tables[hashi] = newNode;

    ++_n;
    return true;
}

其实上面这种扩容方式不是很好，使用 insert 把数据插入到新表，就需要创建新的结点，完了还需要释放旧表，效率不高。那么，我们能不能直接把结点放到新表呢？

思路：创建新的 vector，而不是 HashTable，将结点头插进去。

    // 载荷因子 == 1，扩容，重新映射
    if (_tables.size() == _n)
    {
        size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
        vector<Node*> newTable;
        newTable.resize(newSize);
        for (size_t i = 0; i < _tables.size(); ++i)
        {
            Node* cur = _tables[i];
            while (cur)
            {
                Node* next = cur->_next;

                size_t hashi = hf(cur->_kv.first) % newSize;
                cur->_next = newTable[hashi];
                newTable[hashi] = cur;

                cur = next;
            }
            _tables[i] = nullptr;
        }
        newTable.swap(_tables);
    }

析构

链表需要我们手动写析构函数来释放

~HashTable()
{
    for (size_t i = 0; i < _tables.size(); ++i)
    {
        Node* cur = _tables[i];
        while (cur)
        {
            Node* next = cur->_next;
            delete cur;
            cur = next;
        }
        _tables[i] = nullptr;
    }
}

查找

找到对应的桶，然后直接遍历里面的结点即可

Node* Find(const K& key)
{
    if (_tables.size() == 0)
    {
        return nullptr;
    }
    
	HashFunc hf;
    size_t hashi = hf(key);
    hashi %= _tables.size();
    Node* cur = _tables[hashi];
    while (cur)
    {
        if (cur->_kv.first == key)
        {
            return cur;
        }
        cur = cur->_next;
    }
    return nullptr;
}

删除

bool Erase(const K& key)
{
    if (_tables.size() == 0)
    {
        return false;
    }
    
	HashFunc hf;
    size_t hashi = hf(key);
    hashi %= _tables.size();
    Node* prev = nullptr;
    Node* cur = _tables[hashi];
    while (cur)
    {
        if (cur->_kv.first == key)
        {
            if (prev == nullptr)
                _tables[hashi] = cur->_next;
            else
                prev->_next = cur->_next;
            delete cur;
            --_n;
            return true;
        }
        prev = cur;
        cur = cur->_next;
    }
    return false;
}

另一种删除写法，不使用临时指针 prev，采用替换法

替换法：要删除的结点cur，将下一个结点的值覆盖自己的值，然后删除下一个结点。如果cur是最后一个结点，那么就用头结点来覆盖，然后删除头结点。

bool Erase(const K& key)
{
    if (_tables.size() == 0)
    {
        return false;
    }
    
	HashFunc hf;
    size_t hashi = hf(key);
    hashi %= _tables.size();
    Node* cur = _tables[hashi];
    while (cur)
    {
        if (cur->_kv.first == key)
        {
            if (cur->_next == nullptr)
            {
                cur->_kv = _tables[hashi]->_kv;
                Node* first = _tables[hashi];
                _tables[hashi] = first->_next;
                delete first;
            }
            else
            {
                cur->_kv = cur->_next->_kv;
                Node* next = cur->_next;
                cur->_next = next->_next;
                delete next;
            }
            --_n;
            return true;
        }
        cur = cur->_next;
    }
    return false;
}

拷贝构造与赋值重载

默认构造则什么都不用做，但是要显式写出来。

拷贝构造的一种写法是调用 Insert：

HashTable()
{}

HashTable(const HashTable<K, V>& ht)
{
    _tables.resize(ht._tables.size());
    for (size_t i = 0; i < ht._tables.size(); ++i)
    {
        Node* cur = ht._tables[i];
        while (cur)
        {
            Insert(cur->_kv);
            cur = cur->_next;
        }
    }
}

另一种写法则是原样复制照搬：

HashTable(const HashTable<K, V>& ht)
    :_n(ht._n)
{
    _tables.resize(ht._tables.size());
    for (size_t i = 0; i < ht._tables.size(); ++i)
    {
        Node* cur = ht._tables[i];
        Node* tail = nullptr;
        while (cur)
        {
            Node* newNode = new Node(cur->_kv);

            if (tail == nullptr)
                _tables[i] = newNode;
            else
                tail->_next = newNode;
            tail = newNode;

            cur = cur->_next;
        }
    }
}

赋值重载

void swap(HashTable<K, V>& ht)
{
    _tables.swap(ht._tables);
    std::swap(_n, ht._n);
}

HashTable<K, V>& operator=(HashTable<K, V> ht)
{
    swap(ht);
    return *this;
}

完整代码

#pragma once
#include <iostream>
#include <vector>
using namespace std;

template<class K, class V>
struct HashNode
{
	pair<K, V> _kv;
	HashNode<K, V>* _next;

	HashNode(const pair<K, V>& kv)
		: _kv(kv)
		, _next(nullptr)
	{}
};

template<class K>
struct DefaultHash
{
	size_t operator()(const K& key)
	{
		return key;
	}
};

template<>
struct DefaultHash<string>
{
	size_t operator()(const string& key)
	{
		size_t hash = 0;
		for (auto ch : key)
		{
			hash = hash * 131 + ch;
		}
		return hash;
	}
};

template<class K, class V, class HashFunc = DefaultHash<K>>
class HashTable
{
	typedef HashNode<K, V> Node;
public:

	HashTable()
	{}

	//HashTable(const HashTable<K, V>& ht)
	//{
	//	_tables.resize(ht._tables.size());
	//	for (size_t i = 0; i < ht._tables.size(); ++i)
	//	{
	//		Node* cur = ht._tables[i];
	//		while (cur)
	//		{
	//			Insert(cur->_kv);
	//			cur = cur->_next;
	//		}
	//	}
	//}

	HashTable(const HashTable<K, V>& ht)
		:_n(ht._n)
	{
		_tables.resize(ht._tables.size());
		for (size_t i = 0; i < ht._tables.size(); ++i)
		{
			Node* cur = ht._tables[i];
			Node* tail = nullptr;
			while (cur)
			{
				Node* newNode = new Node(cur->_kv);

				if (tail == nullptr)
					_tables[i] = newNode;
				else
					tail->_next = newNode;
				tail = newNode;

				cur = cur->_next;
			}
		}
	}

	HashTable<K, V>& operator=(HashTable<K, V> ht)
	{
		swap(ht);
		return *this;
	}

	~HashTable()
	{
		for (size_t i = 0; i < _tables.size(); ++i)
		{
			Node* cur = _tables[i];
			while (cur)
			{
				Node* next = cur->_next;
				delete cur;
				cur = next;
			}
			_tables[i] = nullptr;
		}
	}

	void swap(HashTable<K, V>& ht)
	{
		_tables.swap(ht._tables);
		std::swap(_n, ht._n);
	}

	bool Insert(const pair<K, V>& kv)
	{
		if (Find(kv.first))
		{
			return false;
		}

		HashFunc hf;
		// 载荷因子 == 1，扩容，重新映射
		// 有缺陷的写法
		//if (_tables.size() == _n)
		//{
		//	size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
		//	HashTable<K, V> newHT;
		//	newHT._tables.resize(newSize);
		//	for (size_t i = 0; i < _tables.size(); ++i)
		//	{
		//		Node* cur = _tables[i];
		//		while (cur)
		//		{
		//			newHT.Insert(cur->_kv);
		//			cur = cur->_next;
		//		}
		//	}
		//	newHT._tables.swap(_tables);
		//}

		// 较高效的写法
		if (_tables.size() == _n)
		{
			size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
			vector<Node*> newTable;
			newTable.resize(newSize);
			for (size_t i = 0; i < _tables.size(); ++i)
			{
				Node* cur = _tables[i];
				while (cur)
				{
					Node* next = cur->_next;

					size_t hashi = hf(cur->_kv.first) % newSize;
					cur->_next = newTable[hashi];
					newTable[hashi] = cur;

					cur = next;
				}
				_tables[i] = nullptr;
			}
			newTable.swap(_tables);
		}

		size_t hashi = hf(kv.first);
		hashi %= _tables.size();
		//头插，然后把头结点指针塞进桶里面
		Node* newNode = new Node(kv);
		newNode->_next = _tables[hashi];
		_tables[hashi] = newNode;
		
		++_n;
		return true;
	}

	Node* Find(const K& key)
	{
		if (_tables.size() == 0)
		{
			return nullptr;
		}

		HashFunc hf;
		size_t hashi = hf(key);
		hashi %= _tables.size();
		Node* cur = _tables[hashi];
		while (cur)
		{
			if (cur->_kv.first == key)
			{
				return cur;
			}
			cur = cur->_next;
		}
		return nullptr;
	}

	bool Erase(const K& key)
	{
		if (_tables.size() == 0)
		{
			return false;
		}

		HashFunc hf;
		size_t hashi = hf(key);
		hashi %= _tables.size();
		Node* prev = nullptr;
		Node* cur = _tables[hashi];
		while (cur)
		{
			if (cur->_kv.first == key)
			{
				if (prev == nullptr)
					_tables[hashi] = cur->_next;
				else
					prev->_next = cur->_next;
				delete cur;
				--_n;
				return true;
			}
			prev = cur;
			cur = cur->_next;
		}
		return false;
	}

	//bool Erase(const K& key)
	//{
	//	if (_tables.size() == 0)
	//	{
	//		return false;
	//	}

	//	HashFunc hf;
	//	size_t hashi = hf(key);
	//	hashi %= _tables.size();
	//	Node* cur = _tables[hashi];
	//	while (cur)
	//	{
	//		if (cur->_kv.first == key)
	//		{
	//			if (cur->_next == nullptr)
	//			{
	//				cur->_kv = _tables[hashi]->_kv;
	//				Node* first = _tables[hashi];
	//				_tables[hashi] = first->_next;
	//				delete first;
	//			}
	//			else
	//			{
	//				cur->_kv = cur->_next->_kv;
	//				Node* next = cur->_next;
	//				cur->_next = next->_next;
	//				delete next;
	//			}
	//			--_n;
	//			return true;
	//		}
	//		cur = cur->_next;
	//	}
	//	return false;
	//}

private:
	// 指针数组
	vector<Node*> _tables;
	size_t _n = 0;
};

封装成 unordered_map、unordered_set

改造哈希表

参考 set 和 map 的封装，这里的类型不能写死，要改成泛型 T，使它既可以是单值类型也可以是键值对类型。

template<class T>
struct HashNode
{
	T _data;
	HashNode<T>* _next;

	HashNode(const T& data)
		: _data(data)
		, _next(nullptr)
	{}
};

下面的 HashTable 也要做出相应的修改：

哈希函数 HashFunc 改为从上层传入，为了方便用户传入自己实现的哈希函数，这里原本的缺省模板参数可以去掉

template<class K, class T, class HashFunc>
class HashTable
{
	typedef HashNode<T> Node;
public:
    //略。。。

set 和 map 取 key 的方式不同，需要再传一个模板参数 KeyOfT，表示仿函数类型。

继续对 HashTable 进行改造，主要是 Insert 函数，因为它的参数是 T 类型的data。

template<class K, class T, class KeyOfT, class HashFunc>
class HashTable
{
    //...
    bool Insert(const T& data)
	{
		HashFunc hf;
		KeyOfT kot;	//取 key 的函数对象

		if (Find(kot(data)))	//改动
		{
			return false;
		}

		// 载荷因子 == 1，扩容，重新映射
		if (_tables.size() == _n)
		{
			size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
			vector<Node*> newTable;
			newTable.resize(newSize);
			for (size_t i = 0; i < _tables.size(); ++i)
			{
				Node* cur = _tables[i];
				while (cur)
				{
					Node* next = cur->_next;

					size_t hashi = hf(kot(cur->_data)) % newSize;	//改动
					cur->_next = newTable[hashi];
					newTable[hashi] = cur;

					cur = next;
				}
				_tables[i] = nullptr;
			}
			newTable.swap(_tables);
		}

		size_t hashi = hf(kot(data));	//改动
		hashi %= _tables.size();
		//头插，然后把头结点指针塞进桶里面
		Node* newNode = new Node(kv);
		newNode->_next = _tables[hashi];
		_tables[hashi] = newNode;
		
		++_n;
		return true;
	}
    //...
}

另外 Find Erase 函数需要将传入的 key 与表中 data 的 key 进行比较，也需要用到仿函数，这里不多做演示。

迭代器

哈希表的迭代器不同于 list，它的 ++ 实现离不开对整个表的控制，所以一个迭代器包含了两个指针。

因为 __HTIterator 和 HashTable 两个类之间有交集，所以写在上面的类的前面必须声明另一个类。

//HashTable.h
template<class K, class T, class KeyOfT, class HashFunc>
class HashTable;

template<class K, class T, class KeyOfT, class HashFunc>
class __HTIterator
{
	typedef HashNode<T> Node;
	typedef __HTIterator<K, T, KeyOfT, HashFunc> Self;
public:
	Node* _node;
	HashTable<K, T, KeyOfT, HashFunc>* _pht;

    __HTIterator()
	{}
    
	__HTIterator(Node* node, HashTable<K, T, KeyOfT, HashFunc>* pht)
		: _node(node)
		, _pht(pht)
	{}

};

++：

如果当前结点的下一个不为空，就将指针移到下一个结点
如果当前结点的下一个为空，则要找下一个非空桶，指针指向下一个非空桶的第一个结点。如果没有下一个非空桶，则将指针设为空。

Self& operator++()
{
    if (_node->_next)
    {
        _node = _node->_next;
    }
    else // 找下一个桶
    {
        KeyOfT kot;
        HashFunc hf;
        size_t hashi = hf(kot(_node->_data)) % _pht->_tables.size();
        for (++hashi; hashi < _pht->_tables.size(); ++hashi)
        {
            if (_pht->_tables[hashi])
            {
                _node = _pht->_tables[hashi];
                break;
            }
        }
        // 未找到
        if (hashi == _pht->_tables.size())
        {
            _node = nullptr;
        }
    }
    return *this;
}

Self operator++(int)
{
    Self tmp(_node, _pht);
    ++(*this);
    return tmp;
}

其他迭代器操作符的重载：

T& operator*()
{
    return _node->_data;
}

T* operator->()
{
    return &_node->_data;
}

bool operator==(const Self& s) const
{
    return _node == s._node;
}

bool operator!=(const Self& s) const
{
    return _node != s._node;
}

在 HashTable 里面实现 begin end

template<class K, class T, class KeyOfT, class HashFunc>
class HashTable
{
    template<class K, class T, class KeyOfT, class HashFunc>
	friend class __HTIterator;
    
	typedef HashNode<T> Node;
	typedef HashTable<K, T, KeyOfT, HashFunc> Self;
public:
	typedef __HTIterator<K, T, KeyOfT, HashFunc> iterator;

	iterator begin()
	{
		for (size_t i = 0; i < _tables.size(); ++i)
		{
			Node* cur = _tables[i];
			if (cur)
			{
				return iterator(cur, this);
			}
		}
		return end();
	}

	iterator end()
	{
		return iterator(nullptr, this);
	}
    
    //...

另外需要将find的返回类型改成迭代器

iterator Find(const K& key)
{
    if (_tables.size() == 0)
    {
        return iterator(nullptr, this);
    }

    HashFunc hf;
    KeyOfT kot;

    size_t hashi = hf(key);
    hashi %= _tables.size();
    Node* cur = _tables[hashi];
    while (cur)
    {
        if (kot(cur->_data) == key)
        {
            return iterator(cur, this);
        }
        cur = cur->_next;
    }
    return iterator(nullptr, this);
}

封装

为了避免与标准库命名冲突，我们用一个命名空间包起来。

namespace my
{
	template<class K, class HashFunc = DefaultHash<K>>
	class unordered_set
	{
		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
	public:
		typedef typename HashTable<K, K, SetKeyOfT, HashFunc>::iterator iterator;

		iterator begin()
		{
			return _ht.begin();
		}

		iterator end()
		{
			return _ht.end();
		}

		pair<iterator, bool> insert(const K& key)
		{
			return _ht.Insert(key);
		}

		iterator find(const K& key)
		{
			return _ht.Find(key);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

	private:
		HashTable<K, K, SetKeyOfT, HashFunc> _ht;
	};
}

namespace my
{
	template<class K, class V, class HashFunc = DefaultHash<K>>
	class unordered_map
	{
		struct MapKeyOfT
		{
			const K& operator()(const pair<K, V>& kv)
			{
				return kv.first;
			}
		};
	public:
		typedef typename HashTable<K, pair<K, V>, MapKeyOfT, HashFunc>::iterator iterator;

		iterator begin()
		{
			return _ht.begin();
		}

		iterator end()
		{
			return _ht.end();
		}

		pair<iterator, bool> insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}

		iterator find(const K& key)
		{
			return _ht.Find(key);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

	private:
		HashTable<K, pair<K, V>, MapKeyOfT, HashFunc> _ht;
	};
}

map 还需要封装 []，这里就需要对Insert再次进行改造，类似的改造在 set和map的模拟实现_世真的博客-CSDN博客和 STL - set 和 map 的使用_世真的博客-CSDN博客中有详细讲解。

把返回类型改成 pair<iterator, bool>，然后把两个 return 部分改好。

pair<iterator, bool> Insert(const T& data)
{
    HashFunc hf;
    KeyOfT kot;

    iterator pos = Find(kot(data));
    if (pos != end())
    {
        return make_pair(pos, false);
    }

    // 载荷因子 == 1，扩容，重新映射
    if (_tables.size() == _n)
    {
        size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
        vector<Node*> newTable;
        newTable.resize(newSize);
        for (size_t i = 0; i < _tables.size(); ++i)
        {
            Node* cur = _tables[i];
            while (cur)
            {
                Node* next = cur->_next;

                size_t hashi = hf(kot(cur->_data)) % newSize;
                cur->_next = newTable[hashi];
                newTable[hashi] = cur;

                cur = next;
            }
            _tables[i] = nullptr;
        }
        newTable.swap(_tables);
    }

    size_t hashi = hf(kot(data));
    hashi %= _tables.size();
    //头插，然后把头结点指针塞进桶里面
    Node* newNode = new Node(data);
    newNode->_next = _tables[hashi];
    _tables[hashi] = newNode;

    ++_n;
    return make_pair(iterator(newNode, this), true);
}

operator[]:

V& operator[](const K& key)
{
    pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));
    return ret.first->second;
}

关于除留余数法的补充

除留余数法中，除数应该是质数。我们在上面的实现中，除数就是容量，扩容是2倍扩，显然这种方式不能保证除数为质数。质数本身也没有具体的公式去直接计算得到

为了实现真正的除留余数法，我们可以打一个质数表：

这些质数相邻两个之间大约成2倍关系

size_t GetNextPrime(size_t prime)
{
    const int PRIMECOUNT = 28;
    static const size_t primeList[PRIMECOUNT] =
    {
        53ul, 97ul, 193ul, 389ul, 769ul,
        1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
        49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
        1572869ul, 3145739ul, 6291469ul, 12582917ul,25165843ul,
        50331653ul, 100663319ul, 201326611ul, 402653189ul,805306457ul,
        1610612741ul, 3221225473ul, 4294967291ul
    };
    size_t i = 0;
    for (; i < PRIMECOUNT; ++i)
    {
        if (primeList[i] > prime)
            return primeList[i];
    }
    return primeList[i];
}

使用该函数，传入一个值可以找到下一个大约是这个值的2倍的质数。

对插入函数的扩容部分进行修改：

//...
// 载荷因子 == 1，扩容，重新映射
if (_tables.size() == _n)
{
    //size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
    size_t newSize = GetNextPrime(_tables.size());

//...

完整代码

HashTable.h

#pragma once
#include <iostream>
#include <vector>
using namespace std;

template<class T>
struct HashNode
{
	T _data;
	HashNode<T>* _next;

	HashNode(const T& data)
		: _data(data)
		, _next(nullptr)
	{}
};

template<class K, class T, class KeyOfT, class HashFunc>
class HashTable;

template<class K, class T, class KeyOfT, class HashFunc>
class __HTIterator
{
	typedef HashNode<T> Node;
	typedef __HTIterator<K, T, KeyOfT, HashFunc> Self;
public:
	Node* _node;
	HashTable<K, T, KeyOfT, HashFunc>* _pht;

	__HTIterator()
	{}
	
	__HTIterator(Node* node, HashTable<K, T, KeyOfT, HashFunc>* pht)
		: _node(node)
		, _pht(pht)
	{}

	Self& operator++()
	{
		if (_node->_next)
		{
			_node = _node->_next;
		}
		else // 找下一个桶
		{
			KeyOfT kot;
			HashFunc hf;
			size_t hashi = hf(kot(_node->_data)) % _pht->_tables.size();
			for (++hashi; hashi < _pht->_tables.size(); ++hashi)
			{
				if (_pht->_tables[hashi])
				{
					_node = _pht->_tables[hashi];
					break;
				}
			}
			// 未找到
			if (hashi == _pht->_tables.size())
			{
				_node = nullptr;
			}
		}
		return *this;
	}

	Self operator++(int)
	{
		Self tmp(_node, _pht);
		++(*this);
		return tmp;
	}

	T& operator*()
	{
		return _node->_data;
	}

	T* operator->()
	{
		return &_node->_data;
	}

	bool operator==(const Self& s) const
	{
		return _node == s._node;
	}

	bool operator!=(const Self& s) const
	{
		return _node != s._node;
	}

};

template<class K>
struct DefaultHash
{
	size_t operator()(const K& key)
	{
		return key;
	}
};

template<>
struct DefaultHash<string>
{
	size_t operator()(const string& key)
	{
		size_t hash = 0;
		for (auto ch : key)
		{
			hash = hash * 131 + ch;
		}
		return hash;
	}
};

template<class K, class T, class KeyOfT, class HashFunc>
class HashTable
{
	template<class K, class T, class KeyOfT, class HashFunc>
	friend class __HTIterator;

	typedef HashNode<T> Node;
	typedef HashTable<K, T, KeyOfT, HashFunc> Self;
public:
	typedef __HTIterator<K, T, KeyOfT, HashFunc> iterator;

	iterator begin()
	{
		for (size_t i = 0; i < _tables.size(); ++i)
		{
			Node* cur = _tables[i];
			if (cur)
			{
				return iterator(cur, this);
			}
		}
		return end();
	}

	iterator end()
	{
		return iterator(nullptr, this);
	}

	HashTable()
	{}

	HashTable(const Self& ht)
		:_n(ht._n)
	{
		_tables.resize(ht._tables.size());
		for (size_t i = 0; i < ht._tables.size(); ++i)
		{
			Node* cur = ht._tables[i];
			Node* tail = nullptr;
			while (cur)
			{
				Node* newNode = new Node(cur->_kv);

				if (tail == nullptr)
					_tables[i] = newNode;
				else
					tail->_next = newNode;
				tail = newNode;

				cur = cur->_next;
			}
		}
	}

	Self& operator=(Self ht)
	{
		swap(ht);
		return *this;
	}

	~HashTable()
	{
		for (size_t i = 0; i < _tables.size(); ++i)
		{
			Node* cur = _tables[i];
			while (cur)
			{
				Node* next = cur->_next;
				delete cur;
				cur = next;
			}
			_tables[i] = nullptr;
		}
	}

	void swap(Self& ht)
	{
		_tables.swap(ht._tables);
		std::swap(_n, ht._n);
	}

	size_t GetNextPrime(size_t prime)
	{
		const int PRIMECOUNT = 28;
		static const size_t primeList[PRIMECOUNT] =
		{
			53ul, 97ul, 193ul, 389ul, 769ul,
			1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
			49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
			1572869ul, 3145739ul, 6291469ul, 12582917ul,25165843ul,
			50331653ul, 100663319ul, 201326611ul, 402653189ul,805306457ul,
			1610612741ul, 3221225473ul, 4294967291ul
		};
		size_t i = 0;
		for (; i < PRIMECOUNT; ++i)
		{
			if (primeList[i] > prime)
				return primeList[i];
		}
		return primeList[i];
	}

	pair<iterator, bool> Insert(const T& data)
	{
		HashFunc hf;
		KeyOfT kot;

		iterator pos = Find(kot(data));
		if (pos != end())
		{
			return make_pair(pos, false);
		}

		// 载荷因子 == 1，扩容，重新映射
		if (_tables.size() == _n)
		{
			//size_t newSize = _tables.size() == 0 ? 10 : 2 * _tables.size();
			size_t newSize = GetNextPrime(_tables.size());
			vector<Node*> newTable;
			newTable.resize(newSize);
			for (size_t i = 0; i < _tables.size(); ++i)
			{
				Node* cur = _tables[i];
				while (cur)
				{
					Node* next = cur->_next;

					size_t hashi = hf(kot(cur->_data)) % newSize;
					cur->_next = newTable[hashi];
					newTable[hashi] = cur;

					cur = next;
				}
				_tables[i] = nullptr;
			}
			newTable.swap(_tables);
		}

		size_t hashi = hf(kot(data));
		hashi %= _tables.size();
		//头插，然后把头结点指针塞进桶里面
		Node* newNode = new Node(data);
		newNode->_next = _tables[hashi];
		_tables[hashi] = newNode;
		
		++_n;
		return make_pair(iterator(newNode, this), true);
	}

	iterator Find(const K& key)
	{
		if (_tables.size() == 0)
		{
			return iterator(nullptr, this);
		}

		HashFunc hf;
		KeyOfT kot;

		size_t hashi = hf(key);
		hashi %= _tables.size();
		Node* cur = _tables[hashi];
		while (cur)
		{
			if (kot(cur->_data) == key)
			{
				return iterator(cur, this);
			}
			cur = cur->_next;
		}
		return iterator(nullptr, this);
	}

	bool Erase(const K& key)
	{
		if (_tables.size() == 0)
		{
			return false;
		}

		HashFunc hf;
		KeyOfT kot;

		size_t hashi = hf(key);
		hashi %= _tables.size();
		Node* prev = nullptr;
		Node* cur = _tables[hashi];
		while (cur)
		{
			if (kot(cur->_data) == key)
			{
				if (prev == nullptr)
					_tables[hashi] = cur->_next;
				else
					prev->_next = cur->_next;
				delete cur;
				--_n;
				return true;
			}
			prev = cur;
			cur = cur->_next;
		}
		return false;
	}

private:
	// 指针数组
	vector<Node*> _tables;
	size_t _n = 0;
};

UnorderedSet.h

#pragma once
#include "HashTable.h"

namespace my
{
	template<class K, class HashFunc = DefaultHash<K>>
	class unordered_set
	{
		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
	public:
		typedef typename HashTable<K, K, SetKeyOfT, HashFunc>::iterator iterator;

		iterator begin()
		{
			return _ht.begin();
		}

		iterator end()
		{
			return _ht.end();
		}

		pair<iterator, bool> insert(const K& key)
		{
			return _ht.Insert(key);
		}

		iterator find(const K& key)
		{
			return _ht.Find(key);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

	private:
		HashTable<K, K, SetKeyOfT, HashFunc> _ht;
	};
}

UnorderedMap.h

#pragma once
#include "HashTable.h"

namespace my
{
	template<class K, class V, class HashFunc = DefaultHash<K>>
	class unordered_map
	{
		struct MapKeyOfT
		{
			const K& operator()(const pair<K, V>& kv)
			{
				return kv.first;
			}
		};
	public:
		typedef typename HashTable<K, pair<K, V>, MapKeyOfT, HashFunc>::iterator iterator;

		iterator begin()
		{
			return _ht.begin();
		}

		iterator end()
		{
			return _ht.end();
		}

		pair<iterator, bool> insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}

		iterator find(const K& key)
		{
			return _ht.Find(key);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

		V& operator[](const K& key)
		{
			pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));
			return ret.first->second;
		}

	private:
		HashTable<K, pair<K, V>, MapKeyOfT, HashFunc> _ht;
	};
}

unordered_map其他底层接口

Buckets

bucket_count：返回桶的数量
max_bucket_count：返回最大桶数
bucket_size：返回桶的大小
bucket：返回具有键k的元素所在的桶号。

Hash policy

load_factor：返回载荷因子
max_bucket_count：返回最大载荷因子
rehash：设置桶数

这些用得不多，可以大致了解一下。

本文含有隐藏内容，请开通VIP 后查看

[C++数据结构](22)哈希表与unordered_set，unordered_map实现

文章目录

unordered 系列容器

哈希表的概念

哈希函数

哈希函数设计原则

常用哈希函数

哈希冲突

概念

解决方式

闭散列

开散列

实现：除留余数法，闭散列

框架

查找

插入与扩容

删除

仿函数

完整代码

实现：除留余数法、开散列

框架

插入与扩容

析构

查找

删除

拷贝构造与赋值重载

完整代码

封装成 unordered_map、unordered_set

改造哈希表

迭代器

封装

关于除留余数法的补充

完整代码

HashTable.h

UnorderedSet.h

UnorderedMap.h

unordered_map其他底层接口

网站公告

今日签到

热门文章

最新发布