topK问题——统计一篇很长的英文文章中频次出现最高的10个单词_一本英文书,怎么统计出现频率最高的10个单词

作者：盐析白兔 | 2024-07-22 08:08:19

踩

一本英文书,怎么统计出现频率最高的10个单词

上篇讲述了topK问题的N个数中最大的前K个数，本篇则讲述统计一篇很长的英文文章中频次出现最高的10个单词。

例题2：统计一篇很长的英文文章中频次出现最高的10个单词。

思路：分治法 + hash + 小根堆

(1) 定义一个关联容器hash_map<string,int>，用于统计英文文章中每个单词出现的次数；定义一个vector<hash_map<string,int>::iterator>，用于存储小根堆K；

(2) 将英文文章中的前K个单词先填满topK堆;

(3) 调整topK堆为小根堆结构;

(4) 通过遍历将后面的单次出现的次数与堆顶单词出现的次数（此时堆顶单词出现的次数最小）比较，大于堆顶元素就入堆，并下调堆结构。

(5) 遍历结束，则小根堆中的单词即英文文章中频次出现最高的10个单词。

代码如下：


//  topK问题  统计一篇英文文章中出现频次最高的10个单词  分治法 + hash + 小根堆
#include<iostream>
#include<fstream>
#include<vector>
#include<string>
#include<hash_map>
#include<algorithm>
 
using namespace std;
 
//  调整堆
void adjustDown(vector<hash_map<string, int>::iterator> &topK, int i, int k)
{
	int min = i;
	int child = 2 * min + 1;
	while (min < k / 2)
	{
		if (child + 1 < k && topK[child]->second > topK[child + 1]->second)
			child++;
		if (child<k && topK[min]->second>topK[child]->second)
		{
			swap(topK[min], topK[child]);
			min = child;
			child = 2 * min + 1;
		}
		else
			break;
	}
}
 
//  建小根堆
void buildHeap(vector<hash_map<string, int>::iterator> &topK, int k)
{
	for (int i = k / 2 - 1; i >= 0; i--)
		adjustDown(topK, i, k);
}
 
//  获取文章中出现频次最高的前k个单词
void getTopK(hash_map<string, int> &essay, vector<hash_map<string, int>::iterator> &topK, int k)
{
	auto it = essay.begin();
	for (int i = 0; i < k; i++)
	{
		topK[i] = it;
		it++;
	}
	buildHeap(topK, k);
	for (; it != essay.end(); it++)
	{
		if (it->second>topK[0]->second)
		{
			topK[0] = it;
			adjustDown(topK, 0, k);
		}
	}
	//  输出出现频次最高的前k个单词及其频次
	for (int i = 0; i < k; i++)
		cout << topK[i]->first << " " << topK[i]->second << endl;
}
int main()
{
	//  读入英文文章
	ifstream in("data.txt");
	string word;
	hash_map<string, int> essay;  //  hash_map统计每个单词出现频次
	while (in >> word)
	{
		essay[word]++;
	}
	int k;
	while (cin >> k)
	{
		vector<hash_map<string, int>::iterator> topK(k, essay.begin());
		getTopK(essay, topK, k);
	}
	return 0;
}

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/盐析白兔/article/detail/864133?site