https://blog.csdn.net/u010512607/article/details/40005641
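Approach:
1. Read the file and concatenate its lines into a single string, str
2. Strip the punctuation from the string with a regex, then split it into a String[]
3. Use a HashMap to count how often each word occurs (a blacklist can be added to filter out modal verbs and similar words)
4. Sort the HashMap entries by value (using Collections.sort with a Comparator whose compare method is overridden)
5. Print the first 10 words of the sorted list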
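Step 2 can be tried in isolation. The following is a minimal sketch, not the original author's code: the class name TokenizeDemo and the helper tokenize are hypothetical, and the lowercasing is an added assumption so that "The" and "the" would count as one word.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeDemo {
    // Hypothetical helper: removes Unicode punctuation (P) and symbols (S),
    // lowercases the text, and splits it on runs of whitespace.
    static List<String> tokenize(String text) {
        String cleaned = text.replaceAll("[\\p{P}\\p{S}]", "").toLowerCase().trim();
        List<String> tokens = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return tokens; // "".split(...) would yield one empty token, so return early
        }
        for (String token : cleaned.split("\\s+")) {
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world! It's $5.")); // [hello, world, its, 5]
    }
}
```

Because punctuation is replaced with an empty string, a hyphenated word such as "well-known" becomes "wellknown". Replacing with a space instead would split it into two words; which behaviour is preferable depends on the text being counted.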
Code:
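import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

public class Main{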
public static void main(String[] args) throws IOException {
readPaper();
}
// Count the 10 most frequent words in an English article
public static void readPaper() throws IOException{
HashMap<String, Integer> wordMap = new HashMap<String, Integer>();
File file = new File("e:/info.log");
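StringBuilder sb=new StringBuilder();
// try-with-resources closes the reader even if readLine() throws
try(BufferedReader br=new BufferedReader(new FileReader(file))){
String line;
while((line=br.readLine())!=null){
// Append a space so the last word of one line does not merge with the first word of the next
sb.append(line).append(' ');
}
}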
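String words=sb.toString();// the whole text as a single string
String target=words.replaceAll("\\pP|\\pS", "");// remove punctuation and symbols
// The lowercase p stands for "property": it is the regex prefix for matching a Unicode property
// The uppercase P is one of the seven major Unicode general categories: punctuation
// The uppercase S is the symbol category (math symbols, currency symbols, etc.)
// Lowercase first so that "The" and "the" are counted together, then split on runs of whitespace
String[] single=target.toLowerCase().trim().split("\\s+");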
String[] keys={ "you", "i", "he", "she", "me", "him", "her", "it",
"they", "them", "we", "us", "your", "yours", "our", "his",
"its", "my", "in", "into", "on", "for", "out", "up",
"down", "at", "to", "too", "with", "by", "about", "among",
"between", "over", "from", "be", "been", "am", "is", "are",
"was", "were", "without", "the", "of", "and", "a", "an",
"that", "this", "or", "as", "will", "would", "can",
"could", "may", "might", "shall", "should", "must", "has",
"have", "had", "than" };
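// Optional: replace common low-information English words with "#" so that
// they can be skipped when the counts are printed (uncomment to enable)
// for(int i=0;i<single.length;i++){
// for(String str:keys){
// if(str.equals(single[i])){
// single[i]="#";
// }
// }
// }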
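// Associate each word with its number of occurrences
for(int i=0;i<single.length;i++){
if(single[i].isEmpty()){
continue;// skip empty tokens left over from splitting
}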
if(wordMap.get(single[i])==null){
wordMap.put(single[i],1);
}else{
wordMap.put(single[i], wordMap.get(single[i])+1);
}
}
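// Sort the entries by value, descending, using a custom Comparator
List<Entry<String,Integer>> list=new ArrayList<Entry<String,Integer>>(wordMap.entrySet());
Collections.sort(list,new Comparator<Entry<String,Integer>>(){
@Override
public int compare(Entry<String, Integer> o1,
Entry<String, Integer> o2) {
// Integer.compare avoids the overflow risk of subtracting the two counts
return Integer.compare(o2.getValue(),o1.getValue());
}
}
);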
// Print the most frequent words, skipping the "#" placeholder
int count=1;
for(Map.Entry<String, Integer> entry:list){
if(entry.getKey().equals("#")){
continue;
}
System.out.println(entry.getKey()+":"+entry.getValue());
count++;
if(count==11){
break;
}
}
}
}
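The blacklist in the listing above is commented out, and scanning the keys array once per word would cost a full loop for every token. Below is a minimal sketch of one way to enable the filter, assuming a HashSet lookup instead of the nested loop. The class name StopWordDemo, the helper topWords, and the shortened stop-word list are all hypothetical and chosen only for illustration. Ties are broken alphabetically so the output is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StopWordDemo {
    // Shortened stop-word list for illustration; the full keys array could be loaded instead.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of", "and", "is", "to"));

    // Hypothetical helper: returns up to n "word:count" strings, most frequent first.
    static List<String> topWords(String[] words, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue; // skip blanks and blacklisted words
            }
            counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending by count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (result.size() == n) {
                break;
            }
            result.add(entry.getKey() + ":" + entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = "the cat and the dog and the cat".split(" ");
        System.out.println(topWords(words, 2)); // [cat:2, dog:1]
    }
}
```

Filtering before counting keeps the map small and removes the need for the "#" placeholder check when printing.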