
Counting the 10 most frequent words in an English article

https://blog.csdn.net/u010512607/article/details/40005641

Approach:

1. Read the file line by line and concatenate the text into one string.
2. Strip punctuation from the string with a regular expression, then split it into a word array.
3. Count each word's occurrences with a HashMap (optionally filtering pronouns, modal verbs, etc. via a blacklist).
4. Sort the HashMap entries by value (Collections.sort with a custom Comparator).
5. Print the top 10 words.
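Step 2's regular expression relies on Java's Unicode property escapes. A minimal sketch (the sample sentence is an illustration, not from the original post) of what `replaceAll("\\pP|\\pS", "")` does:

```java
public class StripPunctDemo {
    public static void main(String[] args) {
        String s = "Hello, world! It's 3 pm.";
        // \pP matches any Unicode punctuation character;
        // \pS matches symbols (math signs, currency signs, etc.)
        String clean = s.replaceAll("\\pP|\\pS", "");
        System.out.println(clean); // Hello world Its 3 pm
    }
}
```

Note that letters, digits, and whitespace survive, so word boundaries are preserved for the later `split`.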

Code:

import java.io.*;
import java.util.*;
import java.util.Map.Entry;

public class Main {
    public static void main(String[] args) throws IOException {
        readPaper();
    }

    // Count the 10 most frequent words in an English article
    public static void readPaper() throws IOException{

        HashMap<String, Integer> wordMap = new HashMap<String, Integer>();

        File file = new File("e:/info.log");
        BufferedReader br=new BufferedReader(new FileReader(file));

        StringBuilder sb=new StringBuilder();
        String line=null;
        while((line=br.readLine())!=null){
            sb.append(line).append(' ');// add a space so words at line breaks don't merge
        }
        br.close();


        String words=sb.toString();// the full text as one string
        // Lowercase p is the Unicode-property prefix in Java regular expressions;
        // capital P names the punctuation category, capital S the symbol category
        // (math symbols, currency signs, etc.)
        String target=words.replaceAll("\\pP|\\pS", "");// strip punctuation and symbols
        String[] single=target.split("\\s+");// split on runs of whitespace


        String[] keys={ "you", "i", "he", "she", "me", "him", "her", "it",
                "they", "them", "we", "us", "your", "yours", "our", "his",
                "her", "its", "my", "in", "into", "on", "for", "out", "up",
                "down", "at", "to", "too", "with", "by", "about", "among",
                "between", "over", "from", "be", "been", "am", "is", "are",
                "was", "were", "without", "the", "of", "and", "a", "an",
                "that", "this", "be", "or", "as", "will", "would", "can",
                "could", "may", "might", "shall", "should", "must", "has",
                "have", "had", "than" };

        // Replace common stop words with '#' so they can be skipped when
        // printing the word counts below
        for(int i=0;i<single.length;i++){
            for(String key:keys){
                if(key.equalsIgnoreCase(single[i])){
                    single[i]="#";
                }
            }
        }

        // Map each word to its occurrence count
        for(int i=0;i<single.length;i++){
            if(wordMap.get(single[i])==null){
                wordMap.put(single[i],1);       
            }else{
                wordMap.put(single[i], wordMap.get(single[i])+1);
            }
        }

        // Comparator: sort entries by value (count), descending
        List<Entry<String,Integer>> list=new ArrayList
                <Entry<String,Integer>>(wordMap.entrySet());
        Collections.sort(list,new Comparator<Entry<String,Integer>>(){

            @Override
            public int compare(Entry<String, Integer> o1,
                    Entry<String, Integer> o2) {
                return o2.getValue()-o1.getValue();
            }

        });


        // Print the most frequent words
        int count=1;
        for(Map.Entry<String, Integer> entry:list){
            if(entry.getKey().equals("#")){
                continue;// skip the stop-word bucket
            }
            System.out.println(entry.getKey()+":"+entry.getValue());
            count++;
            if(count==11){
                break;
            }
        }
    }
}
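On Java 8+, the same count-then-sort pipeline can be written more compactly with streams. A sketch (the `topWords` helper and sample text are illustrative, not part of the original post):

```java
import java.util.*;
import java.util.stream.*;

public class TopWords {
    // Return the n most frequent words in the text, ignoring case,
    // after stripping Unicode punctuation and symbols
    static List<String> topWords(String text, int n) {
        Map<String, Long> counts = Arrays
                .stream(text.replaceAll("\\pP|\\pS", "").toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())                  // drop empty tokens
                .collect(Collectors.groupingBy(w -> w,      // word -> count
                        Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)                                   // keep the top n
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topWords("the cat and the dog and the bird", 2));
        // [the, and]
    }
}
```

`groupingBy` with `counting()` replaces the manual null-check-and-put loop, and `comparingByValue().reversed()` replaces the hand-written Comparator.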