
MapReduce Classic Use Case: Inverted Index


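I. Introduction to Inverted Indexes

Put simply, an inverted index lets you start from a word and find every document that contains it. It is the reverse of a forward index, which maps each document to the words it contains.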
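As a minimal illustration of the idea, here is a plain Java sketch with no Hadoop involved. The class and method names are made up for this example. It keeps the index as a map from each word to the sorted set of documents that contain it:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory inverted index: word -> sorted set of document names.
final class SimpleInvertedIndex {
    // TreeMap/TreeSet keep words and document names sorted for stable output
    private final Map<String, Set<String>> index = new TreeMap<>();

    // Tokenize on whitespace and record the document under every word it contains.
    void addDocument(String docName, String text) {
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, w -> new TreeSet<>()).add(docName);
            }
        }
    }

    // All documents containing the word; an empty set if the word is not indexed.
    Set<String> lookup(String word) {
        return index.getOrDefault(word, Collections.emptySet());
    }

    public static void main(String[] args) {
        SimpleInvertedIndex idx = new SimpleInvertedIndex();
        idx.addDocument("a.txt", "i love beijing and love china");
        idx.addDocument("e.txt", "love familiy and love china");
        System.out.println(idx.lookup("love"));    // [a.txt, e.txt]
        System.out.println(idx.lookup("beijing")); // [a.txt]
    }
}
```

The MapReduce job in this article builds the same structure in a distributed way. It also records how many times the word occurs in each file.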

II. MapReduce Design

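1. First decide the key and value types for the map, combiner and reduce steps.
2. Then decide what each key and value actually holds:
Map: key = word + "-" + file name, value = empty
Combiner: key = word, value = "file name=count"
Reduce: key = word, value = all "file name=count" entries for that word, joined with commas
Caution: Hadoop treats a combiner as an optional optimization that may run zero, one or several times. It also assigns map output to reducers by the map's key before any combiner runs. This combiner both changes the key and does the counting the reducer relies on. As a result, the design only gives correct results with a single reducer (the default) and a combiner that runs exactly once per map output. That is typically the case for a small local input like the one below.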
[Figure: schematic diagram]
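To make the record shapes concrete, the following plain Java sketch simulates the three phases on two of the sample lines, without Hadoop. It is only a simulation and rests on simplifying assumptions: one map task per file, a combiner that runs exactly once per task, and a single reducer. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulates the record shapes of the map -> combiner -> reduce design (no Hadoop).
final class PipelineShapes {

    // Map: one "word-filename" key per token (the value is empty, so only keys are kept).
    static List<String> map(String filename, String line) {
        List<String> keys = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                keys.add(word + "-" + filename);
            }
        }
        return keys;
    }

    // Combiner, assumed to run once per map task: count identical keys,
    // then re-key them as word -> "filename=count".
    static Map<String, String> combine(List<String> mapKeys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : mapKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int sep = e.getKey().indexOf('-');
            out.put(e.getKey().substring(0, sep), e.getKey().substring(sep + 1) + "=" + e.getValue());
        }
        return out;
    }

    // Reduce, single reducer: join every task's "filename=count" entries per word with commas.
    static Map<String, String> reduce(List<Map<String, String>> combinerOutputs) {
        Map<String, String> result = new TreeMap<>();
        for (Map<String, String> taskOutput : combinerOutputs) {
            for (Map.Entry<String, String> e : taskOutput.entrySet()) {
                result.merge(e.getKey(), e.getValue(), (a, b) -> a + "," + b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fromA = combine(map("a.txt", "i love beijing and love china"));
        Map<String, String> fromE = combine(map("e.txt", "love familiy and love china"));
        for (Map.Entry<String, String> e : reduce(Arrays.asList(fromA, fromE)).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()); // e.g. love<TAB>a.txt=2,e.txt=2
        }
    }
}
```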

III. MapReduce Implementation

1. Prepare the data

a.txt: i love beijing and love china
b.txt: i love beijing and not like New York
c.txt: i dot like anycity
d.txt: you like where
e.txt: love familiy and love china
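If you prefer to create these files programmatically, a small helper along these lines would do it. The class name and the default directory `data` are assumptions made for this sketch. The job below reads from `C:\data`, so pass that path as the first argument, or change the job to match:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that writes the five sample input files used in this article.
final class PrepareData {
    // File name -> content, exactly as listed above (including the original spellings)
    static final Map<String, String> SAMPLES = new LinkedHashMap<>();
    static {
        SAMPLES.put("a.txt", "i love beijing and love china");
        SAMPLES.put("b.txt", "i love beijing and not like New York");
        SAMPLES.put("c.txt", "i dot like anycity");
        SAMPLES.put("d.txt", "you like where");
        SAMPLES.put("e.txt", "love familiy and love china");
    }

    // Creates the directory if needed and writes each sample as a one-line UTF-8 file.
    static void writeSamples(Path dir) throws IOException {
        Files.createDirectories(dir);
        for (Map.Entry<String, String> e : SAMPLES.entrySet()) {
            Files.write(dir.resolve(e.getKey()), (e.getValue() + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // Target directory from the first argument (e.g. C:\data to match the job); defaults to ./data
        Path dir = Paths.get(args.length > 0 ? args[0] : "data");
        writeSamples(dir);
        System.out.println("Wrote " + SAMPLES.size() + " files to " + dir.toAbsolutePath());
    }
}
```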

2. Implementation

package com.qyl.master;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class MyMapReduce {

    // Map: for every word in a line, emit the key "word-filename" with an empty value.
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Text okey = new Text();
        private final Text ovalue = new Text(); // stays empty; all information is in the key

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Name of the file this split was read from, e.g. "a.txt"
            String filename = ((FileSplit) context.getInputSplit()).getPath().getName();
            // Split on runs of whitespace so extra spaces do not produce empty "words"
            for (String s : value.toString().trim().split("\\s+")) {
                if (s.isEmpty()) {
                    continue; // blank line
                }
                okey.set(s + "-" + filename);
                context.write(okey, ovalue);
            }
        }
    }

    // Combiner: count identical "word-filename" keys and re-key them as word -> "filename=count".
    // Caution: Hadoop may run a combiner zero, one or several times, and this one changes the key,
    // so the job relies on it running exactly once per map output and on a single reducer.
    public static class MyCombiner extends Reducer<Text, Text, Text, Text> {
        private final Text okey = new Text();
        private final Text ovalue = new Text();

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            int count = 0;
            for (Text ignored : values) {
                count++;
            }
            // Split at the first '-' only, so a '-' inside the file name is kept
            String k = key.toString();
            int sep = k.indexOf('-');
            okey.set(k.substring(0, sep));
            ovalue.set(k.substring(sep + 1) + "=" + count);
            context.write(okey, ovalue);
        }
    }

    // Reduce: join all "filename=count" entries for a word into one comma-separated value.
    public static class MyReduce extends Reducer<Text, Text, Text, Text> {
        private final Text ovalue = new Text();

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            StringBuilder sb = new StringBuilder();
            for (Text text : values) {
                if (sb.length() > 0) {
                    sb.append(',');
                }
                sb.append(text.toString());
            }
            ovalue.set(sb.toString());
            context.write(key, ovalue);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "inverted index");
        job.setJarByClass(MyMapReduce.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyCombiner.class);
        job.setReducerClass(MyReduce.class);
        // Map output and final output are both Text/Text
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Local Windows paths used in this article; adjust for your environment
        Path inPath = new Path("C:\\data");
        FileInputFormat.addInputPath(job, inPath);

        // The output directory must not exist when the job starts, so delete any previous result
        Path outpath = new Path("C:\\data\\result");
        FileSystem fs = outpath.getFileSystem(conf);
        if (fs.exists(outpath)) {
            fs.delete(outpath, true);
        }
        FileOutputFormat.setOutputPath(job, outpath);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
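One way to avoid depending on the combiner is sketched here as an assumption, not as the author's method. The mapper emits (word, "filename=1"), and the combiner and the reducer both emit (the same word, merged entries). Because the key never changes and merging is associative, the result is the same whether the combiner runs zero, one or several times. The merge step that both would share can be written in plain Java. The `PostingMerge` name is hypothetical. Inside a Hadoop `Reducer` you would convert each `Text` value with `toString()` before passing the values in:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: merges "filename=count" entries so a combiner and a reducer can share it.
// Each input value holds one or more comma-separated entries, e.g. "a.txt=1" or "a.txt=2,e.txt=1".
final class PostingMerge {
    static String merge(Iterable<String> values) {
        Map<String, Long> counts = new TreeMap<>(); // sorted by file name for stable output
        for (String value : values) {
            for (String entry : value.split(",")) {
                int eq = entry.lastIndexOf('=');
                counts.merge(entry.substring(0, eq), Long.parseLong(entry.substring(eq + 1)), Long::sum);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values for the key "love": a.txt and e.txt were combined, b.txt's value arrived uncombined.
        String combinedA = merge(Arrays.asList("a.txt=1", "a.txt=1"));
        String combinedE = merge(Arrays.asList("e.txt=1", "e.txt=1"));
        System.out.println(merge(Arrays.asList(combinedA, "b.txt=1", combinedE))); // a.txt=2,b.txt=1,e.txt=2
    }
}
```

In this variant the file entries for each word come out sorted by file name. In the original job their order depends on the order in which values reach the reducer.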


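3. Results

Each output line holds a word, a tab, and the files containing that word together with the number of occurrences in each file. Uppercase words sort before lowercase ones because Text keys are compared byte by byte. The order of the file entries within a line is not guaranteed.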

New	b.txt=1
York	b.txt=1
and	b.txt=1,e.txt=1,a.txt=1
anycity	c.txt=1
beijing	b.txt=1,a.txt=1
china	a.txt=1,e.txt=1
dot	c.txt=1
familiy	e.txt=1
i	c.txt=1,a.txt=1,b.txt=1
like	b.txt=1,c.txt=1,d.txt=1
love	e.txt=2,b.txt=1,a.txt=2
not	b.txt=1
where	d.txt=1
you	d.txt=1
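Once the job has finished, the output can be queried like an index. The sketch below uses a hypothetical `IndexLookup` class. It parses lines in the `word<TAB>file=count,...` format shown above and returns the postings for a word:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical query helper over the job output ("word<TAB>file=count,file=count,...").
final class IndexLookup {
    private final Map<String, Map<String, Integer>> index = new TreeMap<>();

    IndexLookup(List<String> outputLines) {
        for (String line : outputLines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip blank or malformed lines
            }
            Map<String, Integer> postings = new TreeMap<>(); // file name -> occurrences, sorted
            for (String entry : line.substring(tab + 1).split(",")) {
                int eq = entry.lastIndexOf('=');
                postings.put(entry.substring(0, eq), Integer.parseInt(entry.substring(eq + 1)));
            }
            index.put(line.substring(0, tab), postings);
        }
    }

    // Files containing the word with their counts; an empty map if the word is not indexed.
    Map<String, Integer> lookup(String word) {
        return index.getOrDefault(word, Collections.emptyMap());
    }

    public static void main(String[] args) {
        IndexLookup idx = new IndexLookup(Arrays.asList(
                "beijing\tb.txt=1,a.txt=1",
                "love\te.txt=2,b.txt=1,a.txt=2"));
        System.out.println(idx.lookup("love"));     // {a.txt=2, b.txt=1, e.txt=2}
        System.out.println(idx.lookup("shanghai")); // {}
    }
}
```

To load the real output, you could pass `Files.readAllLines(Paths.get("C:\\data\\result\\part-r-00000"))`. `part-r-00000` is the file the single reducer writes with the default text output format.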

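Disclaimer: This article was contributed by a user and does not represent the position of the wpsshop blog. Copyright belongs to the original author, and the site assumes no legal liability for it. If you find infringing content, please contact the site. When reposting, please cite the source: https://www.wpsshop.cn/w/秋刀鱼在做梦/article/detail/976691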