Build Your Own Search Engine (常搜吧 Journey, Part 3: Search) (Java, Lucene, Hadoop)

Author: 凡人多烦事01 | 2024-02-29 00:43:51

Common Lucene search classes

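1. IndexSearcher: the core search component. It performs read-only searches against an index created by IndexWriter: it accepts a Query object and returns a TopDocs object whose scoreDocs array holds the matching ScoreDoc entries.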

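2. Term: the basic unit of search, identifying the field name and the value to look for. For example, Term("title", "lucene") means searching the title field for the keyword lucene.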

3. Query: the abstract base class for queries; concrete subclasses are typically built from one or more Term objects.

4. TermQuery: the most basic query type, matching documents whose field contains the specified value.

5. TopDocs: the class that holds the search results.

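6. ScoreDoc (formerly Hits): each ScoreDoc holds a matching document's number and its score; the ScoreDoc[] array inside TopDocs takes the role that the older Hits class used to play.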
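To make the Term and TermQuery idea concrete, here is a minimal sketch of a toy in-memory inverted index in plain Java. It is not how Lucene is implemented; the class TermLookupSketch and its add and search methods are hypothetical names invented for this illustration, and the whitespace tokenizer is an assumption standing in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a toy in-memory inverted index, not Lucene's implementation.
class TermLookupSketch {
    // Maps "field:value" to the ids of the documents containing that term.
    private final Map<String, List<Integer>> postings = new HashMap<String, List<Integer>>();

    // Index one field of one document: lowercase, split on whitespace, record each token.
    void add(int docId, String field, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String key = field + ":" + token;
            List<Integer> ids = postings.get(key);
            if (ids == null) {
                ids = new ArrayList<Integer>();
                postings.put(key, ids);
            }
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
    }

    // Similar in spirit to a term query: an exact match on field name and value.
    List<Integer> search(String field, String value) {
        List<Integer> ids = postings.get(field + ":" + value);
        return ids == null ? new ArrayList<Integer>() : ids;
    }

    public static void main(String[] args) {
        TermLookupSketch index = new TermLookupSketch();
        index.add(0, "title", "Lucene in Action");
        index.add(1, "title", "Hadoop basics");
        index.add(2, "title", "Searching with Lucene");
        System.out.println(index.search("title", "lucene")); // prints [0, 2]
        System.out.println(index.search("title", "Lucene")); // prints []
    }
}
```

The second lookup returns nothing because the toy index lowercases tokens at indexing time while the lookup value is used as-is. The same mismatch is worth keeping in mind whenever an index is built with a lowercasing analyzer and queried with a raw term.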

First, let's create an indexing class:


package com.qianyan.luceneIndex;
 
import java.io.IOException;
 
 
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
 
public class IndexTest {
 
	public static void main(String[] args) throws IOException{
	
		String[] ids = {"1", "2", "3", "4"};
		String[] names = {"zhangsan", "lisi", "wangwu", "zhaoliu"};
		String[] addresses = {"shanghai", "beijing", "guangzhou", "nanjing"};
		String[] birthdays = {"19820720", "19840203", "19770409", "19830130"};
		Analyzer analyzer = new StandardAnalyzer();
		String indexDir = "E:/luceneindex";
		Directory dir = FSDirectory.getDirectory(indexDir);
		// true: create the index, overwriting any existing one; false: append to the existing index
		// MaxFieldLength.LIMITED caps each field at 10,000 indexed terms
		IndexWriter writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
		for(int i = 0; i < ids.length; i++){
			Document document = new Document();
			document.add(new Field("id", ids[i], Field.Store.YES, Field.Index.ANALYZED));
			document.add(new Field("name", names[i], Field.Store.YES, Field.Index.ANALYZED));
			document.add(new Field("address", addresses[i], Field.Store.YES, Field.Index.ANALYZED));
			document.add(new Field("birthday", birthdays[i], Field.Store.YES, Field.Index.ANALYZED));
			writer.addDocument(document);
		}
		writer.optimize();
		writer.close();
	}
	
}
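The birthdays above are indexed as fixed-width yyyyMMdd strings. That choice matters for range searches over terms, because terms are compared as strings rather than as numbers. The following plain-Java sketch (DateRangeSketch and inRange are hypothetical names, and no Lucene classes are involved) shows that string ordering agrees with date ordering for this format.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: string-range filtering over the sample birthdays, not Lucene code.
class DateRangeSketch {
    // Returns the values v with lower <= v <= upper under String.compareTo ordering.
    static List<String> inRange(String[] values, String lower, String upper) {
        List<String> out = new ArrayList<String>();
        for (String v : values) {
            if (v.compareTo(lower) >= 0 && v.compareTo(upper) <= 0) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] birthdays = {"19820720", "19840203", "19770409", "19830130"};
        // Inclusive range 19820720 - 19830130
        System.out.println(inRange(birthdays, "19820720", "19830130"));
        // prints [19820720, 19830130]
    }
}
```

This only works because every value has the same width. With unpadded numbers the string order breaks down: "9".compareTo("10") is positive, so 9 would sort after 10.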

Next, let's look at a simple search class:


package com.qianyan.lucene;
 
import java.io.IOException;
 
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.RangeQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
 
public class TestSeacher {
 
	public static void main(String[] args) throws IOException {
		String indexDir = "E:/luceneindex";
		Directory dir = FSDirectory.getDirectory(indexDir);
		IndexSearcher searcher = new IndexSearcher(dir);
		ScoreDoc[] hits = null;
		
		Term term = new Term("id", "2");
		TermQuery query = new TermQuery(term);
		TopDocs topDocs = searcher.search(query, 5);
		
		/* Range search: 19820720 - 19830130. true means both endpoints are included
		Term beginTerm = new Term("birthday", "19820720");
		Term endTerm = new Term("birthday", "19830130");
		RangeQuery rangeQuery = new RangeQuery(beginTerm, endTerm, true);
		TopDocs topDocs = searcher.search(rangeQuery, 5);
		*/
		
		/* Prefix search:
		Term term = new Term("name", "z");
		PrefixQuery preQuery = new PrefixQuery(term);
		TopDocs topDocs = searcher.search(preQuery, 5);
		*/
		
		/* Fuzzy search: e.g. when searching for name zhangsan, names such as zhangsun or zhangsin are also matched
		Term term = new Term("name", "zhangsan");
		FuzzyQuery fuzzyQuery = new FuzzyQuery(term);
		TopDocs topDocs = searcher.search(fuzzyQuery, 5);
		*/
		
 
		/* Wildcard search: * matches any sequence of characters, ? matches exactly one character
		Term term = new Term("name", "*g??");
		WildcardQuery wildcardQuery = new WildcardQuery(term);
		TopDocs topDocs = searcher.search(wildcardQuery, 5);
		*/
		
		/* Combined multi-condition query
		Term nterm = new Term("name", "*g??");
		WildcardQuery wildcardQuery = new WildcardQuery(nterm);
		
		Term aterm = new Term("address", "nanjing");
		TermQuery termQuery = new TermQuery(aterm);
		
		BooleanQuery query = new BooleanQuery();
		query.add(wildcardQuery, BooleanClause.Occur.MUST); // SHOULD means "or"; MUST means "required"
		query.add(termQuery, BooleanClause.Occur.MUST);
		
		TopDocs topDocs = searcher.search(query, 10);
		*/
		 
		hits = topDocs.scoreDocs;
		
		for(int i = 0; i < hits.length; i++){
			Document doc = searcher.doc(hits[i].doc);
			//System.out.println(hits[i].score);
			System.out.print(doc.get("id") + " ");
			System.out.print(doc.get("name") + " ");
			System.out.print(doc.get("address") + " ");
			System.out.println(doc.get("birthday") + " ");
		}
		
		searcher.close();
		dir.close();
	}
}
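To see what the wildcard and combined queries above would select from the four sample records, here is a small emulation in plain Java. It is a sketch under assumptions, not Lucene code: the wildcard is translated to a regular expression, MUST plus MUST is modeled as a list intersection, and SHOULD plus SHOULD as a union. All names (WildcardBooleanSketch, toRegex, nameMatches, addressEquals, both, either) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: emulates the wildcard and boolean examples on the sample data.
class WildcardBooleanSketch {
    static final String[] NAMES = {"zhangsan", "lisi", "wangwu", "zhaoliu"};
    static final String[] ADDRESSES = {"shanghai", "beijing", "guangzhou", "nanjing"};

    // Translate a wildcard pattern to a regex: * = any run of characters, ? = exactly one.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Positions of the records whose whole name matches the wildcard pattern.
    static List<Integer> nameMatches(String wildcard) {
        Pattern p = toRegex(wildcard);
        List<Integer> out = new ArrayList<Integer>();
        for (int i = 0; i < NAMES.length; i++) {
            if (p.matcher(NAMES[i]).matches()) {
                out.add(i);
            }
        }
        return out;
    }

    // Positions of the records whose address equals the given value exactly.
    static List<Integer> addressEquals(String value) {
        List<Integer> out = new ArrayList<Integer>();
        for (int i = 0; i < ADDRESSES.length; i++) {
            if (ADDRESSES[i].equals(value)) {
                out.add(i);
            }
        }
        return out;
    }

    // MUST + MUST: a record has to satisfy both conditions (intersection).
    static List<Integer> both(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<Integer>(a);
        out.retainAll(b);
        return out;
    }

    // SHOULD + SHOULD: a record has to satisfy at least one condition (union).
    static List<Integer> either(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<Integer>(a);
        for (Integer i : b) {
            if (!out.contains(i)) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> byName = nameMatches("*g??");      // only wangwu ends in g plus two characters
        List<Integer> byAddr = addressEquals("nanjing"); // only zhaoliu lives in nanjing
        System.out.println(byName);                 // prints [2]
        System.out.println(byAddr);                 // prints [3]
        System.out.println(both(byName, byAddr));   // prints []
        System.out.println(either(byName, byAddr)); // prints [2, 3]
    }
}
```

Under this emulation, requiring both conditions selects no record at all on the sample data, while treating them as alternatives selects wangwu and zhaoliu. That is worth checking before concluding that an empty result from the combined query means something is broken.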

Next, let's look at a full-text indexing example; data.txt is listed at the end of the article. First we build an index over the articles:


package com.qianyan.lucene;
 
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
 
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
 
public class TestFileReaderForIndex{
 
	public static void main(String[] args) throws IOException{
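		File file = new File("E:/data.txt");
		FileReader fRead = new FileReader(file);
		// Read up to 60000 characters, keeping track of how many were actually read
		char[] chs = new char[60000];
		int total = 0;
		int n;
		while (total < chs.length && (n = fRead.read(chs, total, chs.length - total)) != -1) {
			total += n;
		}
		fRead.close();
		
		// Build the string from the characters read, not from the whole buffer
		String strtemp = new String(chs, 0, total);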
		String[] strs = strtemp.split("Database: Compendex");
		
		System.out.println(strs.length);
		for(int i = 0; i < strs.length; i++)
			strs[i] = strs[i].trim();
		
		Analyzer analyzer = new StandardAnalyzer();
		String indexDir = "E:/luceneindex";
		Directory dir = FSDirectory.getDirectory(indexDir);
		
		IndexWriter writer = new IndexWriter(dir, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED);
		
		for(int i = 0; i < strs.length; i++){
			Document document = new Document();
			document.add(new Field("contents", strs[i], Field.Store.YES, Field.Index.ANALYZED));
			writer.addDocument(document);
		}
		
		writer.optimize();
		writer.close();
		dir.close();
		System.out.println("index ok!");
	}
}
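The record-splitting step in the class above can be tried in isolation. The sketch below uses plain Java only; RecordSplitSketch and splitRecords are hypothetical names, and the two-record sample text is made up for the illustration. It splits on the same "Database: Compendex" delimiter, trims each piece, and additionally skips pieces that are empty after trimming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: splits a text dump into records on a literal delimiter.
class RecordSplitSketch {
    // Split on the literal delimiter, trim each piece, and drop pieces that end up empty.
    static List<String> splitRecords(String text, String delimiter) {
        List<String> records = new ArrayList<String>();
        for (String part : text.split(Pattern.quote(delimiter))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "1. First title\nAuthor A\nDatabase: Compendex\n\n"
                + "2. Second title\nAuthor B\nDatabase: Compendex\n\n";
        List<String> records = splitRecords(text, "Database: Compendex");
        System.out.println(records.size()); // prints 2
        System.out.println(records.get(0)); // prints the first record (two lines)
    }
}
```

A plain split on this sample yields three pieces, the last one containing only line breaks. Without the emptiness check, that piece would be indexed as an empty document.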

A simple search over the index appended to above:


package com.qianyan.lucene;
 
import java.io.IOException;
 
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
 
public class TestSeacher2 {
 
	public static void main(String[] args) throws IOException {
		String indexDir = "E:/luceneindex";
		Directory dir = FSDirectory.getDirectory(indexDir);
		IndexSearcher searcher = new IndexSearcher(dir);
		ScoreDoc[] hits = null;
		
		Term term = new Term("contents", "ontology");
		TermQuery query = new TermQuery(term);
		TopDocs topDocs = searcher.search(query, 126);
	
		hits = topDocs.scoreDocs;
		
		for(int i = 0; i < hits.length; i++){
			Document doc = searcher.doc(hits[i].doc);
			System.out.print(hits[i].score + " ");
			System.out.println(doc.get("contents"));
		}
		
		searcher.close();
		dir.close();
	}
}
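The search call above asks for at most 126 hits and then walks them together with their scores. As a rough sketch of what "keep the best n hits" means, the plain-Java example below scores documents with a toy term count and keeps the top n in a bounded min-heap. The scoring is an assumption made for illustration and is not Lucene's scoring formula; TopNSketch, Hit, score and topN are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.PriorityQueue;

// Illustrative only: top-n selection with a toy score, not Lucene's ranking.
class TopNSketch {
    // A hit pairs a document number with a score, similar in role to ScoreDoc.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Toy score: how many whitespace-separated tokens equal the term.
    static float score(String text, String term) {
        float count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Keep only the n best-scoring matches, returned in descending score order.
    static List<Hit> topN(String[] docs, String term, int n) {
        // Min-heap on score, so the weakest kept hit is the first to be evicted.
        PriorityQueue<Hit> heap = new PriorityQueue<Hit>(Math.max(1, n), new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(a.score, b.score);
            }
        });
        for (int i = 0; i < docs.length; i++) {
            float s = score(docs[i], term);
            if (s <= 0) {
                continue; // documents without the term are not hits
            }
            heap.add(new Hit(i, s));
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Hit> hits = new ArrayList<Hit>(heap);
        Collections.sort(hits, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        String[] docs = {
            "ontology based search",
            "polymer grafting",
            "ontology mapping and ontology alignment"
        };
        for (Hit h : topN(docs, "ontology", 2)) {
            System.out.println(h.doc + " " + h.score);
        }
        // prints:
        // 2 2.0
        // 0 1.0
    }
}
```

Asking for more hits than there are matches simply returns every match, which is why a generous limit such as 126 is harmless on a small index.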

That covers the basic search approaches.

The contents of data.txt are as follows:

1. Modeling the adsorption of CD(II) onto Muloorina illite and related clay minerals
Lackovic, Kurt (La Trobe University, P.O. Box 199, Bendigo, Vic. 3552, Australia) Angove, Michael J. Wells, John D. Johnson, Bruce B. Source: Journal of Colloid and Interface Science, v 257, n 1, p 31-40, 2003
Database: Compendex

2. Experimental Study of the Adsorption of an Ionic Liquid onto Bacterial and Mineral Surfaces
Gorman-Lewis, Drew J. (Civ. Eng. and Geological Sciences, University of Notre Dame, Notre Dame, IN 46556-0767, United States) Fein, Jeremy B. Source: Environmental Science and Technology, v 38, n 8, p 2491-2495, April 15, 2004
Database: Compendex

3. Grafting of hyperbranched polymers onto ultrafine silica: Postgraft polymerization of vinyl monomers initiated by pendant initiating groups of polymer chains grafted onto the surface
Hayashi, Shinji (Grad. Sch. of Science and Technology, Niigata Univ., 8050, I., Niigata, Japan) Fujiki, Kazuhiro Tsubokawa, Norio Source: Reactive and Functional Polymers, v 46, n 2, p 193-201, December 2000
Database: Compendex

4. The influence of pH, electrolyte type, and surface coating on arsenic(V) adsorption onto kaolinites
Cornu, Sophie (Unité de Sciences du Sol, INRA d'Orléans, av. de la Pomme de pin, Ardon, 45166 Olivet Cedex, France) Breeze, Dominique Saada, Alain Baranger, Philippe Source: Soil Science Society of America Journal, v 67, n 4, p 1127-1132, July/August 2003
Database: Compendex

5. Adsorption behavior of statherin and a statherin peptide onto hydroxyapatite and silica surfaces by in situ ellipsometry
Santos, Olga (Biomedical Laboratory Science and Biomedical Technology, Faculty of Health and Society, Malmö University, SE-20506 Malmö, Sweden) Kosoric, Jelena Hector, Mark Prichard Anderson, Paul Lindh, Liselott Source: Journal of Colloid and Interface Science, v 318, n 2, p 175-182, Febrary 15, 2008
Database: Compendex

6. Sorption of surfactant used in CO2 flooding onto five minerals and three porous media
Grigg, R.B. (SPE, New Mexico Recovery Research Center) Bai, B. Source: Proceedings - SPE International Symposium on Oilfield Chemistry, p 331-342, 2005, SPE International Symposium on Oilfield Chemistry Proceedings
Database: Compendex

7. Influence of charge density, sulfate group position and molecular mass on adsorption of chondroitin sulfate onto coral
Volpi, Nicola (Department of Animal Biology, Biological Chemistry, University of Modena and Reggio Emilia, Via Campi 213/d, 41100 Modena, Italy) Source: Biomaterials, v 23, n 14, p 3015-3022, 2002
Database: Compendex

8. Kinetic consequences of carbocationic grafting and blocking from and onto
Ivan, Bela (Univ of Akron, United States) Source: Polymer Bulletin, v 20, n 4, p 365-372, Oct
Database: Compendex

9. Assemblies of concanavalin A onto carboxymethylcellulose
Castro, Lizandra B.R. (Instituto de Química, Universidade de São Paulo, Av. Prof. Lineu Prestes 748, 05508-900, São Paulo, Brazil) Petri, Denise F.S. Source: Journal of Nanoscience and Nanotechnology, v 5, n 12, p 2063-2069, December 2005
Database: Compendex

10. Surface grafting of polymers onto glass plate: Polymerization of vinyl monomers initiated by initiating groups introduced onto the surface
Tsubokawa, Norio (Niigata Univ, Niigata, Japan) Satoh, Masayoshi Source: Journal of Applied Polymer Science, v 65, n 11, p 2165-2172, Sep 12
Database: Compendex

11. Photografting of vinyl polymers onto ultrafine inorganic particles: photopolymerization of vinyl monomers initiated by azo groups introduced onto these surfaces
Tsubokawa, Norio (Niigata Univ, Niigata, Japan) Shirai, Yukio Tsuchida, Hideyo Handa, Satoshi Source: Journal of Polymer Science, Part A: Polymer Chemistry, v 32, n 12, p 2327-2332, Sept
Database: Compendex

12. Graft polymerization of methyl methacrylate initiated by pendant azo groups introduced onto γ-poly (glutamic acid)
Tsubokawa, Norio (Niigata Univ, Niigita, Japan) Inagaki, Masatoshi Endo, Takeshi Source: Journal of Polymer Science, Part A: Polymer Chemistry, v 31, n 2, p 563-568, Feb
Database: Compendex

13. The sorpt
