IK Analyzer is an open-source Chinese word-segmentation framework built on top of Lucene.
Download: http://so.csdn.net/so/search/s.do?q=IKAnalyzer2012.jar&t=doc&o=&s=all&l=null
Add the following jars to the project:
IKAnalyzer2012.jar
lucene-core-3.6.0.jar
Two ways to implement segmentation:
Implementation 1, using the IK Analyzer core API directly (IKSegmenter):
import java.io.IOException;
import java.io.StringReader;

import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class Fenci1 {
    public static void main(String[] args) throws IOException {
        String text = "你好,我的世界!";
        StringReader sr = new StringReader(text);
        // second argument: true = smart mode, false = finest-grained mode
        IKSegmenter ik = new IKSegmenter(sr, true);
        Lexeme lex = null;
        // next() returns null once the input is exhausted
        while ((lex = ik.next()) != null) {
            System.out.print(lex.getLexemeText() + ",");
        }
    }
}
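If the tokens are needed as data rather than printed, the same loop can be wrapped in a small helper that returns a list. This is a sketch only; the class and method names (SegmentHelper, segment) are illustrative and not part of IK Analyzer:

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class SegmentHelper {
    // Collect lexeme texts into a List so callers need not drive the loop themselves
    public static List<String> segment(String text, boolean useSmart) throws IOException {
        List<String> tokens = new ArrayList<String>();
        IKSegmenter ik = new IKSegmenter(new StringReader(text), useSmart);
        Lexeme lex;
        while ((lex = ik.next()) != null) {
            tokens.add(lex.getLexemeText());
        }
        return tokens;
    }
}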
Implementation 2, using the Lucene Analyzer API (IKAnalyzer):
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Fenci {
    public static void main(String[] args) throws IOException {
        String text = "你好,我的世界!";
        // create the analyzer (true = smart mode)
        Analyzer anal = new IKAnalyzer(true);
        StringReader reader = new StringReader(text);
        // tokenize: obtain a TokenStream from the analyzer
        TokenStream ts = anal.tokenStream("", reader);
        CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
        // iterate over the tokens
        while (ts.incrementToken()) {
            System.out.print(term.toString() + ",");
        }
        reader.close();
        System.out.println();
    }
}
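Because IKAnalyzer extends Lucene's Analyzer, it can also be handed straight to an IndexWriter when building an index. Below is a minimal sketch against the Lucene 3.6 API; the field name "content" and the in-memory RAMDirectory are illustrative choices, not requirements:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class IndexDemo {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new IKAnalyzer(true);   // smart mode
        Directory dir = new RAMDirectory();         // in-memory index, for illustration
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_36, analyzer);
        IndexWriter writer = new IndexWriter(dir, cfg);
        Document doc = new Document();
        // "content" is an illustrative field name; the field is analyzed by IK
        doc.add(new Field("content", "你好,我的世界!", Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();
    }
}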
Running Fenci1 or Fenci prints:
你好,我,的,世界,
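This output comes from smart mode (the true flag), which resolves ambiguities and prefers coarser words. Passing false selects finest-grained mode, which emits every dictionary word it can match; the exact tokens depend on IK's bundled dictionary. A minimal sketch:

import java.io.IOException;
import java.io.StringReader;

import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class FenciFine {
    public static void main(String[] args) throws IOException {
        // false = finest-grained mode: emit all dictionary words found
        IKSegmenter ik = new IKSegmenter(new StringReader("你好,我的世界!"), false);
        Lexeme lex;
        while ((lex = ik.next()) != null) {
            System.out.print(lex.getLexemeText() + ",");
        }
    }
}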