
StanfordNLP for JAVA demo


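I recently needed to study NLP for work, but the official documentation is sprawling and it is easy to lose sight of what matters, so I decided to write up my own learning path.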

NLP roadmap
A good blog: Han Xiaoyang
Stanford NLP open course
Statistical Learning Methods
A good blog

  1. Link: https://pan.baidu.com/s/1myVT-yMzqzJIcl50mGs2JA
  2. Extraction code: tw6r

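Reference documentation:

Stanford NLP API

Following a video by an Indian developer, I got a small demo running.

Step 1: Create a Maven project in IDEA and add the dependencies below. At the time of writing, the latest version is 3.9.2.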

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.2</version>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.2</version>
        <classifier>models</classifier>
    </dependency>
    <!-- Add Chinese support -->
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.9.2</version>
        <classifier>models-chinese</classifier>
    </dependency>

Step 2: Use the NLP package

    package com.ghc.corhort.query.utils;

    import edu.stanford.nlp.coref.CorefCoreAnnotations;
    import edu.stanford.nlp.coref.data.CorefChain;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;

    import java.util.*;

    /**
     * @author Frank Li
     * @date Created in 2019/8/7 13:39
     */
    public class Demo {
        public static void main(String[] args) {
            // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            // read some text in the text variable
            String text = "I like eat apple!";

            // create an empty Annotation just with the given text
            Annotation document = new Annotation(text);

            // run all Annotators on this text
            pipeline.annotate(document);

            // these are all the sentences in this document
            // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
            List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

            for (CoreMap sentence : sentences) {
                // traversing the words in the current sentence
                // a CoreLabel is a CoreMap with additional token-specific methods
                for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                    // this is the text of the token
                    String word = token.get(CoreAnnotations.TextAnnotation.class);
                    // this is the POS tag of the token
                    String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                    // this is the NER label of the token
                    String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                    System.out.println("word:" + word + "-->pos:" + pos + "-->ne:" + ne);
                }

                // this is the parse tree of the current sentence
                Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
                System.out.println(String.format("tree:\n%s", tree.toString()));

                // this is the Stanford dependency graph of the current sentence
                SemanticGraph dependencies = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            }

            // This is the coreference link graph
            // Each chain stores a set of mentions that link to each other,
            // along with a method for getting the most representative mention
            // Both sentence and token offsets start at 1!
            Map<Integer, CorefChain> graph =
                    document.get(CorefCoreAnnotations.CorefChainAnnotation.class);
        }
    }

Output

(Screenshot of the console output: 636379-20190809105438436-1103174825.png)
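
The demo's comments describe a CoreMap as "essentially a Map that uses class objects as keys and has values with custom types". The sketch below illustrates that idea in plain Java, using only the standard library. It is not CoreNLP code: the names `TypedMapSketch`, `Key`, `Text`, `TokenIndex`, and `TypedMap` are all made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedMapSketch {
    // Marker interface: each key class declares the value type it maps to.
    public interface Key<V> {}

    // Hypothetical keys for this sketch (not CoreNLP classes).
    public static final class Text implements Key<String> {}
    public static final class TokenIndex implements Key<Integer> {}

    // A map keyed by Class objects; the key's type parameter fixes the value type.
    public static final class TypedMap {
        private final Map<Class<?>, Object> values = new HashMap<>();

        public <V> void set(Class<? extends Key<V>> key, V value) {
            values.put(key, value);
        }

        @SuppressWarnings("unchecked")
        public <V> V get(Class<? extends Key<V>> key) {
            // Safe as long as values are only stored through set().
            return (V) values.get(key);
        }
    }

    public static void main(String[] args) {
        TypedMap token = new TypedMap();
        token.set(Text.class, "apple");
        token.set(TokenIndex.class, 4); // 1-based position in "I like eat apple!"

        // No casts at the call site: the key class determines the result type.
        String word = token.get(Text.class);
        Integer index = token.get(TokenIndex.class);
        System.out.println(word + " @ " + index);
    }
}
```

This is why calls such as `token.get(CoreAnnotations.TextAnnotation.class)` in the demo can be assigned to a `String` without a cast.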

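A brief look at how it works

TokensRegex in Stanford CoreNLP

I have recently been doing natural language understanding for music and reading content, and evaluated Stanford CoreNLP along the way. These are my notes.

Features

Stanford CoreNLP is a suite of natural language analysis tools that includes:
POS (part-of-speech tagger): labels each word with its part of speech
NER (named entity recognizer): identifies named entities
Parser tree: analyzes the grammatical structure of a sentence, for example identifying phrases, subject, predicate, and object
Coreference Resolution: finds the words in a text that refer to the same entity. In the example that accompanied this list in the original, I/my and Nader/he refer to the same person
Sentiment Analysis: classifies sentiment
Bootstrapped pattern learning: (I am not sure how best to translate this; roughly, it extracts patterns, such as entity names, without supervision)
Open IE (Information Extraction): extracts structured relation tuples from plain text, e.g. "Barack Obama was born in Hawaii" => (Barack Obama; was born in; Hawaii)

Requirements

Voice-interaction applications (voice assistants, smart speakers such as the Echo) usually receive colloquial natural language, such as "I want to hear a joke" (我想听一个段子) or "Give me the story of the Cowherd and the Weaver Girl" (给我来个牛郎织女的故事). To return accurate results, the useful topic words have to be extracted: joke (段子), Cowherd and Weaver Girl (牛郎织女), story (故事). After looking around, I decided to try CoreNLP's TokensRegex, which provides regular expressions over token sequences. The tools it offers are regular expressions, tokenization, part-of-speech tags, and entity classes. You can also define your own entity classes, for example marking 牛郎织女 as an entity of class READ.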
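
To make the idea of matching on token attributes rather than raw characters more concrete, here is a minimal plain-Java sketch. It does not use the TokensRegex API. The `Token` class, the `matchRuns` helper, the English rendering of the sample sentence, and the hand-assigned POS and READ labels are all assumptions made for illustration; in a real pipeline the tags would come from the annotators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class TokenPatternSketch {
    // A token with the attributes mentioned in the text: word, POS tag, entity class.
    public static final class Token {
        public final String word, pos, ner;
        public Token(String word, String pos, String ner) {
            this.word = word; this.pos = pos; this.ner = ner;
        }
    }

    // Joins each maximal run of consecutive tokens accepted by the predicate
    // (a "one or more" match over a token attribute).
    public static List<String> matchRuns(List<Token> tokens, Predicate<Token> accept) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Token t : tokens) {
            if (accept.test(t)) {
                if (current.length() > 0) current.append(' ');
                current.append(t.word);
            } else if (current.length() > 0) {
                runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) runs.add(current.toString());
        return runs;
    }

    // Hand-labeled sample; READ is the custom entity class from the text.
    public static List<Token> sampleTokens() {
        return Arrays.asList(
            new Token("tell", "VB", "O"), new Token("me", "PRP", "O"),
            new Token("the", "DT", "O"), new Token("story", "NN", "O"),
            new Token("of", "IN", "O"), new Token("the", "DT", "READ"),
            new Token("Cowherd", "NNP", "READ"), new Token("and", "CC", "READ"),
            new Token("the", "DT", "READ"), new Token("Weaver", "NNP", "READ"),
            new Token("Girl", "NNP", "READ"));
    }

    public static void main(String[] args) {
        List<Token> tokens = sampleTokens();
        // Runs of READ entities: prints [the Cowherd and the Weaver Girl]
        System.out.println(matchRuns(tokens, t -> "READ".equals(t.ner)));
        // Nouns outside any entity: prints [story]
        System.out.println(matchRuns(tokens, t -> "O".equals(t.ner) && t.pos.startsWith("NN")));
    }
}
```

The two results together are the topic words for the request: the entity to look up and the kind of content wanted.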


Next up: nlp2sql.


Reposted from: https://www.cnblogs.com/Frank99/p/11325835.html
