
[NLP] Installing NLTK and Downloading Its Data Packages

1. Install NLTK

In cmd (the Windows command prompt), run:

```
pip install nltk
```
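You can confirm from Python that the installation worked. The check also shows whether NLTK landed in the interpreter you actually run. The sketch below uses only the standard library, so it runs even when NLTK is missing. The function name `nltk_version` is illustrative and not part of NLTK.

```python
import importlib.util
from importlib import metadata


def nltk_version():
    """Return the installed NLTK version string, or None if NLTK is not importable."""
    if importlib.util.find_spec("nltk") is None:
        return None
    try:
        # Read the version from the installed package metadata without importing nltk.
        return metadata.version("nltk")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = nltk_version()
    if version is None:
        print("NLTK is not installed for this interpreter; run: pip install nltk")
    else:
        print("NLTK", version, "is installed")
```

pip can succeed while this check still reports NLTK as missing. In that case, the pip you ran most likely belongs to a different Python installation. Running `python -m pip install nltk` with the same interpreter avoids the mismatch.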

2. Download the NLTK data packages

In a Python environment or IDE, run:

```python
import nltk
nltk.download()
```

 

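An interactive downloader window pops up.

Select "all packages" and click Download.

The download is very slow, however. Reportedly it can take about two days to fetch everything.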
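Fetching "all packages" is slow, so one option is to download only the packages you need, from a script. This sketch rests on an assumption that has not been checked for every NLTK version: `nltk.download` returns a falsy value when a package fails, with `raise_on_error` left at its default. `download_packages` is a hypothetical helper. Its `downloader` parameter exists only so the function can be tried without network access.

```python
def download_packages(package_ids, download_dir=None, downloader=None):
    """Download the given NLTK package ids one by one; return the ids that failed.

    `downloader` defaults to nltk.download (hypothetical injection point for testing).
    """
    if downloader is None:
        import nltk  # imported lazily so this module loads even without NLTK
        downloader = nltk.download
    failed = []
    for package_id in package_ids:
        # Assumption: a falsy return value means this package could not be fetched.
        ok = downloader(package_id, download_dir=download_dir, quiet=True)
        if not ok:
            failed.append(package_id)
    return failed
```

The examples in section 4 need `punkt`, `averaged_perceptron_tagger`, `maxent_ne_chunker`, `words` and `treebank`. Calling `download_packages(["punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words", "treebank"])` fetches them and returns the ids that still failed. Recent NLTK releases may ask for newer variants of some of these (for example `punkt_tab`), so use whatever name the error message reports.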

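3. Fill in the files that failed to download

Note the path shown as the Download Directory and open that folder.

You will see the files that have already been downloaded.

Open any of its subfolders and you will find the zip files alongside their extracted contents.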
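A small script can walk the download folder and show which packages were fetched but never extracted. This sketch assumes the layout just described: `<download dir>/<category>/<package>.zip`, with the extracted copy next to it in `<category>/<package>/`. `scan_nltk_data` is a hypothetical helper name.

```python
from pathlib import Path


def scan_nltk_data(root):
    """List (category, package, extracted) for every package zip under an nltk_data folder."""
    root = Path(root)
    results = []
    # Only one level deep: root/<category>/<package>.zip
    for zip_path in sorted(root.glob("*/*.zip")):
        category = zip_path.parent.name
        package = zip_path.stem
        # A package counts as extracted if a folder of the same name sits next to the zip.
        extracted = (zip_path.parent / package).is_dir()
        results.append((category, package, extracted))
    return results
```

For example, `scan_nltk_data(r"C:\nltk_data")` lists every zip in that folder. Each entry carries a flag saying whether the extracted folder exists.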

 

 

If nltk.download() did not manage to fetch every file, running it again usually keeps failing with errors such as "connection lost" or "unable to connect".
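When the failures are only intermittent, retrying a few times with an increasing pause sometimes gets a package through. This is a minimal sketch that assumes two failure modes. Network errors surface either as `OSError` subclasses (such as `urllib.error.URLError`) or as a falsy return value. `retry` is a hypothetical helper, not an NLTK function. If GitHub cannot be reached at all, retrying will not help, and the manual route below is the way out.

```python
import time


def retry(action, attempts=3, delay=5.0, backoff=2.0):
    """Call action() until it returns a truthy value or the attempts run out.

    Connection errors (OSError and subclasses) count as a failed attempt.
    Returns True on success, False if every attempt failed.
    """
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            if action():
                return True
        except OSError as exc:  # e.g. urllib.error.URLError, ConnectionResetError
            print(f"attempt {attempt} failed: {exc}")
        if attempt < attempts:
            time.sleep(wait)  # pause longer after each failure
            wait *= backoff
    return False
```

Usage, with NLTK installed: `retry(lambda: nltk.download("punkt", quiet=True))`.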

 

In that case, download the files from GitHub:

https://github.com/nltk/nltk_data


You can download the whole repository directly,

or download the zip file of a single package:

https://github.com/nltk/nltk_data/tree/gh-pages/packages
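The package zips can also be fetched with a short script. This sketch assumes one thing about the `gh-pages/packages` tree shown above: its files are served raw from `raw.githubusercontent.com` at `packages/<category>/<package>.zip`. The helper names are hypothetical.

```python
import urllib.request
from pathlib import Path

# Assumed raw-file location of the gh-pages/packages tree of nltk/nltk_data.
RAW_BASE = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages"


def package_zip_url(category, package):
    """Build the raw download URL of one package zip, e.g. corpora/treebank."""
    return f"{RAW_BASE}/{category}/{package}.zip"


def fetch_package_zip(category, package, dest_dir, timeout=60):
    """Download one package zip into dest_dir and return the saved path."""
    dest = Path(dest_dir) / f"{package}.zip"
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(package_zip_url(category, package), timeout=timeout) as resp:
        dest.write_bytes(resp.read())
    return dest
```

For example, `fetch_package_zip("corpora", "treebank", "downloads")` would save `downloads/treebank.zip`. It needs network access to GitHub, so it can fail in the same way as `nltk.download`.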

 

Or download a single package by name:

```python
import nltk
nltk.download('punkt')
```

PS: this may lose the connection as well.

 

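Put the downloaded zip file into the matching folder on your machine (for example, a corpus goes under corpora/) and extract it there.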
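The copy-and-extract step can be scripted with the standard library. This is a minimal sketch that assumes the package zip holds a top-level folder named after the package. The treebank error message in the Tips (`corpora/treebank.zip/treebank/...`) shows this layout. `install_package_zip` is a hypothetical helper.

```python
import shutil
import zipfile
from pathlib import Path


def install_package_zip(zip_path, nltk_data_dir, category):
    """Copy a package zip into nltk_data/<category>/ and extract it there.

    Returns the folder the package is expected to extract into,
    e.g. nltk_data/corpora/treebank for treebank.zip.
    """
    target_dir = Path(nltk_data_dir) / category
    target_dir.mkdir(parents=True, exist_ok=True)
    target_zip = target_dir / Path(zip_path).name
    # Skip the copy if the zip is already in place.
    if Path(zip_path).resolve() != target_zip.resolve():
        shutil.copy2(zip_path, target_zip)
    with zipfile.ZipFile(target_zip) as zf:
        zf.extractall(target_dir)  # assumes a top-level <package>/ folder inside the zip
    return target_dir / target_zip.stem
```

Usage: `install_package_zip("treebank.zip", r"C:\nltk_data", "corpora")` leaves both `corpora/treebank.zip` and the extracted `corpora/treebank/` in place.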

 

4. NLTK example code

eg1:

```python
import nltk

sen = 'hello, how are you?'
res = nltk.word_tokenize(sen)  # tokenize into words (needs the punkt tokenizer data)
print(res)
```

 

eg2:

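```python
import nltk
from nltk.corpus import treebank

text = "hello, how are you? I'm from China"
tokens = nltk.word_tokenize(text)  # tokenize
tagged = nltk.pos_tag(tokens)  # part-of-speech tagging
entities = nltk.chunk.ne_chunk(tagged)  # named entity recognition
a1 = str(entities)  # convert the result tree to a string

# Write the string to a file; "with" closes the file even if an error occurs
with open('out.txt', 'w', encoding='utf-8') as file_object:
    file_object.write(a1)
print(entities)

# Syntax parse tree: the first parsed sentence of wsj_0001.mrg in the treebank sample
t = treebank.parsed_sents('wsj_0001.mrg')[0]
t.draw()  # opens a Tkinter window showing the tree
```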
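Before running eg1 and eg2, you can check which of their data packages are still missing. `nltk.data.find` raises `LookupError` for a resource it cannot locate. The resource-to-package table below is an assumption based on the classic package names. Newer NLTK releases may look for variants such as `tokenizers/punkt_tab` instead. `missing_resources` is a hypothetical helper.

```python
# Resources used by eg1/eg2, as paths for nltk.data.find(), mapped to package ids.
# Assumption: classic names; newer NLTK versions may request *_tab / *_eng variants.
EXAMPLE_RESOURCES = {
    "tokenizers/punkt": "punkt",  # word_tokenize
    "taggers/averaged_perceptron_tagger": "averaged_perceptron_tagger",  # pos_tag
    "chunkers/maxent_ne_chunker": "maxent_ne_chunker",  # ne_chunk
    "corpora/words": "words",  # ne_chunk
    "corpora/treebank": "treebank",  # treebank.parsed_sents
}


def missing_resources(resources=EXAMPLE_RESOURCES, finder=None):
    """Return the package ids whose resource path cannot be found locally."""
    if finder is None:
        import nltk  # imported lazily so the table can be used without NLTK
        finder = nltk.data.find
    missing = []
    for resource_path, package_id in resources.items():
        try:
            finder(resource_path)
        except LookupError:  # nltk.data.find raises LookupError when not found
            missing.append(package_id)
    return missing
```

With NLTK installed, `missing_resources()` returns the package ids you still need to pass to `nltk.download`.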

 

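Tips:

If running the examples raises an error, look up the resource path named in the error message. Download the matching package from GitHub, then extract it into the corresponding folder on your machine.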
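Extracting a package only helps if it sits in a folder NLTK searches, that is, one of the directories listed under "Searched in:" in the error message below. That list is `nltk.data.path`, a plain Python list, so you can add a folder of your own at runtime. Setting the `NLTK_DATA` environment variable is another option. A sketch, with `add_data_dir` as a hypothetical helper:

```python
from pathlib import Path


def add_data_dir(path, search_path=None):
    """Put `path` at the front of NLTK's data search path if it is not there yet.

    `search_path` defaults to nltk.data.path, the list of directories NLTK searches.
    """
    if search_path is None:
        import nltk  # imported lazily; any list can be passed for testing
        search_path = nltk.data.path
    path = str(Path(path))
    if path not in search_path:
        search_path.insert(0, path)  # searched first
    return search_path
```

For example, to use a hypothetical folder `E:\my_nltk_data`, call `add_data_dir(r"E:\my_nltk_data")` before `nltk.word_tokenize`.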

 

Example error:

```
Traceback (most recent call last):
  File "D:\Users\xxxxx\AppData\Local\Anaconda3\lib\site-packages\nltk\corpus\util.py", line 80, in __load
    try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
  File "D:\Users\xxxxx\AppData\Local\Anaconda3\lib\site-packages\nltk\data.py", line 653, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'corpora/treebank.zip/treebank/combined/' not found.
  Please use the NLTK Downloader to obtain the resource:  >>>
  nltk.download()
  Searched in:
    - 'D:\\Users\\xxxxx/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'D:\\Users\\xxxxx\\AppData\\Local\\Anaconda3\\nltk_data'
    - 'D:\\Users\\xxxxx\\AppData\\Local\\Anaconda3\\lib\\nltk_data'
    - 'D:\\Users\\xxxxx\\AppData\\Roaming\\nltk_data'
**********************************************************************
During handling of the above exception, another exception occurred:
```

 

As the error message indicates, corpora/treebank.zip needs to be downloaded.
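You can automate the step from the resource path in such a message to the package to download. This sketch assumes the message contains `Resource '<path>' not found` as in the example above. It also assumes the path starts with `<category>/<package>`, optionally followed by `.zip`. Both helper names are hypothetical.

```python
import re

# Assumed message shape: Resource '<path>' not found
_RESOURCE_RE = re.compile(r"Resource\s+'?([\w./-]+)'?\s+not found")


def resource_from_error(message):
    """Extract the resource path from a LookupError message, or None."""
    match = _RESOURCE_RE.search(message)
    return match.group(1) if match else None


def package_from_resource(resource_path):
    """Map e.g. 'corpora/treebank.zip/treebank/combined/' to ('corpora', 'treebank')."""
    parts = resource_path.strip("/").split("/")
    if len(parts) < 2:
        return None, parts[0]  # bare package id, category unknown
    category, package = parts[0], parts[1]
    if package.endswith(".zip"):
        package = package[: -len(".zip")]
    return category, package
```

For the error above this yields `('corpora', 'treebank')`, which corresponds to `packages/corpora/treebank.zip` in the nltk_data repository.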

 

5. References:

https://github.com/nltk/nltk_data

https://www.cnblogs.com/guo7533/p/8695812.html

https://blog.csdn.net/sinat_34328764/article/details/94830948

https://blog.csdn.net/qiang12qiang12/article/details/81254595

https://www.osgeo.cn/nltk/data/

https://wing2south.com/post/speedup-ntlk-data-download/

https://blog.csdn.net/qq_43376013/article/details/102883773

https://blog.csdn.net/weixin_44574186/article/details/90748946

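Statement: this article was contributed by a user and does not represent the position of the wpsshop blog. Copyright belongs to the original author, and the site assumes no legal liability. If you find infringing content, please contact us. When reposting, please credit the source: https://www.wpsshop.cn/w/weixin_40725706/article/detail/344324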