tiktoken for Python: A Detailed Guide to Its Introduction, Installation, and Usage - wpsshop Blog

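Contents

Introduction to tiktoken

1. Performance: tiktoken is 3 to 6 times faster than a comparable open-source tokenizer

Installing tiktoken

Using tiktoken

1. Basic usage

(1) A fast BPE tokenizer for OpenAI models

(2) Code that helps visualize the BPE process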


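Introduction to tiktoken

tiktoken is a fast BPE (byte pair encoding) tokenizer for use with OpenAI models.

1. Performance: tiktoken is 3 to 6 times faster than a comparable open-source tokenizer

Installing tiktoken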

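Install from the default PyPI index, or use the Tsinghua University mirror if the default index is slow. Only one of the two commands is needed:

    pip install tiktoken
    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tiktoken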

    C:\Windows\system32>pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tiktoken
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Collecting tiktoken
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/91/cf/7f3b821152f7abb240950133c60c394f7421a5791b020cedb190ff7a61b4/tiktoken-0.5.1-cp39-cp39-win_amd64.whl (760 kB)
         |████████████████████████████████| 760 kB 726 kB/s
    Requirement already satisfied: regex>=2022.1.18 in d:\programdata\anaconda3\lib\site-packages (from tiktoken) (2022.3.15)
    Requirement already satisfied: requests>=2.26.0 in d:\programdata\anaconda3\lib\site-packages (from tiktoken) (2.31.0)
    Requirement already satisfied: charset-normalizer<4,>=2 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (2.0.12)
    Requirement already satisfied: urllib3<3,>=1.21.1 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (1.26.9)
    Requirement already satisfied: idna<4,>=2.5 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (3.3)
    Requirement already satisfied: certifi>=2017.4.17 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.26.0->tiktoken) (2021.10.8)
    Installing collected packages: tiktoken
    Successfully installed tiktoken-0.5.1
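
After installing, it can be useful to confirm which version the current interpreter actually sees, especially when several Python environments coexist (the log above installs into an Anaconda environment). The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this example and is not part of tiktoken or pip.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it only wraps importlib.metadata.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("tiktoken")
if found:
    print(f"tiktoken {found} is installed")
else:
    print("tiktoken is not installed in this environment")
```

If the script reports that tiktoken is missing even though pip reported success, the `pip` command and the `python` command probably belong to different environments; running `python -m pip install tiktoken` ties the install to the interpreter in use.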

Using tiktoken

1. Basic usage

(1) A fast BPE tokenizer for OpenAI models

    import tiktoken
    enc = tiktoken.get_encoding("cl100k_base")
    assert enc.decode(enc.encode("hello world")) == "hello world"
    # To get the tokeniser corresponding to a specific model in the OpenAI API:
    enc = tiktoken.encoding_for_model("gpt-4")
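
A common reason to call `encode` is to count tokens or to cut text down to a token budget. The sketch below shows one possible way to do both. The helpers `count_tokens` and `truncate_to_budget` and the `demo` function are hypothetical names introduced here, not tiktoken APIs; they only rely on the `encode` and `decode` methods used above. The tiktoken import is kept inside `demo()` because `get_encoding` downloads the vocabulary on first use and therefore needs network access.

```python
from typing import List, Protocol


class TokenCodec(Protocol):
    """Anything with encode/decode, e.g. the object returned by tiktoken.get_encoding."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, tokens: List[int]) -> str: ...


def count_tokens(enc: TokenCodec, text: str) -> int:
    """Number of tokens `enc` produces for `text`."""
    return len(enc.encode(text))


def truncate_to_budget(enc: TokenCodec, text: str, max_tokens: int) -> str:
    """Return `text` unchanged if it fits, else the decoded first `max_tokens` tokens."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return enc.decode(tokens[:max_tokens])


def demo() -> None:
    """Run the helpers against the real cl100k_base encoding."""
    import tiktoken  # imported lazily: the first call fetches and caches the vocabulary

    enc = tiktoken.get_encoding("cl100k_base")
    text = "hello world"
    print(count_tokens(enc, text))
    print(truncate_to_budget(enc, text, 1))
```

Call `demo()` to try it. One caveat: cutting a token sequence can split a multi-byte UTF-8 character, in which case the decoded string may end in a replacement character, so treat the truncated text as approximate.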

(2) Code that helps visualize the BPE process

    # Explicit imports instead of a wildcard, so it is clear where each name comes from
    from tiktoken._educational import SimpleBytePairEncoding, train_simple_encoding
    # Train a BPE tokeniser on a small amount of text
    enc = train_simple_encoding()
    # Visualise how the GPT-4 encoder encodes text
    enc = SimpleBytePairEncoding.from_tiktoken("cl100k_base")
    enc.encode("hello world aaaaaaaaaaaa")
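
To make the merge idea behind BPE concrete without any dependency, here is a toy sketch in plain Python. It is not how tiktoken is implemented: tiktoken applies a pre-trained merge table and splits text with a regular expression first, whereas this example learns its merges from the very string it encodes. All function names (`most_frequent_pair`, `merge_pair`, `toy_bpe`) are made up for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple

Pair = Tuple[bytes, bytes]


def most_frequent_pair(tokens: List[bytes]) -> Optional[Pair]:
    """Return the most common adjacent pair, or None if no pair occurs more than once."""
    counts = Counter(zip(tokens, tokens[1:]))
    if not counts:
        return None
    pair, freq = counts.most_common(1)[0]
    return pair if freq > 1 else None


def merge_pair(tokens: List[bytes], pair: Pair) -> List[bytes]:
    """Replace each non-overlapping occurrence of `pair`, left to right, with one token."""
    merged: List[bytes] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def toy_bpe(text: str, max_merges: int = 10) -> List[bytes]:
    """Start from single bytes and greedily merge the most frequent adjacent pair."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    for step in range(1, max_merges + 1):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break  # nothing repeats, so further merges would not compress anything
        tokens = merge_pair(tokens, pair)
        print(f"merge {step}: {tokens}")
    return tokens


result = toy_bpe("aaaaaaaaaaaa")
```

For the twelve-character input above, the loop merges three times and `result` ends up as `[b'aaaaaaaa', b'aaaa']`. A string with no repeated adjacent pair, such as "hello world", is left as individual bytes.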
