After searching for ages for a way to pad/align text sequences for BERT,
I found nothing more convenient than what Huggingface's transformers already provides:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

sequence_a = "This is a short sequence."
sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."

# padding=True pads every sequence in the batch to the length of the longest one
padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
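To see what padding=True actually did, you can inspect the returned BatchEncoding right after the snippet above (a minimal sketch; the exact token ids depend on the bert-base-cased vocabulary):

# each list in input_ids now has the same length; the shorter sequence
# is filled out with [PAD] token ids (0 for BERT vocabularies)
print(padded_sequences["input_ids"])

# attention_mask is 1 for real tokens and 0 for padding,
# so the model knows to ignore the [PAD] positions
print(padded_sequences["attention_mask"])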