
Two Ways to Implement Translation Based on transformers (opus-mt-en-zh)

This post uses the Helsinki-NLP opus-mt-zh-en and opus-mt-en-zh models.

1. Using pipeline

(An optional `device` argument can be passed to pipeline. It defaults to -1, which runs on CPU; a non-negative value selects the GPU with that index. A hedged sketch of this appears after the code block below.)
  from transformers import (
      AutoTokenizer,
      AutoModelForSeq2SeqLM,
      pipeline
  )

  text = "从时间上看,中国空间站的建造比国际空间站晚20多年。"

  # Load the zh→en model/tokenizer and the en→zh pair used for back-translation
  tokenizer = AutoTokenizer.from_pretrained("./Helsinki-NLP/opus-mt-zh-en")
  model = AutoModelForSeq2SeqLM.from_pretrained("./Helsinki-NLP/opus-mt-zh-en")
  tokenizer_back_translate = AutoTokenizer.from_pretrained("./Helsinki-NLP/opus-mt-en-zh")
  model_back_translate = AutoModelForSeq2SeqLM.from_pretrained("./Helsinki-NLP/opus-mt-en-zh")

  # One pipeline per translation direction
  zh2en = pipeline("translation_zh_to_en", model=model, tokenizer=tokenizer)
  en2zh = pipeline("translation_en_to_zh", model=model_back_translate, tokenizer=tokenizer_back_translate)

  # Translate the first five characters ("从时间上看"), then translate the result back
  print("tran", zh2en(text[:5])[0]['translation_text'])
  print("tran_back", en2zh(zh2en(text[:5])[0]['translation_text'], max_length=510)[0]['translation_text'])
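
As noted above, pipeline accepts a `device` argument. Below is a minimal sketch of pinning both pipelines to a GPU; the CUDA index 0 is an assumption, so adjust it for your machine.

  # Minimal sketch (assumption: a CUDA GPU at index 0; adjust as needed).
  # device=-1 keeps the pipeline on CPU, which is also the default.
  import torch

  gpu_index = 0 if torch.cuda.is_available() else -1
  zh2en_gpu = pipeline("translation_zh_to_en", model=model, tokenizer=tokenizer, device=gpu_index)
  en2zh_gpu = pipeline("translation_en_to_zh", model=model_back_translate,
                       tokenizer=tokenizer_back_translate, device=gpu_index)
  print(zh2en_gpu(text)[0]['translation_text'])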

2. Step-by-step implementation

  # Tokenize the source text; calling the tokenizer directly is the current
  # idiom (prepare_seq2seq_batch is deprecated in recent transformers releases)
  batch = tokenizer([text], return_tensors='pt', max_length=512, truncation=True)
  # Perform the zh→en translation and decode the output
  translation = model.generate(**batch)
  result = tokenizer.batch_decode(translation, skip_special_tokens=True)
  print("tran", result)

  # Back-translation: tokenize the English result, generate, and decode
  batch_back_translate = tokenizer_back_translate(result, return_tensors='pt', max_length=512, truncation=True)
  translation_back_translate = model_back_translate.generate(**batch_back_translate)
  result = tokenizer_back_translate.batch_decode(translation_back_translate, skip_special_tokens=True)
  print("tran_back", result)
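
The generate calls above rely on default decoding settings. As a minimal sketch, the decoding options can also be passed explicitly; the values below (num_beams=4, max_length=512) are illustrative assumptions, not tuned recommendations.

  # Minimal sketch: explicit decoding options for generate().
  # num_beams=4 and max_length=512 are illustrative assumptions, not tuned values.
  translation = model.generate(
      **batch,
      num_beams=4,         # beam search instead of greedy decoding
      max_length=512,      # upper bound on generated sequence length
      early_stopping=True  # stop once all beams are finished
  )
  print("tran", tokenizer.batch_decode(translation, skip_special_tokens=True))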
