
Hands-on Whisper: Deploying a General-Purpose Speech Recognition Model Locally


Preface

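        Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition, speech translation, and language identification.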

        Below I share some of my own code to help you get a speech-to-text service deployed as quickly as possible.

        Detailed usage of the model is documented in the project repository:

        https://github.com/openai/whisper

Impressions

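        This is a solid speech model. It detects the spoken language automatically and handles Chinese, English, Japanese, and others. It can also translate speech in other languages into English, and it can export subtitles with timestamps.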

        Overall, it can hold its own against the leading speech recognition products on the market, and, importantly, it is open source.

        [Table: model size, required GPU VRAM, and relative speed for each model]

        Command-line mode is enough for anyone who just wants to try the model out. [Figure: command-line usage]
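        As a rough illustration of the command-line mode, the sketch below assembles a whisper command from Python and can hand it to subprocess. The helper names build_whisper_command and run_whisper, and their defaults, are my own assumptions; the flags --model, --language, --task, --output_dir, and --output_format are ones the whisper CLI accepts.

```python
import shutil
import subprocess


def build_whisper_command(audio_files, model="small", language=None,
                          task="transcribe", output_dir=".",
                          output_format="all"):
    """Return the argument list for one whisper CLI call (hypothetical helper)."""
    if not audio_files:
        raise ValueError("at least one audio file is required")
    cmd = ["whisper", *audio_files,
           "--model", model,
           "--task", task,
           "--output_dir", output_dir,
           "--output_format", output_format]
    # Leaving the language out lets whisper detect it automatically.
    if language:
        cmd += ["--language", language]
    return cmd


def run_whisper(audio_files, **options):
    """Run the CLI, failing early if the whisper command is not installed."""
    if shutil.which("whisper") is None:
        raise RuntimeError("the whisper command was not found on PATH")
    return subprocess.run(build_whisper_command(audio_files, **options),
                          check=True)


if __name__ == "__main__":
    # Only prints the command; call run_whisper(...) to actually execute it.
    print(" ".join(build_whisper_command(["japanese.wav"],
                                         language="Japanese",
                                         task="translate")))
```

        Passing the arguments as a list rather than one shell string avoids quoting problems with file names that contain spaces or non-ASCII characters.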

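        Tips:

        1. The first time you run a command after installing whisper, it downloads the model you selected (small, medium, and so on). My graphics card can no longer cope with medium.

        2. For the GPU build of PyTorch, see the tutorial below (the CPU build is noticeably slower):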

        https://blog.csdn.net/G541788_/article/details/135437236
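        To check which build you actually ended up with, something like the following may help. It is a minimal sketch: pick_device is a hypothetical helper name, and it falls back to "cpu" when PyTorch is missing so that the snippet still runs on a machine without it.

```python
def pick_device():
    """Return "cuda" if a CUDA-enabled PyTorch can see a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report CPU so callers get a usable value.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("device:", device)
    # With whisper installed, the model can then be placed explicitly:
    # model = whisper.load_model("small", device=device)
```

        If this prints cpu on a machine with an NVIDIA card, the installed PyTorch is most likely the CPU-only build.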

Calling from Python

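        As a Python developer, I am lucky enough to be able to read how the module works. By changing how a few of its internals are called, I got one-step subtitle export working from Python code. The command line already offers this, but I wanted it available after calling the model's methods in a Python script. You may not need it, but here it is anyway.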

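        The module's cli() function might be a more direct way to do this, since command-line mode simply runs that function. Judging from experience and from the code itself, though, calling it repeatedly would reload the model each time and waste resources.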
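        The load-once idea can be sketched independently of whisper. ModelCache below is a hypothetical helper, not part of the library; it takes any loader function (in real use that would be whisper.load_model) and calls it at most once per model name.

```python
class ModelCache:
    """Load each named model at most once and reuse it (hypothetical helper)."""

    def __init__(self, loader):
        self._loader = loader  # e.g. whisper.load_model in real use
        self._models = {}

    def get(self, name="small"):
        # Only call the expensive loader the first time a name is requested.
        if name not in self._models:
            self._models[name] = self._loader(name)
        return self._models[name]


if __name__ == "__main__":
    calls = []

    def fake_loader(name):
        # Stand-in for whisper.load_model so the sketch runs without whisper.
        calls.append(name)
        return object()

    cache = ModelCache(fake_loader)
    first = cache.get("small")
    second = cache.get("small")
    print(first is second, len(calls))  # True 1
```

        Every file processed through the same cache then shares one loaded model, which is the saving that calling cli() once per file would give up.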

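         1. Add get_writer to whisper's __init__.py so that it can be used through the whisper module. (Upstream defines get_writer in whisper/utils.py, so importing it with "from whisper.utils import get_writer" should also work without editing the package.)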

from .transcribe import get_writer

        2. The feature code

import os
import time

import whisper

# Language codes accepted by the language parameter, listed here for reference
LANGUAGES = {
    "en": "english",
    "zh": "chinese",
    "de": "german",
    "es": "spanish",
    "ru": "russian",
    "ko": "korean",
    "fr": "french",
    "ja": "japanese",
    "pt": "portuguese",
    "tr": "turkish",
    "pl": "polish",
    "ca": "catalan",
    "nl": "dutch",
    "ar": "arabic",
    "sv": "swedish",
    "it": "italian",
    "id": "indonesian",
    "hi": "hindi",
    "fi": "finnish",
    "vi": "vietnamese",
    "he": "hebrew",
    "uk": "ukrainian",
    "el": "greek",
    "ms": "malay",
    "cs": "czech",
    "ro": "romanian",
    "da": "danish",
    "hu": "hungarian",
    "ta": "tamil",
    "no": "norwegian",
    "th": "thai",
    "ur": "urdu",
    "hr": "croatian",
    "bg": "bulgarian",
    "lt": "lithuanian",
    "la": "latin",
    "mi": "maori",
    "ml": "malayalam",
    "cy": "welsh",
    "sk": "slovak",
    "te": "telugu",
    "fa": "persian",
    "lv": "latvian",
    "bn": "bengali",
    "sr": "serbian",
    "az": "azerbaijani",
    "sl": "slovenian",
    "kn": "kannada",
    "et": "estonian",
    "mk": "macedonian",
    "br": "breton",
    "eu": "basque",
    "is": "icelandic",
    "hy": "armenian",
    "ne": "nepali",
    "mn": "mongolian",
    "bs": "bosnian",
    "kk": "kazakh",
    "sq": "albanian",
    "sw": "swahili",
    "gl": "galician",
    "mr": "marathi",
    "pa": "punjabi",
    "si": "sinhala",
    "km": "khmer",
    "sn": "shona",
    "yo": "yoruba",
    "so": "somali",
    "af": "afrikaans",
    "oc": "occitan",
    "ka": "georgian",
    "be": "belarusian",
    "tg": "tajik",
    "sd": "sindhi",
    "gu": "gujarati",
    "am": "amharic",
    "yi": "yiddish",
    "lo": "lao",
    "uz": "uzbek",
    "fo": "faroese",
    "ht": "haitian creole",
    "ps": "pashto",
    "tk": "turkmen",
    "nn": "nynorsk",
    "mt": "maltese",
    "sa": "sanskrit",
    "lb": "luxembourgish",
    "my": "myanmar",
    "bo": "tibetan",
    "tl": "tagalog",
    "mg": "malagasy",
    "as": "assamese",
    "tt": "tatar",
    "haw": "hawaiian",
    "ln": "lingala",
    "ha": "hausa",
    "ba": "bashkir",
    "jw": "javanese",
    "su": "sundanese",
    "yue": "cantonese",
}

# The following command transcribes the speech in audio files using the medium model:
#
# whisper audio.flac audio.mp3 audio.wav --model medium
#
# The default setting (which selects the small model) works well for transcribing
# English. To transcribe audio that contains non-English speech, specify the
# language with the --language option:
#
# whisper japanese.wav --language Japanese
#
# Adding --task translate translates the speech into English:
#
# whisper japanese.wav --language Japanese --task translate
#
# Transcribing another language into English:
# whisper "E:\voice\恋愛サーキュレーション_(Vocals)_(Vocals).wav" --language ja --task translate

# Export subtitles for every file in audio_files. Each file gets its own
# directory under captions/, named after the current Unix timestamp.
audio_files = [r"E:\voice\恋愛サーキュレーション_(Vocals)_(Vocals).wav"]
model = whisper.load_model("small")  # loaded once and reused for every file
output_format = "all"
writer_args = {
    "highlight_words": False,
    "max_line_count": None,
    "max_line_width": None,
    "max_words_per_line": None,
}

for audio_file in audio_files:
    now_timestamp = str(int(time.time()))
    save_path = os.path.join("captions", now_timestamp)
    # makedirs also creates captions/ itself when it does not exist yet;
    # os.mkdir would fail in that case.
    os.makedirs(save_path, exist_ok=True)
    # language is optional: zh for Chinese, ja for Japanese, en for English
    result = model.transcribe(audio_file, language="ja")
    # whisper.get_writer is available because of the __init__.py change in step 1
    writer = whisper.get_writer(output_format, save_path)
    writer(result, audio_file, **writer_args)
    print("done:", audio_file)
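        To see what the timestamped export amounts to, here is a minimal sketch that renders SRT text by hand. It assumes the segment layout that model.transcribe returns in result["segments"], namely dicts with start, end, and text. The helpers format_timestamp and segments_to_srt are hypothetical names of mine and are not whisper's own writer code.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hand-written sample data standing in for result["segments"].
    sample = [
        {"start": 0.0, "end": 2.5, "text": " Hello."},
        {"start": 2.5, "end": 5.0, "text": " This is a test."},
    ]
    print(segments_to_srt(sample))
```

        In practice the writer returned by get_writer already does this and more; the sketch is only useful when you want to post-process the segments yourself.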
