赞
踩
从本文开始,我们将用两篇文章学习机器人语音交互。本文作为第一篇,将在ros上集成科大讯飞的中文语音库,实现语音控制机器人小车运动。至于语音识别和语音合成的原理,本文并不深究,读者可以自行搜索相关的文章介绍。这里提醒,本文的测试环境是ubuntu20.04 + ros noetic。
本文参考资料如下:
(1)《ROS机器人开发实践》胡春旭 第8章
(2)讯飞语音识别和唤醒开发示例
(3)讯飞语音听写 Linux SDK 文档
(4)ROS高效入门第二章 – 基本概念和常用命令学习,基于小乌龟样例
(5)ROS高效进阶第三章 – 以差速轮式机器人为例,使用Gazebo构建机器人仿真平台
(1)首先登陆讯飞开放平台:讯飞开放平台 ,注册后,点击控制台进入。
(2)然后创建应用并下载linux sdk,更具体的操作可以参考:讯飞语音识别和唤醒开发示例
(3)最后得到自己专属的sdk,如我本人的:Linux_iat1227_tts_online1227_bb839ccf.zip,其中 bb839ccf 是专属bb839ccf。下面我们将把这套 sdk 集成到 robot_voice 样例中,这里不对这个 zip 包内容进行展开讲解。
(1)robot_voice 样例,我们将实现两个应用,第一个就是本文的语音控制机器人小车移动,拓扑图如下:
voice_detector:负责语音识别,将语音转换为文字,并作为 client,通过 human_chatter 服务,发给 robot_controller 。
robot_controller:作为 human_chatter 服务 server,接收 voice_detector 发来的文字化的指令,并生成对应的语音播报文字和控车命令。前者通过 str2voice 服务,发给 voice_creator,后者通过 /cmd_vel topic,发给 mbot_gazebo。
voice_creator:作为 str2voice 服务server,接收 robot_controller 发来的语音播报文字,合成语音文件并播放。
mbot_gazebo:作为机器人小车,接收 /cmd_vel topic,并调整运动状态。
补充:关于ros的服务机制,可以参考本人ROS高效入门博客第二章的2.6节: ROS高效入门第二章 – 基本概念和常用命令学习,基于小乌龟样例
(2)安装环境:
unzip Linux_iat1227_tts_online1227_bb839ccf.zip
sudo cp libs/x64/libmsc.so /usr/lib/
sudo apt update
sudo apt install sox
sudo apt install libsox-fmt-all
SoX, 全称 Sound eXchange,被官方称为 “the Swiss Army knife of audio manipulation”。
它是一个强大的用于转换和处理声音文件的库。因其操作简单且功能强大,广泛应用在音频数据的处理和分析中。
除此之外,本人在编译讯飞样例时,遇到了:
linuxrec.c:12:10: fatal error: alsa/asoundlib.h: No such file or directory
解决方式是:
sudo apt-get install libasound2-dev
ALSA,全称Advanced Linux Sound Architecture (ALSA) 库,用于处理音频设备。
(3)创建 robot_voice 及相关文件
cd ~/catkin_ws/src
catkin_create_pkg robot_voice roscpp rospy std_msgs geometry_msgs message_generation message_runtime
cd robot_voice
mkdir srv launch
touch srv/StringToVoice.srv launch/voice_control_robot.launch
touch src/voice_detector.cpp src/robot_controller.cpp src/voice_creator.cpp
mkdir ifly_voice include/ifly_voice
请将 Linux_iat1227_tts_online1227_bb839ccf.zip 中的相关文件,分别移入相关目录,供编译使用,如下图:
(4)voice_detector.cpp
#include <ros/ros.h> #include <stdio.h> #include <signal.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include "ifly_voice/qisr.h" #include "ifly_voice/qtts.h" #include "ifly_voice/msp_cmn.h" #include "ifly_voice/formats.h" #include "ifly_voice/msp_errors.h" #include "ifly_voice/speech_recognizer.h" #include <std_msgs/String.h> #include <robot_voice/StringToVoice.h> class Helper { public: static void SignalHandler(int signal) { ROS_INFO("\nCaught signal %d. Exiting gracefully...\n", signal); exit(0); } }; class VoiceDetector { public: VoiceDetector() { ROS_INFO("VoiceDetector Constructor"); } ~VoiceDetector() { ROS_INFO("VoiceDetector Destructor"); } int Init() { int ret = MSP_SUCCESS; ret = MSPLogin(NULL, NULL, login_params_.c_str()); if (MSP_SUCCESS != ret) { ROS_ERROR("MSPLogin failed , Error code %d", ret); MSPLogout(); // Logout... return -1; } ROS_INFO("VoiceDetector MSP Login for update, waiting for seconds..."); return 0; } static void JoinTxt(const char *result, char is_last) { if (result) { std::string slice_txt = result; VoiceDetector::voice_txt_ += slice_txt; } if (is_last) { printf("voice txt : %s\n", VoiceDetector::voice_txt_.c_str()); } } static void InitSpeech() { VoiceDetector::voice_txt_ = ""; printf("clear cache, start Listening...\n"); } static void EndSpeech(int reason) { if (reason == END_REASON_VAD_DETECT) { printf("\nSpeaking done \n"); } else { printf("\nRecognizer error %d\n", reason); } } int SpeechOnce() { int ret; int i = 0; struct speech_rec iat; struct speech_rec_notifier recnotifier = { JoinTxt, InitSpeech, EndSpeech }; ret = sr_init(&iat, session_begin_params_.c_str(), SR_MIC, &recnotifier); if (ret) { ROS_ERROR("speech recognizer init failed"); return -1; } ret = sr_start_listening(&iat); if (ret) { printf("start listen failed %d\n", ret); } /* demo 15 seconds recording */ sleep(10); ret = sr_stop_listening(&iat); if (ret) { printf("stop listening failed %d\n", ret); } sr_uninit(&iat); return 0; } static std::string get_voice_txt_() { return voice_txt_; } private: const std::string login_params_ = "appid = bb839ccf, work_dir = ."; const std::string session_begin_params_ = "sub = iat, domain = iat, language = zh_cn, " "accent = mandarin, sample_rate = 16000, " "result_type = plain, result_encoding = utf8"; const uint32_t BufferSize = 4096; uint64_t g_buffersize = BufferSize; static std::string voice_txt_; }; std::string VoiceDetector::voice_txt_ = ""; int main(int argc, char* argv[]) { int ret = 0; ros::init(argc, argv, "voice_detector"); ros::NodeHandle nh; // 创建 human_chatter 服务client ros::ServiceClient client_ = nh.serviceClient<robot_voice::StringToVoice>("human_chatter"); if (signal(SIGINT, Helper::SignalHandler) == SIG_ERR) { return -1; } VoiceDetector vd; ret = vd.Init(); if (ret < 0) { return -1; } while (1) { // 一次聆听 ret = vd.SpeechOnce(); if (ret < 0) { return -1; } // 获取当次聆听得到的内容 std::string voice_txt = VoiceDetector::get_voice_txt_(); if (voice_txt == "") { printf("voice_txt is empty, do not send chatter\n"); continue; } else if (voice_txt.find("结束") != std::string::npos) { break; } // 通过 human_chatter 服务,发给robot_controller,处理成功后,进入下一轮 robot_voice::StringToVoice::Request req; robot_voice::StringToVoice::Response resp; req.data = voice_txt; bool ok = client_.call(req, resp); if (ok) { printf("send human_chatter service success: %s\n", req.data.c_str()); } else { printf("failed to send human_chatter service\n"); } } ros::spin(); return 0; }
(5)robot_controller.cpp
#include <stdio.h> #include <signal.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <ros/ros.h> #include <std_msgs/String.h> #include <geometry_msgs/Twist.h> #include <robot_voice/StringToVoice.h> class RobotController { public: RobotController() { ROS_INFO("RobotController Constructor"); } ~RobotController() { ROS_INFO("RobotController Destructor"); } int Init(ros::NodeHandle& nh) { cmd_pub_ = nh.advertise<geometry_msgs::Twist>("/cmd_vel", 1000); client_ = nh.serviceClient<robot_voice::StringToVoice>("str2voice"); return 0; } void ToDownstream(const std::string& answer_txt, float linear_x, float angular_z) { // 通过 str2voice 服务和 /cmd_vel topic向下游 voice_creator 和 mbot_gazebo 发送 robot_voice::StringToVoice::Request req; robot_voice::StringToVoice::Response resp; req.data = answer_txt; bool ok = client_.call(req, resp); if (ok) { printf("send str2voice service success: %s, and pub cmd_vel\n", req.data.c_str()); geometry_msgs::Twist msg; msg.linear.x = linear_x; msg.angular.z = angular_z; cmd_pub_.publish(msg); } else { ROS_ERROR("failed to send str2voice service"); } } bool ChatterCallbback(robot_voice::StringToVoice::Request &req, robot_voice::StringToVoice::Response &resp) { printf("i received: %s\n", req.data.c_str()); std::string voice_txt = req.data; // 根据指令关键字,发送对应的语音播包文字和 cmd_vel 命令 if (voice_txt.find("前") != std::string::npos) { ToDownstream("小车请向前跑", 0.3, 0); } else if (voice_txt.find("后") != std::string::npos) { ToDownstream("小车请向后倒", -0.3, 0); } else if (voice_txt.find("左") != std::string::npos) { ToDownstream("小车请向左转", 0, 0.3); } else if (voice_txt.find("右") != std::string::npos) { ToDownstream("小车请向右转", 0, -0.3); } else if (voice_txt.find("转") != std::string::npos) { ToDownstream("小车请打转", 0.3, -0.3); } resp.success = true; return resp.success; } void Start(ros::NodeHandle& nh) { // 申明 human_chatter 服务,ChatterCallbback是回调函数 chatter_server_ = nh.advertiseService("human_chatter", &RobotController::ChatterCallbback, this); } private: ros::ServiceServer chatter_server_; ros::Publisher cmd_pub_; ros::ServiceClient client_; }; int main(int argc, char* argv[]) { int ret = 0; ros::init(argc, argv, "voice_controller"); ros::NodeHandle nh; RobotController rc; rc.Init(nh); printf("this is a voice controller app for robot, you can say: 向前, 向后, 向左, 向右, 转圈, 结束\n"); rc.Start(nh); ros::spin(); return 0; }
(6)voice_creator.cpp
#include <ros/ros.h> #include <stdio.h> #include <signal.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include "ifly_voice/qisr.h" #include "ifly_voice/qtts.h" #include "ifly_voice/msp_cmn.h" #include "ifly_voice/formats.h" #include "ifly_voice/msp_errors.h" #include "ifly_voice/speech_recognizer.h" #include <robot_voice/StringToVoice.h> class Helper { public: static void SignalHandler(int signal) { ROS_INFO("\nCaught signal %d. Exiting gracefully...\n", signal); exit(0); } }; class VoiceCreator { public: VoiceCreator() { ROS_INFO("VoiceCreator Constructor"); } ~VoiceCreator() { ROS_INFO("VoiceCreator Destructor "); } int Init() { int ret = MSP_SUCCESS; ret = MSPLogin(NULL, NULL, login_params_.c_str()); if (MSP_SUCCESS != ret) { ROS_ERROR("MSPLogin failed , Error code %d", ret); MSPLogout(); // Logout... return -1; } ROS_INFO("VoiceCreator MSP Login for update, waiting for seconds..."); return 0; } int ProcessTxt(std::string& txt) { int ret = -1; FILE* fp = NULL; const char* sessionID = NULL; unsigned int audio_len = 0; int synth_status = MSP_TTS_FLAG_STILL_HAVE_DATA; WavePcmHdr_t wav_hdr = { { 'R', 'I', 'F', 'F' }, 0, {'W', 'A', 'V', 'E'}, {'f', 'm', 't', ' '}, 16, 1, 1, 16000, 32000, 2, 16, {'d', 'a', 't', 'a'}, 0 }; const char* src_text = txt.c_str(); const char* des_path = filename_.c_str(); const char* params = session_begin_params_.c_str(); if (NULL == src_text || NULL == des_path) { ROS_ERROR("params is error!"); return ret; } fp = fopen(des_path, "wb"); if (NULL == fp) { ROS_ERROR("open %s error", des_path); return ret; } /* 开始合成 */ sessionID = QTTSSessionBegin(params, &ret); if (MSP_SUCCESS != ret) { ROS_ERROR("QTTSSessionBegin failed, error code: %d", ret); fclose(fp); return ret; } ret = QTTSTextPut(sessionID, src_text, (unsigned int)strlen(src_text), NULL); if (MSP_SUCCESS != ret) { ROS_ERROR("QTTSTextPut failed, error code: %d",ret); QTTSSessionEnd(sessionID, "TextPutError"); fclose(fp); return ret; } printf("正在合成 ...\n"); fwrite(&wav_hdr, sizeof(wav_hdr) ,1, fp); //添加wav音频头,使用采样率为16000 while (1) { /* 获取合成音频 */ const void* data = QTTSAudioGet(sessionID, &audio_len, &synth_status, &ret); if (MSP_SUCCESS != ret) { break; } if (NULL != data) { fwrite(data, audio_len, 1, fp); wav_hdr.data_size += audio_len; //计算data_size大小 } if (MSP_TTS_FLAG_DATA_END == synth_status) { break; } printf(">"); usleep(150*1000); //防止频繁占用CPU } printf("\n"); if (MSP_SUCCESS != ret) { ROS_ERROR("QTTSAudioGet failed, error code: %d",ret); QTTSSessionEnd(sessionID, "AudioGetError"); fclose(fp); return ret; } /* 修正wav文件头数据的大小 */ wav_hdr.size_8 += wav_hdr.data_size + (sizeof(wav_hdr) - 8); /* 将修正过的数据写回文件头部,音频文件为wav格式 */ fseek(fp, 4, 0); fwrite(&wav_hdr.size_8,sizeof(wav_hdr.size_8), 1, fp); //写入size_8的值 fseek(fp, 40, 0); //将文件指针偏移到存储data_size值的位置 fwrite(&wav_hdr.data_size,sizeof(wav_hdr.data_size), 1, fp); //写入data_size的值 fclose(fp); fp = NULL; /* 合成完毕 */ ret = QTTSSessionEnd(sessionID, "Normal"); if (MSP_SUCCESS != ret) { ROS_ERROR("QTTSSessionEnd failed, error code: %d", ret); return ret; } // 播放语音文件 fp = popen(play_cmd_.c_str(),"r"); if (fp == NULL) { ROS_ERROR("play /tmp/tts_sample.wav failed"); return -1; } sleep(1); pclose(fp); return 0; } bool Speeking(robot_voice::StringToVoice::Request &req, robot_voice::StringToVoice::Response &resp) { int ret = -1; ret = ProcessTxt(req.data); if (MSP_SUCCESS != ret) { ROS_ERROR("AnswerVoice failed, error code: %d", ret); resp.success = false; return false; } else { resp.success = true; } return resp.success; } void Start(ros::NodeHandle& nh) { // 申明 str2voice 服务 server_ = nh.advertiseService("str2voice", &VoiceCreator::Speeking, this); } private: ros::ServiceServer server_; const std::string login_params_ = "appid = bb839ccf, work_dir = ."; const std::string session_begin_params_ = "voice_name = xiaoyan, text_encoding = utf8, " "sample_rate = 16000, speed = 50, volume = 50, " "pitch = 50, rdn = 2"; //合成的语音文件名称 const std::string filename_ = "/tmp/tts_sample.wav"; //语音播放命令 const std::string play_cmd_ = "play /tmp/tts_sample.wav"; /* wav音频头部格式 */ typedef struct WavePcmHdr { char riff[4]; // = "RIFF" int size_8; // = FileSize - 8 char wave[4]; // = "WAVE" char fmt[4]; // = "fmt " int fmt_size; // = 下一个结构体的大小 : 16 short int format_tag; // = PCM : 1 short int channels; // = 通道数 : 1 int samples_per_sec; // = 采样率 : 8000 | 6000 | 11025 | 16000 int avg_bytes_per_sec; // = 每秒字节数 : samples_per_sec * bits_per_sample / 8 short int block_align; // = 每采样点字节数 : wBitsPerSample / 8 short int bits_per_sample; // = 量化比特数: 8 | 16 char data[4]; // = "data"; int data_size; // = 纯数据长度 : FileSize - 44 } WavePcmHdr_t; }; int main(int argc, char ** argv) { int ret = 0; ros::init(argc, argv, "voice_creator"); ros::NodeHandle nh; if (signal(SIGINT, Helper::SignalHandler) == SIG_ERR) { return -1; } VoiceCreator vc; ret = vc.Init(); if (ret < 0) { return -1; } vc.Start(nh); ros::spin(); return 0; }
(6)StringToVoice.srv , voice_control_robot.launch 和 CMakeLists.txt
StringToVoice.srv
string data
---
bool success
voice_control_robot.launch
<launch> <node pkg="robot_voice" type="voice_creator" name="voice_creator" output="screen" /> <node pkg="robot_voice" type="robot_controller" name="robot_controller" output="screen" /> <node pkg="robot_voice" type="voice_detector" name="voice_detector" launch-prefix="bash -c 'sleep 5; $0 $@'" output="screen" /> </launch>
CMakeLists.txt
cmake_minimum_required(VERSION 3.0.2) project(robot_voice) add_compile_options(-std=c++11) find_package(catkin REQUIRED COMPONENTS roscpp rospy std_msgs geometry_msgs message_generation ) add_service_files( FILES StringToVoice.srv ) generate_messages( DEPENDENCIES std_msgs ) catkin_package( CATKIN_DEPENDS message_runtime roscpp rospy std_msgs ) include_directories( include ${catkin_INCLUDE_DIRS} ) add_executable(voice_detector src/voice_detector.cpp ifly_voice/speech_recognizer.c ifly_voice/linuxrec.c) add_executable(robot_controller src/robot_controller.cpp) add_executable(voice_creator src/voice_creator.cpp) add_dependencies(voice_detector ${PROJECT_NAME}_generate_messages_cpp) target_link_libraries(voice_detector ${catkin_LIBRARIES} libmsc.so -ldl -lpthread -lm -lrt -lasound ) add_dependencies(robot_controller ${PROJECT_NAME}_generate_messages_cpp) target_link_libraries(robot_controller ${catkin_LIBRARIES} ) add_dependencies(voice_creator ${PROJECT_NAME}_generate_messages_cpp) target_link_libraries(voice_creator ${catkin_LIBRARIES} libmsc.so -ldl -pthread )
(7)编译并运行(运行时请注意电脑网络通畅!)
cd ~/catkin_ws/
catkin_make -DCATKIN_WHITELIST_PACKAGES="robot_voice;mbot_gazebo"
source devel/setup.bash
roslaunch mbot_gazebo view_mbot_gazebo.launch
// 再开一个窗口
source devel/setup.bash
roslaunch robot_voice voice_control_robot.launch
语音控制机器人
(8)在开发调试过程中,出现了如下编译报错:
internal compiler error: Illegal instruction
不得已,更新了gcc版本,问题解决
sudo apt-get install gcc-10
sudo apt-get install g++-10
cd /usr/bin
sudo rm gcc g++
sudo ln -s gcc-10 gcc
sudo ln -s g++-10 g++
本文的样例托管在本人的github上:robot_voice
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。