
Learning tritonserver, Part 9: the tritonserver gRPC async mode

Learning tritonserver, Part 1: the triton usage workflow

Learning tritonserver, Part 2: building tritonserver

Learning tritonserver, Part 3: the tritonserver runtime flow

Learning tritonserver, Part 4: command-line parsing

Learning tritonserver, Part 5: the backend mechanism

Learning tritonserver, Part 6: custom C++ and Python backends in practice

Learning tritonserver, Part 7: the cache manager

Learning tritonserver, Part 8: redis_caches in practice

1. Protocols supported by tritonserver

Once tritonserver has a model served, clients can reach the deployed model over HTTP or gRPC. For the gRPC path, triton uses gRPC's asynchronous mode, chiefly for two reasons:

High concurrency: gRPC's async mode lets the server handle many client requests at once, without blocking other requests while it waits on any single one. This lets tritonserver make full use of system resources and serve a large volume of inference requests efficiently.

Resource efficiency: in async mode the server does not create a separate thread or process per request. Instead, requests are placed on a queue and handled through an event-loop mechanism. This reduces system overhead, so tritonserver can serve more requests with the same resources.

2. The gRPC async mode

gRPC exposes asynchronous operation through its CompletionQueue API. The basic workflow is:

  • Build a CompletionQueue and bind it to the RPC calls.
  • Tag each read/write operation with a unique void* pointer (the tag).
  • Register handlers, typically passing an object pointer as the unique tag.
  • Call CompletionQueue::Next, which blocks until a request arrives.
  • When a request arrives, dispatch through the tag pointer to handle it.

The main startup flow of the async mode, from the gRPC example code:

```cpp
void Run(uint16_t port) {
  std::string server_address = absl::StrFormat("0.0.0.0:%d", port);

  ServerBuilder builder;
  // Listen on the given address without any authentication mechanism.
  builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
  // Register "service_" as the instance through which we'll communicate with
  // clients. In this case it corresponds to an *asynchronous* service.
  builder.RegisterService(&service_);
  // Get hold of the completion queue used for the asynchronous communication
  // with the gRPC runtime.
  cq_ = builder.AddCompletionQueue();
  // Finally assemble the server.
  server_ = builder.BuildAndStart();
  std::cout << "Server listening on " << server_address << std::endl;

  // Proceed to the server's main loop.
  HandleRpcs();
}
```

Registering a handler:

```cpp
service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_, this);
```

Here this is the unique tag: the pointer to the object itself.

Note also that the completion queue obtained from builder.AddCompletionQueue need not be unique; a process may create several. Triton uses three in total, dedicated to common (non-inference) requests, inference requests, and streaming inference requests respectively.

Complete server-side example:

```cpp
/*
 *
 * Copyright 2015 gRPC authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 */

#include <iostream>
#include <memory>
#include <string>
#include <thread>

#include "absl/flags/flag.h"
#include "absl/flags/parse.h"
#include "absl/strings/str_format.h"

#include <grpc/support/log.h>
#include <grpcpp/grpcpp.h>

#ifdef BAZEL_BUILD
#include "examples/protos/helloworld.grpc.pb.h"
#else
#include "helloworld.grpc.pb.h"
#endif

ABSL_FLAG(uint16_t, port, 50051, "Server port for the service");

using grpc::Server;
using grpc::ServerAsyncResponseWriter;
using grpc::ServerBuilder;
using grpc::ServerCompletionQueue;
using grpc::ServerContext;
using grpc::Status;
using helloworld::Greeter;
using helloworld::HelloReply;
using helloworld::HelloRequest;

class ServerImpl final {
 public:
  ~ServerImpl() {
    server_->Shutdown();
    // Always shutdown the completion queue after the server.
    cq_->Shutdown();
  }

  // There is no shutdown handling in this code.
  void Run(uint16_t port) {
    std::string server_address = absl::StrFormat("0.0.0.0:%d", port);

    ServerBuilder builder;
    // Listen on the given address without any authentication mechanism.
    builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
    // Register "service_" as the instance through which we'll communicate with
    // clients. In this case it corresponds to an *asynchronous* service.
    builder.RegisterService(&service_);
    // Get hold of the completion queue used for the asynchronous communication
    // with the gRPC runtime.
    cq_ = builder.AddCompletionQueue();
    // Finally assemble the server.
    server_ = builder.BuildAndStart();
    std::cout << "Server listening on " << server_address << std::endl;

    // Proceed to the server's main loop.
    HandleRpcs();
  }

 private:
  // Class encompassing the state and logic needed to serve a request.
  class CallData {
   public:
    // Take in the "service" instance (in this case representing an asynchronous
    // server) and the completion queue "cq" used for asynchronous communication
    // with the gRPC runtime.
    CallData(Greeter::AsyncService* service, ServerCompletionQueue* cq)
        : service_(service), cq_(cq), responder_(&ctx_), status_(CREATE) {
      // Invoke the serving logic right away.
      Proceed();
    }

    void Proceed() {
      if (status_ == CREATE) {
        // Make this instance progress to the PROCESS state.
        status_ = PROCESS;
        // As part of the initial CREATE state, we *request* that the system
        // start processing SayHello requests. In this request, "this" acts as
        // the tag uniquely identifying the request (so that different CallData
        // instances can serve different requests concurrently), in this case
        // the memory address of this CallData instance.
        service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_,
                                  this);
      } else if (status_ == PROCESS) {
        // Spawn a new CallData instance to serve new clients while we process
        // the one for this CallData. The instance will deallocate itself as
        // part of its FINISH state.
        new CallData(service_, cq_);

        // The actual processing.
        std::string prefix("Hello ");
        reply_.set_message(prefix + request_.name());

        // And we are done! Let the gRPC runtime know we've finished, using the
        // memory address of this instance as the uniquely identifying tag for
        // the event.
        status_ = FINISH;
        responder_.Finish(reply_, Status::OK, this);
      } else {
        GPR_ASSERT(status_ == FINISH);
        // Once in the FINISH state, deallocate ourselves (CallData).
        delete this;
      }
    }

   private:
    // The means of communication with the gRPC runtime for an asynchronous
    // server.
    Greeter::AsyncService* service_;
    // The producer-consumer queue for asynchronous server notifications.
    ServerCompletionQueue* cq_;
    // Context for the rpc, allowing to tweak aspects of it such as the use
    // of compression, authentication, as well as to send metadata back to the
    // client.
    ServerContext ctx_;

    // What we get from the client.
    HelloRequest request_;
    // What we send back to the client.
    HelloReply reply_;

    // The means to get back to the client.
    ServerAsyncResponseWriter<HelloReply> responder_;

    // Let's implement a tiny state machine with the following states.
    enum CallStatus { CREATE, PROCESS, FINISH };
    CallStatus status_;  // The current serving state.
  };

  // This can be run in multiple threads if needed.
  void HandleRpcs() {
    // Spawn a new CallData instance to serve new clients.
    new CallData(&service_, cq_.get());
    void* tag;  // uniquely identifies a request.
    bool ok;
    while (true) {
      // Block waiting to read the next event from the completion queue. The
      // event is uniquely identified by its tag, which in this case is the
      // memory address of a CallData instance.
      // The return value of Next should always be checked. This return value
      // tells us whether there is any kind of event or cq_ is shutting down.
      GPR_ASSERT(cq_->Next(&tag, &ok));
      GPR_ASSERT(ok);
      static_cast<CallData*>(tag)->Proceed();
    }
  }

  std::unique_ptr<ServerCompletionQueue> cq_;
  Greeter::AsyncService service_;
  std::unique_ptr<Server> server_;
};

int main(int argc, char** argv) {
  absl::ParseCommandLine(argc, argv);
  ServerImpl server;
  server.Run(absl::GetFlag(FLAGS_port));
  return 0;
}
```

The example code is on GitHub: https://github.com/grpc/grpc/tree/master/examples/cpp/helloworld

The example above only demonstrates the basics of gRPC's async mode; serving multiple classes of requests calls for a more refined design, and triton's design is well worth studying.

3. Triton's gRPC async design

Triton creates three completion queues, handling common requests, inference requests, and streaming inference requests respectively:

```cpp
std::unique_ptr<::grpc::ServerCompletionQueue> common_cq_;              // common requests
std::unique_ptr<::grpc::ServerCompletionQueue> model_infer_cq_;        // inference requests
std::unique_ptr<::grpc::ServerCompletionQueue> model_stream_infer_cq_; // streaming inference requests
```

The gRPC service is started from the main function of the server repository:

```cpp
TRITONSERVER_Error*
StartGrpcService(
    std::unique_ptr<triton::server::grpc::Server>* service,
    const std::shared_ptr<TRITONSERVER_Server>& server,
    triton::server::TraceManager* trace_manager,
    const std::shared_ptr<triton::server::SharedMemoryManager>& shm_manager)
{
  TRITONSERVER_Error* err = triton::server::grpc::Server::Create(
      server, trace_manager, shm_manager, g_triton_params.grpc_options_,
      service);
  if (err == nullptr) {
    err = (*service)->Start();
  }
  if (err != nullptr) {
    service->reset();
  }
  return err;
}
```

The core of this is (*service)->Start(), which registers and dispatches gRPC requests (see grpc_server.cc). Inside it, common_handler_->Start() registers the common (non-inference) gRPC endpoints, model_infer_handler->Start() registers inference, and model_stream_infer_handler->Start() registers streaming inference. The two inference handlers are started inside a loop: each iteration registers a handler on its own thread, so inference can be served by multiple threads.

Taking common_handler as the example, the implementation looks like this:

```cpp
void
CommonHandler::Start()
{
  // Use a barrier to make sure we don't return until thread has
  // started.
  auto barrier = std::make_shared<Barrier>(2);

  // Start a thread that registers the APIs and then serves them.
  thread_.reset(new std::thread([this, barrier] {
    // Register all endpoints.
    SetUpAllRequests();
    barrier->Wait();

    void* tag;
    bool ok;
    // Loop waiting for incoming requests.
    while (cq_->Next(&tag, &ok)) {
      ICallData* call_data = static_cast<ICallData*>(tag);
      if (!call_data->Process(ok)) {
        LOG_VERBOSE(1) << "Done for " << call_data->Name() << ", "
                       << call_data->Id();
        delete call_data;
      }
    }
  }));

  barrier->Wait();
  LOG_VERBOSE(1) << "Thread started for " << Name();
}
```

The newly started thread registers all of the APIs and then loops, waiting for RPCs to arrive. When a request comes in, the tag is cast and its Process() member function is called to handle it. ICallData here is a base class; it is important, so it is mentioned now but explained later.

Moving on to how requests are registered, taking the health-check registration as an example:

```cpp
void
CommonHandler::RegisterHealthCheck()
{
  auto OnRegisterHealthCheck =
      [this](
          ::grpc::ServerContext* ctx,
          ::grpc::health::v1::HealthCheckRequest* request,
          ::grpc::ServerAsyncResponseWriter<
              ::grpc::health::v1::HealthCheckResponse>* responder,
          void* tag) {
        this->health_service_->RequestCheck(
            ctx, request, responder, this->cq_, this->cq_, tag);
      };

  auto OnExecuteHealthCheck = [this](
                                  ::grpc::health::v1::HealthCheckRequest&
                                      request,
                                  ::grpc::health::v1::HealthCheckResponse*
                                      response,
                                  ::grpc::Status* status) {
    bool live = false;
    TRITONSERVER_Error* err =
        TRITONSERVER_ServerIsReady(tritonserver_.get(), &live);

    auto serving_status =
        ::grpc::health::v1::HealthCheckResponse_ServingStatus_UNKNOWN;
    if (err == nullptr) {
      serving_status =
          live ? ::grpc::health::v1::HealthCheckResponse_ServingStatus_SERVING
               : ::grpc::health::v1::
                     HealthCheckResponse_ServingStatus_NOT_SERVING;
    }
    response->set_status(serving_status);

    GrpcStatusUtil::Create(status, err);
    TRITONSERVER_ErrorDelete(err);
  };

  const std::pair<std::string, std::string>& restricted_kv =
      restricted_keys_.Get(RestrictedCategory::HEALTH);
  new CommonCallData<
      ::grpc::ServerAsyncResponseWriter<
          ::grpc::health::v1::HealthCheckResponse>,
      ::grpc::health::v1::HealthCheckRequest,
      ::grpc::health::v1::HealthCheckResponse>(
      "Check", 0, OnRegisterHealthCheck, OnExecuteHealthCheck,
      false /* async */, cq_, restricted_kv, response_delay_);
}
```

Three things matter in this function:

  • OnRegisterHealthCheck, a std::function that performs the async gRPC registration for the endpoint.

  • OnExecuteHealthCheck, a std::function that is the endpoint's actual handler.

  • The creation of a CommonCallData object, which is what actually drives registration and request handling.

The CommonCallData constructor calls OnRegisterHealthCheck to register the API. At registration time the tag passed in is the pointer to the CommonCallData object itself, uniquely identifying one pending API request. The class inherits from the ICallData base class mentioned above: when the completion queue delivers a request, the tag is cast to a pointer to the ICallData base class (even though its real type is CommonCallData), and the Process() member function is called through that pointer to handle the request:

```cpp
template <typename ResponderType, typename RequestType, typename ResponseType>
bool
CommonCallData<ResponderType, RequestType, ResponseType>::Process(bool rpc_ok)
{
  LOG_VERBOSE(1) << "Process for " << name_ << ", rpc_ok=" << rpc_ok << ", "
                 << id_ << " step " << step_;

  // If RPC failed on a new request then the server is shutting down
  // and so we should do nothing (including not registering for a new
  // request). If RPC failed on a non-START step then there is nothing
  // we can do since we only execute one step.
  const bool shutdown = (!rpc_ok && (step_ == Steps::START));
  if (shutdown) {
    if (async_thread_.joinable()) {
      async_thread_.join();
    }
    step_ = Steps::FINISH;
  }

  if (step_ == Steps::START) {
    // Start a new request to replace this one...
    if (!shutdown) {
      new CommonCallData<ResponderType, RequestType, ResponseType>(
          name_, id_ + 1, OnRegister_, OnExecute_, async_, cq_, restricted_kv_,
          response_delay_);
    }

    if (!async_) {
      // For synchronous calls, execute and write response
      // here.
      Execute();
      WriteResponse();
    } else {
      // For asynchronous calls, delegate the execution to another
      // thread.
      step_ = Steps::ISSUED;
      async_thread_ = std::thread(&CommonCallData::Execute, this);
    }
  } else if (step_ == Steps::WRITEREADY) {
    // Will only come here for asynchronous mode.
    WriteResponse();
  } else if (step_ == Steps::COMPLETE) {
    step_ = Steps::FINISH;
  }

  return step_ != Steps::FINISH;
}
```

That covers the full flow of async gRPC request registration in tritonserver. Corrections and discussion from fellow developers are very welcome.

You are also very welcome to follow the official account to keep in touch, so we can learn and improve together.
