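Once tritonserver is serving a model, clients can reach it over HTTP or gRPC. For gRPC, Triton uses the asynchronous mode, mainly for two reasons:
High concurrency: in async mode the server can handle many client requests at once, and waiting on one request's response does not block the others. This lets tritonserver make full use of system resources and handle a large volume of inference requests more efficiently.
Resource utilization: in async mode the server does not create a dedicated thread or process per request. Requests are placed on a queue and handled by an event loop, which lowers overhead and lets tritonserver serve more requests with limited resources.
gRPC performs asynchronous operations through the CompletionQueue API. The basic workflow is: bind a CompletionQueue to an RPC call, start an operation and attach a unique void* tag to it, then call CompletionQueue::Next and wait; when the tag comes back, the corresponding operation has completed.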
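The tag-and-wait workflow can be illustrated without the gRPC library. The sketch below is a single-threaded stand-in, not gRPC code: `FakeCompletionQueue`, `Post`, `Operation`, and `RunWorkflowDemo` are hypothetical names invented for illustration, and only the shape of `Next(void**, bool*)` mirrors the real `CompletionQueue::Next`.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for grpc::CompletionQueue; it only models the
// "post a tagged event, later read it back with Next" contract.
class FakeCompletionQueue {
 public:
  // Simulates the runtime reporting that the operation identified by
  // `tag` has finished; `ok` says whether it succeeded.
  void Post(void* tag, bool ok) { events_.emplace_back(tag, ok); }

  // Same shape as CompletionQueue::Next. This stand-in returns false as
  // soon as it is empty; the real queue blocks until an event or shutdown.
  bool Next(void** tag, bool* ok) {
    if (events_.empty()) return false;
    *tag = events_.front().first;
    *ok = events_.front().second;
    events_.pop_front();
    return true;
  }

 private:
  std::deque<std::pair<void*, bool>> events_;
};

// One pending operation; its own address serves as the unique tag.
struct Operation {
  std::string name;
};

// Starts two tagged operations, then drains the queue and records the
// order in which their tags come back.
std::vector<std::string> RunWorkflowDemo() {
  FakeCompletionQueue cq;
  Operation read{"read"};
  Operation write{"write"};
  cq.Post(&read, true);
  cq.Post(&write, true);

  std::vector<std::string> completed;
  void* tag = nullptr;
  bool ok = false;
  while (cq.Next(&tag, &ok)) {
    assert(tag != nullptr);
    // The tag is the only link back to the operation that finished.
    if (ok) completed.push_back(static_cast<Operation*>(tag)->name);
  }
  return completed;
}
```

Calling `RunWorkflowDemo()` returns the operation names in completion order. The point of the design is that the loop knows nothing about the operations except their tags, which is why one loop can serve many request types.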
Starting a server in gRPC async mode follows this main flow, shown here with the gRPC sample code:
- void Run(uint16_t port) {
- std::string server_address = absl::StrFormat("0.0.0.0:%d", port);
-
- ServerBuilder builder;
- // Listen on the given address without any authentication mechanism.
- builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
- // Register "service_" as the instance through which we'll communicate with
- // clients. In this case it corresponds to an *asynchronous* service.
- builder.RegisterService(&service_);
- // Get hold of the completion queue used for the asynchronous communication
- // with the gRPC runtime.
- cq_ = builder.AddCompletionQueue();
- // Finally assemble the server.
- server_ = builder.BuildAndStart();
- std::cout << "Server listening on " << server_address << std::endl;
-
- // Proceed to the server's main loop.
- HandleRpcs();
- }
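Registering the handler:
service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_, this);
Here this is the unique tag: a pointer to the CallData object itself.
Note that builder.AddCompletionQueue returns a completion queue, and one server can have several of them. Triton uses three: one each for ordinary requests, inference requests, and streaming inference requests.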
Complete server-side sample code:
- /*
- *
- * Copyright 2015 gRPC authors.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- *
- */
-
- #include <iostream>
- #include <memory>
- #include <string>
- #include <thread>
-
- #include "absl/flags/flag.h"
- #include "absl/flags/parse.h"
- #include "absl/strings/str_format.h"
-
- #include <grpc/support/log.h>
- #include <grpcpp/grpcpp.h>
-
- #ifdef BAZEL_BUILD
- #include "examples/protos/helloworld.grpc.pb.h"
- #else
- #include "helloworld.grpc.pb.h"
- #endif
-
- ABSL_FLAG(uint16_t, port, 50051, "Server port for the service");
-
- using grpc::Server;
- using grpc::ServerAsyncResponseWriter;
- using grpc::ServerBuilder;
- using grpc::ServerCompletionQueue;
- using grpc::ServerContext;
- using grpc::Status;
- using helloworld::Greeter;
- using helloworld::HelloReply;
- using helloworld::HelloRequest;
-
- class ServerImpl final {
- public:
- ~ServerImpl() {
- server_->Shutdown();
- // Always shutdown the completion queue after the server.
- cq_->Shutdown();
- }
-
- // There is no shutdown handling in this code.
- void Run(uint16_t port) {
- std::string server_address = absl::StrFormat("0.0.0.0:%d", port);
-
- ServerBuilder builder;
- // Listen on the given address without any authentication mechanism.
- builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
- // Register "service_" as the instance through which we'll communicate with
- // clients. In this case it corresponds to an *asynchronous* service.
- builder.RegisterService(&service_);
- // Get hold of the completion queue used for the asynchronous communication
- // with the gRPC runtime.
- cq_ = builder.AddCompletionQueue();
- // Finally assemble the server.
- server_ = builder.BuildAndStart();
- std::cout << "Server listening on " << server_address << std::endl;
-
- // Proceed to the server's main loop.
- HandleRpcs();
- }
-
- private:
- // Class encompassing the state and logic needed to serve a request.
- class CallData {
- public:
- // Take in the "service" instance (in this case representing an asynchronous
- // server) and the completion queue "cq" used for asynchronous communication
- // with the gRPC runtime.
- CallData(Greeter::AsyncService* service, ServerCompletionQueue* cq)
- : service_(service), cq_(cq), responder_(&ctx_), status_(CREATE) {
- // Invoke the serving logic right away.
- Proceed();
- }
-
- void Proceed() {
- if (status_ == CREATE) {
- // Make this instance progress to the PROCESS state.
- status_ = PROCESS;
-
- // As part of the initial CREATE state, we *request* that the system
- // start processing SayHello requests. In this request, "this" acts as
- // the tag uniquely identifying the request (so that different CallData
- // instances can serve different requests concurrently), in this case
- // the memory address of this CallData instance.
- service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_,
- this);
- } else if (status_ == PROCESS) {
- // Spawn a new CallData instance to serve new clients while we process
- // the one for this CallData. The instance will deallocate itself as
- // part of its FINISH state.
- new CallData(service_, cq_);
-
- // The actual processing.
- std::string prefix("Hello ");
- reply_.set_message(prefix + request_.name());
-
- // And we are done! Let the gRPC runtime know we've finished, using the
- // memory address of this instance as the uniquely identifying tag for
- // the event.
- status_ = FINISH;
- responder_.Finish(reply_, Status::OK, this);
- } else {
- GPR_ASSERT(status_ == FINISH);
- // Once in the FINISH state, deallocate ourselves (CallData).
- delete this;
- }
- }
-
- private:
- // The means of communication with the gRPC runtime for an asynchronous
- // server.
- Greeter::AsyncService* service_;
- // The producer-consumer queue used for asynchronous server notifications.
- ServerCompletionQueue* cq_;
- // Context for the rpc, allowing to tweak aspects of it such as the use
- // of compression, authentication, as well as to send metadata back to the
- // client.
- ServerContext ctx_;
-
- // What we get from the client.
- HelloRequest request_;
- // What we send back to the client.
- HelloReply reply_;
-
- // The means to get back to the client.
- ServerAsyncResponseWriter<HelloReply> responder_;
-
- // Let's implement a tiny state machine with the following states.
- enum CallStatus { CREATE, PROCESS, FINISH };
- CallStatus status_; // The current serving state.
- };
-
- // This can be run in multiple threads if needed.
- void HandleRpcs() {
- // Spawn a new CallData instance to serve new clients.
- new CallData(&service_, cq_.get());
- void* tag; // uniquely identifies a request.
- bool ok;
- while (true) {
- // Block waiting to read the next event from the completion queue. The
- // event is uniquely identified by its tag, which in this case is the
- // memory address of a CallData instance.
- // The return value of Next should always be checked. This return value
- // tells us whether there is any kind of event or cq_ is shutting down.
- GPR_ASSERT(cq_->Next(&tag, &ok));
- GPR_ASSERT(ok);
- static_cast<CallData*>(tag)->Proceed();
- }
- }
-
- std::unique_ptr<ServerCompletionQueue> cq_;
- Greeter::AsyncService service_;
- std::unique_ptr<Server> server_;
- };
-
- int main(int argc, char** argv) {
- absl::ParseCommandLine(argc, argv);
- ServerImpl server;
- server.Run(absl::GetFlag(FLAGS_port));
-
- return 0;
- }
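Sample code on GitHub: https://github.com/grpc/grpc/tree/master/examples/cpp/helloworld
The example above only shows the basics of gRPC async mode. Handling several kinds of request needs a more careful design, and Triton's design is a good one to learn from.
Triton defines three completion queues, for ordinary requests, inference requests, and streaming inference requests: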
- std::unique_ptr<::grpc::ServerCompletionQueue> common_cq_; // ordinary requests
- std::unique_ptr<::grpc::ServerCompletionQueue> model_infer_cq_; // inference requests
- std::unique_ptr<::grpc::ServerCompletionQueue> model_stream_infer_cq_; // streaming inference requests
The code that starts the gRPC service lives in the main program of the server repository:
- TRITONSERVER_Error*
- StartGrpcService(
- std::unique_ptr<triton::server::grpc::Server>* service,
- const std::shared_ptr<TRITONSERVER_Server>& server,
- triton::server::TraceManager* trace_manager,
- const std::shared_ptr<triton::server::SharedMemoryManager>& shm_manager)
- {
- TRITONSERVER_Error* err = triton::server::grpc::Server::Create(
- server, trace_manager, shm_manager, g_triton_params.grpc_options_,
- service);
- if (err == nullptr) {
- err = (*service)->Start();
- }
-
- if (err != nullptr) {
- service->reset();
- }
-
- return err;
- }
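Here (*service)->Start() is the core function: it registers the gRPC requests and sets up their processing (see grpc_server.cc).
Inside it, common_handler_->Start() registers the ordinary gRPC requests, model_infer_handler->Start() registers inference requests, and model_stream_infer_handler->Start() registers streaming inference requests. Both inference registrations sit inside a loop, so that registration happens on several threads and inference requests can be served by multiple threads.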
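To make the "loop of handlers" idea concrete, here is a minimal sketch using only the standard library. It is not Triton code: `SharedQueue` and `RunInferWorkers` are hypothetical names, and the sketch assumes that the handler threads started in the loop all drain one shared queue, in the way several threads can call Next on the same completion queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical thread-safe queue standing in for a completion queue that
// several handler threads share.
class SharedQueue {
 public:
  void Post(int request_id) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      items_.push_back(request_id);
    }
    cv_.notify_one();
  }

  // After Shutdown, Next hands out what is left and then returns false.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      shutdown_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until an item is available or the queue has been shut down.
  bool Next(int* request_id) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return shutdown_ || !items_.empty(); });
    if (items_.empty()) return false;
    *request_id = items_.front();
    items_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<int> items_;
  bool shutdown_ = false;
};

// Starts `thread_count` handler threads in a loop, feeds `request_count`
// requests through one queue, and returns how many were handled in total.
int RunInferWorkers(int thread_count, int request_count) {
  assert(thread_count > 0);
  SharedQueue infer_queue;
  std::mutex handled_mu;
  int handled = 0;

  std::vector<std::thread> handlers;
  for (int i = 0; i < thread_count; ++i) {
    handlers.emplace_back([&] {
      int request_id = 0;
      while (infer_queue.Next(&request_id)) {
        std::lock_guard<std::mutex> lock(handled_mu);
        ++handled;
      }
    });
  }

  for (int id = 0; id < request_count; ++id) infer_queue.Post(id);
  infer_queue.Shutdown();
  for (std::thread& t : handlers) t.join();
  return handled;
}
```

For example, `RunInferWorkers(4, 100)` handles every request exactly once, whichever thread happens to pick it up. Which thread gets which request is not deterministic, so the sketch only reports the total.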
Taking common_handler as the example, let's continue with its implementation:
- void
- CommonHandler::Start()
- {
- // Use a barrier to make sure we don't return until thread has
- // started.
- auto barrier = std::make_shared<Barrier>(2);
- // Start a thread that registers the APIs and then processes them
- thread_.reset(new std::thread([this, barrier] {
- // Register all request handlers
- SetUpAllRequests();
- barrier->Wait();
-
- void* tag;
- bool ok;
-
- // Loop, waiting for incoming request events
- while (cq_->Next(&tag, &ok)) {
- ICallData* call_data = static_cast<ICallData*>(tag);
- if (!call_data->Process(ok)) {
- LOG_VERBOSE(1) << "Done for " << call_data->Name() << ", "
- << call_data->Id();
- delete call_data;
- }
- }
- }));
-
- barrier->Wait();
- LOG_VERBOSE(1) << "Thread started for " << Name();
- }
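The new thread registers all the APIs and then loops, waiting for RPC events to arrive. When one arrives, its tag is cast to ICallData* and the member function Process() is called on it. ICallData is a base class. It is important, so it is named here, but its details are not covered yet.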
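The barrier in Start() guarantees that registration has finished before Start() returns. The sketch below shows that handshake in isolation. `SimpleBarrier`, `DemoHandler`, and `StartThenCheckRegistered` are hypothetical names; Triton's own Barrier class is not reproduced here, and this version is written only as an assumption of how such a two-party barrier can work.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical single-use barrier: Wait() blocks until `count` threads
// have called it.
class SimpleBarrier {
 public:
  explicit SimpleBarrier(std::size_t count) : remaining_(count) {
    assert(count > 0);
  }

  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    if (--remaining_ == 0) {
      cv_.notify_all();
    } else {
      cv_.wait(lock, [this] { return remaining_ == 0; });
    }
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::size_t remaining_;
};

// Hypothetical handler whose Start() returns only after the worker thread
// has finished its setup step.
class DemoHandler {
 public:
  void Start() {
    auto barrier = std::make_shared<SimpleBarrier>(2);
    thread_ = std::thread([this, barrier] {
      registered_ = true;  // stands in for SetUpAllRequests()
      barrier->Wait();
      // An event loop over the completion queue would run here.
    });
    barrier->Wait();
  }

  void Stop() {
    if (thread_.joinable()) thread_.join();
  }

  bool registered() const { return registered_; }

 private:
  std::thread thread_;
  bool registered_ = false;
};

// Returns whether registration was already visible when Start() returned.
bool StartThenCheckRegistered() {
  DemoHandler handler;
  handler.Start();
  const bool seen = handler.registered();
  handler.Stop();
  return seen;
}
```

Both threads pass through the barrier's mutex, so the write to `registered_` in the worker is ordered before the read in the caller. Without the barrier, Start() could return while the worker had not yet registered anything.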
Next, registration itself, using the health check as the example:
- void
- CommonHandler::RegisterHealthCheck()
- {
- auto OnRegisterHealthCheck =
- [this](
- ::grpc::ServerContext* ctx,
- ::grpc::health::v1::HealthCheckRequest* request,
- ::grpc::ServerAsyncResponseWriter<
- ::grpc::health::v1::HealthCheckResponse>* responder,
- void* tag) {
- this->health_service_->RequestCheck(
- ctx, request, responder, this->cq_, this->cq_, tag);
- };
-
- auto OnExecuteHealthCheck = [this](
- ::grpc::health::v1::HealthCheckRequest&
- request,
- ::grpc::health::v1::HealthCheckResponse*
- response,
- ::grpc::Status* status) {
- bool live = false;
- TRITONSERVER_Error* err =
- TRITONSERVER_ServerIsReady(tritonserver_.get(), &live);
-
- auto serving_status =
- ::grpc::health::v1::HealthCheckResponse_ServingStatus_UNKNOWN;
- if (err == nullptr) {
- serving_status =
- live ? ::grpc::health::v1::HealthCheckResponse_ServingStatus_SERVING
- : ::grpc::health::v1::
- HealthCheckResponse_ServingStatus_NOT_SERVING;
- }
- response->set_status(serving_status);
-
- GrpcStatusUtil::Create(status, err);
- TRITONSERVER_ErrorDelete(err);
- };
-
- const std::pair<std::string, std::string>& restricted_kv =
- restricted_keys_.Get(RestrictedCategory::HEALTH);
- new CommonCallData<
- ::grpc::ServerAsyncResponseWriter<
- ::grpc::health::v1::HealthCheckResponse>,
- ::grpc::health::v1::HealthCheckRequest,
- ::grpc::health::v1::HealthCheckResponse>(
- "Check", 0, OnRegisterHealthCheck, OnExecuteHealthCheck,
- false /* async */, cq_, restricted_kv, response_delay_);
- }
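Three things matter in this function:
OnRegisterHealthCheck: a std::function variable that registers the gRPC async API.
OnExecuteHealthCheck: a std::function variable that is the handler for the API.
The CommonCallData object: creating it is what actually performs the registration and, later, the request processing.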
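The division of labour between the two callbacks and the call object can be sketched without gRPC. Everything below is hypothetical (`CallBase`, `DemoCall`, `RegisterAndDispatch`); it is a simplified illustration of the pattern under the assumption that the runtime just stores the tag, not the CommonCallData implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base interface; the event loop only ever sees this type.
class CallBase {
 public:
  virtual ~CallBase() = default;
  virtual std::string Process(const std::string& request) = 0;
};

// Hypothetical call object parameterised by two callbacks: one that
// registers it with the runtime, one that computes the response.
class DemoCall : public CallBase {
 public:
  using RegisterFn = std::function<void(void* tag)>;
  using ExecuteFn = std::function<std::string(const std::string&)>;

  DemoCall(RegisterFn on_register, ExecuteFn on_execute)
      : on_execute_(std::move(on_execute)) {
    // Registration happens in the constructor, with this object as the
    // tag. Converting to the base pointer first makes the later cast
    // from void* an exact round trip.
    on_register(static_cast<CallBase*>(this));
  }

  std::string Process(const std::string& request) override {
    return on_execute_(request);
  }

 private:
  ExecuteFn on_execute_;
};

// Registers one call, then plays the event loop's part: take the stored
// tag, cast it back to the base type, and dispatch the request.
std::string RegisterAndDispatch(const std::string& request) {
  std::vector<void*> pending_tags;  // stands in for the gRPC runtime

  DemoCall call(
      [&pending_tags](void* tag) { pending_tags.push_back(tag); },
      [](const std::string& name) { return "status for " + name; });

  assert(pending_tags.size() == 1);
  CallBase* base = static_cast<CallBase*>(pending_tags.front());
  return base->Process(request);
}
```

Because the callbacks are passed in, one call class can serve any number of APIs: each API supplies its own pair of callbacks, and the loop that dispatches tags never changes.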
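The CommonCallData constructor calls OnRegisterHealthCheck to register the API. The tag passed in at registration is the pointer to the CommonCallData object, which uniquely identifies one API request. CommonCallData derives from the ICallData class mentioned above. When the completion queue delivers an event, the tag is cast to a pointer to the ICallData base class, although the real type is CommonCallData, and the member function Process is called through that pointer to handle the request: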
- template <typename ResponderType, typename RequestType, typename ResponseType>
- bool
- CommonCallData<ResponderType, RequestType, ResponseType>::Process(bool rpc_ok)
- {
- LOG_VERBOSE(1) << "Process for " << name_ << ", rpc_ok=" << rpc_ok << ", "
- << id_ << " step " << step_;
-
- // If RPC failed on a new request then the server is shutting down
- // and so we should do nothing (including not registering for a new
- // request). If RPC failed on a non-START step then there is nothing
- // we can do since we only execute one step.
- const bool shutdown = (!rpc_ok && (step_ == Steps::START));
- if (shutdown) {
- if (async_thread_.joinable()) {
- async_thread_.join();
- }
- step_ = Steps::FINISH;
- }
-
- if (step_ == Steps::START) {
- // Start a new request to replace this one...
- if (!shutdown) {
- new CommonCallData<ResponderType, RequestType, ResponseType>(
- name_, id_ + 1, OnRegister_, OnExecute_, async_, cq_, restricted_kv_,
- response_delay_);
- }
-
- if (!async_) {
- // For synchronous calls, execute and write response
- // here.
- Execute();
- WriteResponse();
- } else {
- // For asynchronous calls, delegate the execution to another
- // thread.
- step_ = Steps::ISSUED;
- async_thread_ = std::thread(&CommonCallData::Execute, this);
- }
- } else if (step_ == Steps::WRITEREADY) {
- // Will only come here for asynchronous mode.
- WriteResponse();
- } else if (step_ == Steps::COMPLETE) {
- step_ = Steps::FINISH;
- }
-
- return step_ != Steps::FINISH;
- }
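The lifecycle in Process() can be traced with a small single-threaded model. The sketch below is an assumption-laden simplification, not Triton code: `Runtime`, `Event`, `DemoCall`, `Drain`, and `ServeRequests` are hypothetical, only the synchronous path is modelled, and writing the response is assumed to produce exactly one further event for the same tag.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

enum class Step { START, COMPLETE, FINISH };

class DemoCall;

struct Event {
  DemoCall* call;  // the tag
  bool ok;
};

// Hypothetical stand-in for the gRPC runtime plus its completion queue.
struct Runtime {
  std::deque<Event> events;
  DemoCall* waiting = nullptr;  // the call registered for the next request
  std::vector<std::string> log;
};

class DemoCall {
 public:
  DemoCall(int id, Runtime* rt) : id_(id), rt_(rt) {
    rt_->waiting = this;  // registration: this object is the tag
  }

  int id() const { return id_; }

  // Returns false once the call is finished and should be deleted.
  bool Process(bool rpc_ok) {
    const bool shutdown = !rpc_ok && step_ == Step::START;
    if (shutdown) step_ = Step::FINISH;

    if (step_ == Step::START) {
      // Register a replacement so the next request has a receiver.
      new DemoCall(id_ + 1, rt_);
      rt_->log.push_back("execute " + std::to_string(id_));
      // Writing the response yields one more event for this tag.
      step_ = Step::COMPLETE;
      rt_->events.push_back({this, true});
    } else if (step_ == Step::COMPLETE) {
      step_ = Step::FINISH;
    }
    return step_ != Step::FINISH;
  }

 private:
  int id_;
  Runtime* rt_;
  Step step_ = Step::START;
};

// The event loop: dispatch each event and delete calls that are done.
void Drain(Runtime* rt) {
  while (!rt->events.empty()) {
    Event ev = rt->events.front();
    rt->events.pop_front();
    if (!ev.call->Process(ev.ok)) {
      rt->log.push_back("done " + std::to_string(ev.call->id()));
      delete ev.call;
    }
  }
}

// Serves `request_count` requests, then shuts down, and returns the log.
std::vector<std::string> ServeRequests(int request_count) {
  assert(request_count >= 0);
  Runtime rt;
  new DemoCall(0, &rt);
  for (int i = 0; i < request_count; ++i) {
    rt.events.push_back({rt.waiting, true});  // a request arrives
    Drain(&rt);
  }
  rt.events.push_back({rt.waiting, false});  // shutdown: ok is false
  Drain(&rt);
  return rt.log;
}
```

Each request takes two trips through the loop, one to execute and one to finish, and the last registered call is released only by the shutdown event. That is why the shutdown check on the START step is needed: without it the idle call would never be deleted.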
That is the full flow of async gRPC request registration in tritonserver. Corrections and discussion are welcome.