
tritonserver study notes (6): implementing custom C++ and Python backends


Other articles in this series:

tritonserver study notes (1): Triton usage workflow

tritonserver study notes (2): building tritonserver

tritonserver study notes (3): tritonserver runtime flow

tritonserver study notes (4): command-line parsing

tritonserver study notes (5): backend implementation mechanism

tritonserver study notes (7): cache manager

tritonserver study notes (8): redis_caches in practice

tritonserver study notes (9): tritonserver gRPC async mode

1. Environment setup (Ubuntu 20.04)

1.1 Installing CMake

Building a Triton backend requires CMake 3.17 or newer. Download the current release, 3.28.1, from:

https://githubfast.com/Kitware/CMake/releases/download/v3.28.1/cmake-3.28.1.tar.gz
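
For example, fetch the tarball with wget:

wget https://githubfast.com/Kitware/CMake/releases/download/v3.28.1/cmake-3.28.1.tar.gz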

Unpack it and run the environment check (bootstrap):

tar zxvf cmake-3.28.1.tar.gz
cd cmake-3.28.1/
./bootstrap

You may run into a failure at this step:
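
The failure is typically CMake's OpenSSL check; the message reads roughly like this (exact wording varies by CMake version):

CMake Error: Could not find OpenSSL. Install an OpenSSL development package or
configure CMake with -DCMAKE_USE_OPENSSL=OFF to build without OpenSSL.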

In that case, install the OpenSSL development package:

sudo apt-get install libssl-dev

After installing it, re-run ./bootstrap and it will complete successfully.

Then run make && sudo make install to build and install CMake.
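
Once installed, you can confirm that the >= 3.17 requirement is met:

cmake --version   # should report cmake version 3.28.1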

1.2 Installing RapidJSON

Clone the repository:

git clone https://github.com/miloyip/rapidjson.git

If GitHub is unreachable and the clone fails, use the mirror instead:

git clone https://githubfast.com/miloyip/rapidjson.git
cd rapidjson
mkdir build
cd build
cmake ..
make && make install

2. Custom C++ backend

2.1 Building the custom backend

As introduced in the previous article (tritonserver study notes (5): backend implementation mechanism), a custom C++ backend must implement seven TRITONBACKEND APIs; their signatures are sketched right below. As a starting point we use recommended.cc from the backend repository's examples. For debugging I copied it to backend/examples/backends/liupeng and added a few log statements to make the call flow easier to follow; the full source follows the sketch:
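
For reference, a minimal sketch of the seven entry points a backend exports, taken from the definitions in the listing below (the declarations live in triton/core/tritonbackend.h):

// the seven TRITONBACKEND entry points a custom C++ backend implements
extern "C" {
TRITONSERVER_Error* TRITONBACKEND_Initialize(TRITONBACKEND_Backend* backend);
TRITONSERVER_Error* TRITONBACKEND_Finalize(TRITONBACKEND_Backend* backend);
TRITONSERVER_Error* TRITONBACKEND_ModelInitialize(TRITONBACKEND_Model* model);
TRITONSERVER_Error* TRITONBACKEND_ModelFinalize(TRITONBACKEND_Model* model);
TRITONSERVER_Error* TRITONBACKEND_ModelInstanceInitialize(TRITONBACKEND_ModelInstance* instance);
TRITONSERVER_Error* TRITONBACKEND_ModelInstanceFinalize(TRITONBACKEND_ModelInstance* instance);
TRITONSERVER_Error* TRITONBACKEND_ModelInstanceExecute(
    TRITONBACKEND_ModelInstance* instance, TRITONBACKEND_Request** requests,
    const uint32_t request_count);
}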

// Copyright 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
//  * Redistributions of source code must retain the above copyright
//    notice, this list of conditions and the following disclaimer.
//  * Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimer in the
//    documentation and/or other materials provided with the distribution.
//  * Neither the name of NVIDIA CORPORATION nor the names of its
//    contributors may be used to endorse or promote products derived
//    from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#include "triton/backend/backend_common.h"
#include "triton/backend/backend_input_collector.h"
#include "triton/backend/backend_model.h"
#include "triton/backend/backend_model_instance.h"
#include "triton/backend/backend_output_responder.h"
#include "triton/core/tritonbackend.h"

namespace triton { namespace backend { namespace recommended {

//
// Backend that demonstrates the TRITONBACKEND API. This backend works
// for any model that has 1 input with any datatype and any shape and
// 1 output with the same shape and datatype as the input. The backend
// supports both batching and non-batching models.
//
// For each batch of requests, the backend returns the input tensor
// value in the output tensor.
//

/////////////////////////////////////////////////////////////////////////////

extern "C" {

// Triton calls TRITONBACKEND_Initialize when a backend is loaded into
// Triton to allow the backend to create and initialize any state that
// is intended to be shared across all models and model instances that
// use the backend. The backend should also verify version
// compatibility with Triton in this function.
//
TRITONSERVER_Error*
TRITONBACKEND_Initialize(TRITONBACKEND_Backend* backend)
{
  const char* cname;
  RETURN_IF_ERROR(TRITONBACKEND_BackendName(backend, &cname));
  std::string name(cname);

  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      (std::string("TRITONBACKEND_Initialize: ") + name).c_str());

  // Check the backend API version that Triton supports vs. what this
  // backend was compiled against. Make sure that the Triton major
  // version is the same and the minor version is >= what this backend
  // uses.
  uint32_t api_version_major, api_version_minor;
  RETURN_IF_ERROR(
      TRITONBACKEND_ApiVersion(&api_version_major, &api_version_minor));

  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      (std::string("Triton TRITONBACKEND API version: ") +
       std::to_string(api_version_major) + "." +
       std::to_string(api_version_minor))
          .c_str());
  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      (std::string("'") + name + "' TRITONBACKEND API version: " +
       std::to_string(TRITONBACKEND_API_VERSION_MAJOR) + "." +
       std::to_string(TRITONBACKEND_API_VERSION_MINOR))
          .c_str());

  if ((api_version_major != TRITONBACKEND_API_VERSION_MAJOR) ||
      (api_version_minor < TRITONBACKEND_API_VERSION_MINOR)) {
    return TRITONSERVER_ErrorNew(
        TRITONSERVER_ERROR_UNSUPPORTED,
        "triton backend API version does not support this backend");
  }

  // The backend configuration may contain information needed by the
  // backend, such as tritonserver command-line arguments. This
  // backend doesn't use any such configuration but for this example
  // print whatever is available.
  TRITONSERVER_Message* backend_config_message;
  RETURN_IF_ERROR(
      TRITONBACKEND_BackendConfig(backend, &backend_config_message));

  const char* buffer;
  size_t byte_size;
  RETURN_IF_ERROR(TRITONSERVER_MessageSerializeToJson(
      backend_config_message, &buffer, &byte_size));
  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      (std::string("backend configuration:\n") + buffer).c_str());

  // This backend does not require any "global" state but as an
  // example create a string to demonstrate.
  std::string* state = new std::string("backend state");
  RETURN_IF_ERROR(
      TRITONBACKEND_BackendSetState(backend, reinterpret_cast<void*>(state)));

  return nullptr;  // success
}

// Triton calls TRITONBACKEND_Finalize when a backend is no longer
// needed.
//
TRITONSERVER_Error*
TRITONBACKEND_Finalize(TRITONBACKEND_Backend* backend)
{
  // Delete the "global" state associated with the backend.
  void* vstate;
  RETURN_IF_ERROR(TRITONBACKEND_BackendState(backend, &vstate));
  std::string* state = reinterpret_cast<std::string*>(vstate);

  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      (std::string("TRITONBACKEND_Finalize: state is '") + *state + "'")
          .c_str());

  delete state;

  return nullptr;  // success
}

}  // extern "C"

/////////////////////////////////////////////////////////////////////////////

//
// ModelState
//
// State associated with a model that is using this backend. An object
// of this class is created and associated with each
// TRITONBACKEND_Model. ModelState is derived from BackendModel class
// provided in the backend utilities that provides many common
// functions.
//
class ModelState : public BackendModel {
 public:
  static TRITONSERVER_Error* Create(
      TRITONBACKEND_Model* triton_model, ModelState** state);
  virtual ~ModelState() = default;

  // Name of the input and output tensor
  const std::string& InputTensorName() const { return input_name_; }
  const std::string& OutputTensorName() const { return output_name_; }

  // Datatype of the input and output tensor
  TRITONSERVER_DataType TensorDataType() const { return datatype_; }

  // Shape of the input and output tensor as given in the model
  // configuration file. This shape will not include the batch
  // dimension (if the model has one).
  const std::vector<int64_t>& TensorNonBatchShape() const { return nb_shape_; }

  // Shape of the input and output tensor, including the batch
  // dimension (if the model has one). This method cannot be called
  // until the model is completely loaded and initialized, including
  // all instances of the model. In practice, this means that backend
  // should only call it in TRITONBACKEND_ModelInstanceExecute.
  TRITONSERVER_Error* TensorShape(std::vector<int64_t>& shape);

  // Validate that this model is supported by this backend.
  TRITONSERVER_Error* ValidateModelConfig();

 private:
  ModelState(TRITONBACKEND_Model* triton_model);

  std::string input_name_;
  std::string output_name_;
  TRITONSERVER_DataType datatype_;
  bool shape_initialized_;
  std::vector<int64_t> nb_shape_;
  std::vector<int64_t> shape_;
};

ModelState::ModelState(TRITONBACKEND_Model* triton_model)
    : BackendModel(triton_model), shape_initialized_(false)
{
  // Validate that the model's configuration matches what is supported
  // by this backend.
  THROW_IF_BACKEND_MODEL_ERROR(ValidateModelConfig());
}

TRITONSERVER_Error*
ModelState::Create(TRITONBACKEND_Model* triton_model, ModelState** state)
{
  try {
    *state = new ModelState(triton_model);
  }
  catch (const BackendModelException& ex) {
    RETURN_ERROR_IF_TRUE(
        ex.err_ == nullptr, TRITONSERVER_ERROR_INTERNAL,
        std::string("unexpected nullptr in BackendModelException"));
    RETURN_IF_ERROR(ex.err_);
  }

  return nullptr;  // success
}

TRITONSERVER_Error*
ModelState::TensorShape(std::vector<int64_t>& shape)
{
  // This backend supports models that batch along the first dimension
  // and those that don't batch. For non-batch models the output shape
  // will be the shape from the model configuration. For batch models
  // the output shape will be the shape from the model configuration
  // prepended with [ -1 ] to represent the batch dimension. The
  // backend "responder" utility used below will set the appropriate
  // batch dimension value for each response. The shape needs to be
  // initialized lazily because the SupportsFirstDimBatching function
  // cannot be used until the model is completely loaded.
  if (!shape_initialized_) {
    bool supports_first_dim_batching;
    RETURN_IF_ERROR(SupportsFirstDimBatching(&supports_first_dim_batching));
    if (supports_first_dim_batching) {
      shape_.push_back(-1);
    }

    shape_.insert(shape_.end(), nb_shape_.begin(), nb_shape_.end());
    shape_initialized_ = true;
  }

  shape = shape_;

  return nullptr;  // success
}

TRITONSERVER_Error*
ModelState::ValidateModelConfig()
{
  // If verbose logging is enabled, dump the model's configuration as
  // JSON into the console output.
  if (TRITONSERVER_LogIsEnabled(TRITONSERVER_LOG_VERBOSE)) {
    common::TritonJson::WriteBuffer buffer;
    RETURN_IF_ERROR(ModelConfig().PrettyWrite(&buffer));
    LOG_MESSAGE(
        TRITONSERVER_LOG_VERBOSE,
        (std::string("model configuration:\n") + buffer.Contents()).c_str());
  }

  // ModelConfig is the model configuration as a TritonJson
  // object. Use the TritonJson utilities to parse the JSON and
  // determine if the configuration is supported by this backend.
  common::TritonJson::Value inputs, outputs;
  RETURN_IF_ERROR(ModelConfig().MemberAsArray("input", &inputs));
  RETURN_IF_ERROR(ModelConfig().MemberAsArray("output", &outputs));

  // The model must have exactly 1 input and 1 output.
  RETURN_ERROR_IF_FALSE(
      inputs.ArraySize() == 1, TRITONSERVER_ERROR_INVALID_ARG,
      std::string("model configuration must have 1 input"));
  RETURN_ERROR_IF_FALSE(
      outputs.ArraySize() == 1, TRITONSERVER_ERROR_INVALID_ARG,
      std::string("model configuration must have 1 output"));

  common::TritonJson::Value input, output;
  RETURN_IF_ERROR(inputs.IndexAsObject(0, &input));
  RETURN_IF_ERROR(outputs.IndexAsObject(0, &output));

  // Record the input and output name in the model state.
  const char* input_name;
  size_t input_name_len;
  RETURN_IF_ERROR(input.MemberAsString("name", &input_name, &input_name_len));
  input_name_ = std::string(input_name);

  const char* output_name;
  size_t output_name_len;
  RETURN_IF_ERROR(
      output.MemberAsString("name", &output_name, &output_name_len));
  output_name_ = std::string(output_name);

  // Input and output must have same datatype
  std::string input_dtype, output_dtype;
  RETURN_IF_ERROR(input.MemberAsString("data_type", &input_dtype));
  RETURN_IF_ERROR(output.MemberAsString("data_type", &output_dtype));

  RETURN_ERROR_IF_FALSE(
      input_dtype == output_dtype, TRITONSERVER_ERROR_INVALID_ARG,
      std::string("expected input and output datatype to match, got ") +
          input_dtype + " and " + output_dtype);

  datatype_ = ModelConfigDataTypeToTritonServerDataType(input_dtype);

  // Input and output must have same shape. Reshape is not supported
  // on either input or output so flag an error is the model
  // configuration uses it.
  triton::common::TritonJson::Value reshape;
  RETURN_ERROR_IF_TRUE(
      input.Find("reshape", &reshape), TRITONSERVER_ERROR_UNSUPPORTED,
      std::string("reshape not supported for input tensor"));
  RETURN_ERROR_IF_TRUE(
      output.Find("reshape", &reshape), TRITONSERVER_ERROR_UNSUPPORTED,
      std::string("reshape not supported for output tensor"));

  std::vector<int64_t> input_shape, output_shape;
  RETURN_IF_ERROR(backend::ParseShape(input, "dims", &input_shape));
  RETURN_IF_ERROR(backend::ParseShape(output, "dims", &output_shape));

  RETURN_ERROR_IF_FALSE(
      input_shape == output_shape, TRITONSERVER_ERROR_INVALID_ARG,
      std::string("expected input and output shape to match, got ") +
          backend::ShapeToString(input_shape) + " and " +
          backend::ShapeToString(output_shape));

  nb_shape_ = input_shape;

  return nullptr;  // success
}

extern "C" {

// Triton calls TRITONBACKEND_ModelInitialize when a model is loaded
// to allow the backend to create any state associated with the model,
// and to also examine the model configuration to determine if the
// configuration is suitable for the backend. Any errors reported by
// this function will prevent the model from loading.
//
TRITONSERVER_Error*
TRITONBACKEND_ModelInitialize(TRITONBACKEND_Model* model)
{
  // Create a ModelState object and associate it with the
  // TRITONBACKEND_Model. If anything goes wrong with initialization
  // of the model state then an error is returned and Triton will fail
  // to load the model.
  ModelState* model_state;
  RETURN_IF_ERROR(ModelState::Create(model, &model_state));
  RETURN_IF_ERROR(
      TRITONBACKEND_ModelSetState(model, reinterpret_cast<void*>(model_state)));

  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      "============TRITONBACKEND_ModelInitialize============");

  return nullptr;  // success
}

// Triton calls TRITONBACKEND_ModelFinalize when a model is no longer
// needed. The backend should cleanup any state associated with the
// model. This function will not be called until all model instances
// of the model have been finalized.
//
TRITONSERVER_Error*
TRITONBACKEND_ModelFinalize(TRITONBACKEND_Model* model)
{
  void* vstate;
  RETURN_IF_ERROR(TRITONBACKEND_ModelState(model, &vstate));
  ModelState* model_state = reinterpret_cast<ModelState*>(vstate);
  delete model_state;

  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      "============TRITONBACKEND_ModelFinalize============");

  return nullptr;  // success
}

}  // extern "C"

/////////////////////////////////////////////////////////////////////////////

//
// ModelInstanceState
//
// State associated with a model instance. An object of this class is
// created and associated with each
// TRITONBACKEND_ModelInstance. ModelInstanceState is derived from
// BackendModelInstance class provided in the backend utilities that
// provides many common functions.
//
class ModelInstanceState : public BackendModelInstance {
 public:
  static TRITONSERVER_Error* Create(
      ModelState* model_state,
      TRITONBACKEND_ModelInstance* triton_model_instance,
      ModelInstanceState** state);
  virtual ~ModelInstanceState() = default;

  // Get the state of the model that corresponds to this instance.
  ModelState* StateForModel() const { return model_state_; }

 private:
  ModelInstanceState(
      ModelState* model_state,
      TRITONBACKEND_ModelInstance* triton_model_instance)
      : BackendModelInstance(model_state, triton_model_instance),
        model_state_(model_state)
  {
  }

  ModelState* model_state_;
};

TRITONSERVER_Error*
ModelInstanceState::Create(
    ModelState* model_state, TRITONBACKEND_ModelInstance* triton_model_instance,
    ModelInstanceState** state)
{
  try {
    *state = new ModelInstanceState(model_state, triton_model_instance);
  }
  catch (const BackendModelInstanceException& ex) {
    RETURN_ERROR_IF_TRUE(
        ex.err_ == nullptr, TRITONSERVER_ERROR_INTERNAL,
        std::string("unexpected nullptr in BackendModelInstanceException"));
    RETURN_IF_ERROR(ex.err_);
  }

  return nullptr;  // success
}

extern "C" {

// Triton calls TRITONBACKEND_ModelInstanceInitialize when a model
// instance is created to allow the backend to initialize any state
// associated with the instance.
//
TRITONSERVER_Error*
TRITONBACKEND_ModelInstanceInitialize(TRITONBACKEND_ModelInstance* instance)
{
  // Get the model state associated with this instance's model.
  TRITONBACKEND_Model* model;
  RETURN_IF_ERROR(TRITONBACKEND_ModelInstanceModel(instance, &model));

  void* vmodelstate;
  RETURN_IF_ERROR(TRITONBACKEND_ModelState(model, &vmodelstate));
  ModelState* model_state = reinterpret_cast<ModelState*>(vmodelstate);

  // Create a ModelInstanceState object and associate it with the
  // TRITONBACKEND_ModelInstance.
  ModelInstanceState* instance_state;
  RETURN_IF_ERROR(
      ModelInstanceState::Create(model_state, instance, &instance_state));
  RETURN_IF_ERROR(TRITONBACKEND_ModelInstanceSetState(
      instance, reinterpret_cast<void*>(instance_state)));

  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      "============TRITONBACKEND_ModelInstanceInitialize============");

  return nullptr;  // success
}

// Triton calls TRITONBACKEND_ModelInstanceFinalize when a model
// instance is no longer needed. The backend should cleanup any state
// associated with the model instance.
//
TRITONSERVER_Error*
TRITONBACKEND_ModelInstanceFinalize(TRITONBACKEND_ModelInstance* instance)
{
  void* vstate;
  RETURN_IF_ERROR(TRITONBACKEND_ModelInstanceState(instance, &vstate));
  ModelInstanceState* instance_state =
      reinterpret_cast<ModelInstanceState*>(vstate);
  delete instance_state;

  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      "============TRITONBACKEND_ModelInstanceFinalize============");

  return nullptr;  // success
}

}  // extern "C"

/////////////////////////////////////////////////////////////////////////////

extern "C" {

// When Triton calls TRITONBACKEND_ModelInstanceExecute it is required
// that a backend create a response for each request in the batch. A
// response may be the output tensors required for that request or may
// be an error that is returned in the response.
//
TRITONSERVER_Error*
TRITONBACKEND_ModelInstanceExecute(
    TRITONBACKEND_ModelInstance* instance, TRITONBACKEND_Request** requests,
    const uint32_t request_count)
{
  // Collect various timestamps during the execution of this batch or
  // requests. These values are reported below before returning from
  // the function.
  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      "============TRITONBACKEND_ModelInstanceExecute============");

  uint64_t exec_start_ns = 0;
  SET_TIMESTAMP(exec_start_ns);

  // Triton will not call this function simultaneously for the same
  // 'instance'. But since this backend could be used by multiple
  // instances from multiple models the implementation needs to handle
  // multiple calls to this function at the same time (with different
  // 'instance' objects). Best practice for a high-performance
  // implementation is to avoid introducing mutex/lock and instead use
  // only function-local and model-instance-specific state.
  ModelInstanceState* instance_state;
  RETURN_IF_ERROR(TRITONBACKEND_ModelInstanceState(
      instance, reinterpret_cast<void**>(&instance_state)));
  ModelState* model_state = instance_state->StateForModel();

  // 'responses' is initialized as a parallel array to 'requests',
  // with one TRITONBACKEND_Response object for each
  // TRITONBACKEND_Request object. If something goes wrong while
  // creating these response objects, the backend simply returns an
  // error from TRITONBACKEND_ModelInstanceExecute, indicating to
  // Triton that this backend did not create or send any responses and
  // so it is up to Triton to create and send an appropriate error
  // response for each request. RETURN_IF_ERROR is one of several
  // useful macros for error handling that can be found in
  // backend_common.h.
  std::vector<TRITONBACKEND_Response*> responses;
  responses.reserve(request_count);
  for (uint32_t r = 0; r < request_count; ++r) {
    TRITONBACKEND_Request* request = requests[r];
    TRITONBACKEND_Response* response;
    RETURN_IF_ERROR(TRITONBACKEND_ResponseNew(&response, request));
    responses.push_back(response);
  }

  // At this point, the backend takes ownership of 'requests', which
  // means that it is responsible for sending a response for every
  // request. From here, even if something goes wrong in processing,
  // the backend must return 'nullptr' from this function to indicate
  // success. Any errors and failures must be communicated via the
  // response objects.
  //
  // To simplify error handling, the backend utilities manage
  // 'responses' in a specific way and it is recommended that backends
  // follow this same pattern. When an error is detected in the
  // processing of a request, an appropriate error response is sent
  // and the corresponding TRITONBACKEND_Response object within
  // 'responses' is set to nullptr to indicate that the
  // request/response has already been handled and no further processing
  // should be performed for that request. Even if all responses fail,
  // the backend still allows execution to flow to the end of the
  // function so that statistics are correctly reported by the calls
  // to TRITONBACKEND_ModelInstanceReportStatistics and
  // TRITONBACKEND_ModelInstanceReportBatchStatistics.
  // RESPOND_AND_SET_NULL_IF_ERROR, and
  // RESPOND_ALL_AND_SET_NULL_IF_ERROR are macros from
  // backend_common.h that assist in this management of response
  // objects.

  // The backend could iterate over the 'requests' and process each
  // one separately. But for performance reasons it is usually
  // preferred to create batched input tensors that are processed
  // simultaneously. This is especially true for devices like GPUs
  // that are capable of exploiting the large amount parallelism
  // exposed by larger data sets.
  //
  // The backend utilities provide a "collector" to facilitate this
  // batching process. The 'collector's ProcessTensor function will
  // combine a tensor's value from each request in the batch into a
  // single contiguous buffer. The buffer can be provided by the
  // backend or 'collector' can create and manage it. In this backend,
  // there is not a specific buffer into which the batch should be
  // created, so use ProcessTensor arguments that cause collector to
  // manage it. ProcessTensor does NOT support TRITONSERVER_TYPE_BYTES
  // data type.
  BackendInputCollector collector(
      requests, request_count, &responses, model_state->TritonMemoryManager(),
      false /* pinned_enabled */, nullptr /* stream*/);

  // To instruct ProcessTensor to "gather" the entire batch of input
  // tensors into a single contiguous buffer in CPU memory, set the
  // "allowed input types" to be the CPU ones (see tritonserver.h in
  // the triton-inference-server/core repo for allowed memory types).
  std::vector<std::pair<TRITONSERVER_MemoryType, int64_t>> allowed_input_types =
      {{TRITONSERVER_MEMORY_CPU_PINNED, 0}, {TRITONSERVER_MEMORY_CPU, 0}};

  const char* input_buffer;
  size_t input_buffer_byte_size;
  TRITONSERVER_MemoryType input_buffer_memory_type;
  int64_t input_buffer_memory_type_id;

  RESPOND_ALL_AND_SET_NULL_IF_ERROR(
      responses, request_count,
      collector.ProcessTensor(
          model_state->InputTensorName().c_str(), nullptr /* existing_buffer */,
          0 /* existing_buffer_byte_size */, allowed_input_types, &input_buffer,
          &input_buffer_byte_size, &input_buffer_memory_type,
          &input_buffer_memory_type_id));

  // Finalize the collector. If 'true' is returned, 'input_buffer'
  // will not be valid until the backend synchronizes the CUDA
  // stream or event that was used when creating the collector. For
  // this backend, GPU is not supported and so no CUDA sync should
  // be needed; so if 'true' is returned simply log an error.
  const bool need_cuda_input_sync = collector.Finalize();
  if (need_cuda_input_sync) {
    LOG_MESSAGE(
        TRITONSERVER_LOG_ERROR,
        "'recommended' backend: unexpected CUDA sync required by collector");
  }

  // 'input_buffer' contains the batched input tensor. The backend can
  // implement whatever logic is necessary to produce the output
  // tensor. This backend simply logs the input tensor value and then
  // returns the input tensor value in the output tensor so no actual
  // computation is needed.
  uint64_t compute_start_ns = 0;
  SET_TIMESTAMP(compute_start_ns);

  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      (std::string("model ") + model_state->Name() + ": requests in batch " +
       std::to_string(request_count))
          .c_str());
  std::string tstr;
  IGNORE_ERROR(BufferAsTypedString(
      tstr, input_buffer, input_buffer_byte_size,
      model_state->TensorDataType()));
  LOG_MESSAGE(
      TRITONSERVER_LOG_INFO,
      (std::string("batched " + model_state->InputTensorName() + " value: ") +
       tstr)
          .c_str());

  const char* output_buffer = input_buffer;
  TRITONSERVER_MemoryType output_buffer_memory_type = input_buffer_memory_type;
  int64_t output_buffer_memory_type_id = input_buffer_memory_type_id;

  uint64_t compute_end_ns = 0;
  SET_TIMESTAMP(compute_end_ns);

  bool supports_first_dim_batching;
  RESPOND_ALL_AND_SET_NULL_IF_ERROR(
      responses, request_count,
      model_state->SupportsFirstDimBatching(&supports_first_dim_batching));

  std::vector<int64_t> tensor_shape;
  RESPOND_ALL_AND_SET_NULL_IF_ERROR(
      responses, request_count, model_state->TensorShape(tensor_shape));

  // Because the output tensor values are concatenated into a single
  // contiguous 'output_buffer', the backend must "scatter" them out
  // to the individual response output tensors. The backend utilities
  // provide a "responder" to facilitate this scattering process.
  // BackendOutputResponder does NOT support TRITONSERVER_TYPE_BYTES
  // data type.
  //
  // The 'responders's ProcessTensor function will copy the portion of
  // 'output_buffer' corresponding to each request's output into the
  // response for that request.
  BackendOutputResponder responder(
      requests, request_count, &responses, model_state->TritonMemoryManager(),
      supports_first_dim_batching, false /* pinned_enabled */,
      nullptr /* stream*/);

  responder.ProcessTensor(
      model_state->OutputTensorName().c_str(), model_state->TensorDataType(),
      tensor_shape, output_buffer, output_buffer_memory_type,
      output_buffer_memory_type_id);

  // Finalize the responder. If 'true' is returned, the output
  // tensors' data will not be valid until the backend synchronizes
  // the CUDA stream or event that was used when creating the
  // responder. For this backend, GPU is not supported and so no CUDA
  // sync should be needed; so if 'true' is returned simply log an
  // error.
  const bool need_cuda_output_sync = responder.Finalize();
  if (need_cuda_output_sync) {
    LOG_MESSAGE(
        TRITONSERVER_LOG_ERROR,
        "'recommended' backend: unexpected CUDA sync required by responder");
  }

  // Send all the responses that haven't already been sent because of
  // an earlier error.
  for (auto& response : responses) {
    if (response != nullptr) {
      LOG_IF_ERROR(
          TRITONBACKEND_ResponseSend(
              response, TRITONSERVER_RESPONSE_COMPLETE_FINAL, nullptr),
          "failed to send response");
    }
  }

  uint64_t exec_end_ns = 0;
  SET_TIMESTAMP(exec_end_ns);

#ifdef TRITON_ENABLE_STATS
  // For batch statistics need to know the total batch size of the
  // requests. This is not necessarily just the number of requests,
  // because if the model supports batching then any request can be a
  // batched request itself.
  size_t total_batch_size = 0;
  if (!supports_first_dim_batching) {
    total_batch_size = request_count;
  } else {
    for (uint32_t r = 0; r < request_count; ++r) {
      auto& request = requests[r];
      TRITONBACKEND_Input* input = nullptr;
      LOG_IF_ERROR(
          TRITONBACKEND_RequestInputByIndex(request, 0 /* index */, &input),
          "failed getting request input");
      if (input != nullptr) {
        const int64_t* shape = nullptr;
        LOG_IF_ERROR(
            TRITONBACKEND_InputProperties(
                input, nullptr, nullptr, &shape, nullptr, nullptr, nullptr),
            "failed getting input properties");
        if (shape != nullptr) {
          total_batch_size += shape[0];
        }
      }
    }
  }
#else
  (void)exec_start_ns;
  (void)exec_end_ns;
  (void)compute_start_ns;
  (void)compute_end_ns;
#endif  // TRITON_ENABLE_STATS

  // Report statistics for each request, and then release the request.
  for (uint32_t r = 0; r < request_count; ++r) {
    auto& request = requests[r];

#ifdef TRITON_ENABLE_STATS
    LOG_IF_ERROR(
        TRITONBACKEND_ModelInstanceReportStatistics(
            instance_state->TritonModelInstance(), request,
            (responses[r] != nullptr) /* success */, exec_start_ns,
            compute_start_ns, compute_end_ns, exec_end_ns),
        "failed reporting request statistics");
#endif  // TRITON_ENABLE_STATS

    LOG_IF_ERROR(
        TRITONBACKEND_RequestRelease(request, TRITONSERVER_REQUEST_RELEASE_ALL),
        "failed releasing request");
  }

#ifdef TRITON_ENABLE_STATS
  // Report batch statistics.
  LOG_IF_ERROR(
      TRITONBACKEND_ModelInstanceReportBatchStatistics(
          instance_state->TritonModelInstance(), total_batch_size,
          exec_start_ns, compute_start_ns, compute_end_ns, exec_end_ns),
      "failed reporting batch request statistics");
#endif  // TRITON_ENABLE_STATS

  return nullptr;  // success
}

}  // extern "C"

}}}  // namespace triton::backend::recommended

Then build it:

mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
make install

After the build completes, the backend shared library is produced (for a backend named liupeng that would be libtriton_liupeng.so, assuming the CMake target was renamed to match).

2.2 Serving the custom backend

Serving a model requires, besides the model itself, a matching configuration file (config.pbtxt), for example:

backend: "liupeng"
max_batch_size: 8
dynamic_batching {
  max_queue_delay_microseconds: 5000000
}
input [
  {
    name: "IN0"
    data_type: TYPE_INT32
    dims: [ 4 ]
  }
]
output [
  {
    name: "OUT0"
    data_type: TYPE_INT32
    dims: [ 4 ]
  }
]
instance_group [
  {
    kind: KIND_CPU
  }
]

Note that the shared library name is constrained: it must be libtriton_<backendname>.so, where <backendname> is the value of the backend field in config.pbtxt (here liupeng, so the library must be named libtriton_liupeng.so).

1. Once the configuration file is ready, place everything under the model_repository directory in the following layout:
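
A sketch of one workable layout (the directory names are assumptions; since this config.pbtxt has no name field, the model takes its name from its directory, and the shared library can sit either next to the model or under the server's global backends directory):

model_repository/
  liupeng/                     # model name = directory name (config.pbtxt has no name field)
    config.pbtxt
    1/                         # version directory; may stay empty for this backend
    libtriton_liupeng.so       # or install it under /opt/tritonserver/backends/liupeng/ instead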

2. Start the Triton image and mount the model_repository path into the container:

docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -it -v /root/tritonserver/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:23.12-py3

3. Start Triton inside the container:

tritonserver --model-repository=/models
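
If Triton cannot find the backend library, one option (a sketch; the source path is an assumption, adjust it to where your build or volume mount placed the file) is to copy it into the server's backend directory inside the container before starting tritonserver:

# assumed source path: the library was dropped into the mounted model directory
mkdir -p /opt/tritonserver/backends/liupeng
cp /models/liupeng/libtriton_liupeng.so /opt/tritonserver/backends/liupeng/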

When Triton starts it calls TRITONBACKEND_Initialize, and the log contains output like the following:
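
Based on the LOG_MESSAGE calls in recommended.cc above, the relevant message bodies look roughly like this (Triton's timestamp/file prefix omitted, version numbers illustrative):

TRITONBACKEND_Initialize: liupeng
Triton TRITONBACKEND API version: 1.x
'liupeng' TRITONBACKEND API version: 1.x
backend configuration:
{"cmdline":{...}}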

3. Custom Python backend

3.1 Python backend code

A custom Python backend is much simpler than the C++ one: at most three methods need to be implemented (per the docstrings in the example below, only execute is strictly required):

def initialize(self, args)
def execute(self, requests)
def finalize(self)
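
As a minimal sketch (the complete add_sub example follows below), the model class looks like this:

import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # optional: parse args["model_config"] and set up per-model state
        pass

    def execute(self, requests):
        # required: return exactly one InferenceResponse per InferenceRequest
        return [pb_utils.InferenceResponse(output_tensors=[]) for _ in requests]

    def finalize(self):
        # optional: release resources before the model is unloaded
        pass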

The example code lives at:

https://gitee.com/bd-super-sugar/python_backend/blob/main/examples/add_sub/model.py

The code is as follows:

# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import json

# triton_python_backend_utils is available in every Triton Python model. You
# need to use this module to create inference requests and responses. It also
# contains some utility functions for extracting information from model_config
# and converting Triton input/output types to numpy types.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    def initialize(self, args):
        """`initialize` is called only once when the model is being loaded.
        Implementing `initialize` function is optional. This function allows
        the model to initialize any state associated with this model.

        Parameters
        ----------
        args : dict
          Both keys and values are strings. The dictionary keys and values are:
          * model_config: A JSON string containing the model configuration
          * model_instance_kind: A string containing model instance kind
          * model_instance_device_id: A string containing model instance device ID
          * model_repository: Model repository path
          * model_version: Model version
          * model_name: Model name
        """
        # You must parse model_config. JSON string is not parsed here
        self.model_config = model_config = json.loads(args["model_config"])

        # Get OUTPUT0 configuration
        output0_config = pb_utils.get_output_config_by_name(model_config, "OUTPUT0")
        # Get OUTPUT1 configuration
        output1_config = pb_utils.get_output_config_by_name(model_config, "OUTPUT1")

        # Convert Triton types to numpy types
        self.output0_dtype = pb_utils.triton_string_to_numpy(
            output0_config["data_type"]
        )
        self.output1_dtype = pb_utils.triton_string_to_numpy(
            output1_config["data_type"]
        )

    def execute(self, requests):
        """`execute` MUST be implemented in every Python model. `execute`
        function receives a list of pb_utils.InferenceRequest as the only
        argument. This function is called when an inference request is made
        for this model. Depending on the batching configuration (e.g. Dynamic
        Batching) used, `requests` may contain multiple requests. Every
        Python model, must create one pb_utils.InferenceResponse for every
        pb_utils.InferenceRequest in `requests`. If there is an error, you can
        set the error argument when creating a pb_utils.InferenceResponse

        Parameters
        ----------
        requests : list
          A list of pb_utils.InferenceRequest

        Returns
        -------
        list
          A list of pb_utils.InferenceResponse. The length of this list must
          be the same as `requests`
        """
        output0_dtype = self.output0_dtype
        output1_dtype = self.output1_dtype

        responses = []

        # Every Python backend must iterate over everyone of the requests
        # and create a pb_utils.InferenceResponse for each of them.
        for request in requests:
            # Get INPUT0
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # Get INPUT1
            in_1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")

            out_0, out_1 = (
                in_0.as_numpy() + in_1.as_numpy(),
                in_0.as_numpy() - in_1.as_numpy(),
            )

            # Create output tensors. You need pb_utils.Tensor
            # objects to create pb_utils.InferenceResponse.
            out_tensor_0 = pb_utils.Tensor("OUTPUT0", out_0.astype(output0_dtype))
            out_tensor_1 = pb_utils.Tensor("OUTPUT1", out_1.astype(output1_dtype))

            # Create InferenceResponse. You can set an error here in case
            # there was a problem with handling this inference request.
            # Below is an example of how you can set errors in inference
            # response:
            #
            # pb_utils.InferenceResponse(
            #    output_tensors=..., TritonError("An error occurred"))
            inference_response = pb_utils.InferenceResponse(
                output_tensors=[out_tensor_0, out_tensor_1]
            )
            responses.append(inference_response)

        # You should return a list of pb_utils.InferenceResponse. Length
        # of this list must match the length of `requests` list.
        return responses

    def finalize(self):
        """`finalize` is called only once when the model is being unloaded.
        Implementing `finalize` function is OPTIONAL. This function allows
        the model to perform any necessary clean ups before exit.
        """
        print("Cleaning up...")

The corresponding config.pbtxt is:

name: "liupeng_python"
backend: "python"
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
input [
  {
    name: "INPUT1"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
output [
  {
    name: "OUTPUT1"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
instance_group [{ kind: KIND_CPU }]

3.2 Serving the model

The model directory is organized as follows:
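
For reference, the standard python-backend layout for this example (the model directory name must match the name field in config.pbtxt):

model_repository/
  liupeng_python/
    config.pbtxt
    1/
      model.py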

After startup, the server log shows the model load status with liupeng_python marked as READY.

At this point the Python custom backend has been loaded successfully.

4. Running inference

Taking the Python custom backend as the example, the client-side code is:

# Copyright 2020-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import sys

import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import *

model_name = "liupeng_python"
shape = [4]

with httpclient.InferenceServerClient("localhost:8000") as client:
    input0_data = np.random.rand(*shape).astype(np.float32)
    input1_data = np.random.rand(*shape).astype(np.float32)
    inputs = [
        httpclient.InferInput(
            "INPUT0", input0_data.shape, np_to_triton_dtype(input0_data.dtype)
        ),
        httpclient.InferInput(
            "INPUT1", input1_data.shape, np_to_triton_dtype(input1_data.dtype)
        ),
    ]

    inputs[0].set_data_from_numpy(input0_data)
    inputs[1].set_data_from_numpy(input1_data)

    outputs = [
        httpclient.InferRequestedOutput("OUTPUT0"),
        httpclient.InferRequestedOutput("OUTPUT1"),
    ]

    response = client.infer(model_name, inputs, request_id=str(1), outputs=outputs)

    result = response.get_response()
    output0_data = response.as_numpy("OUTPUT0")
    output1_data = response.as_numpy("OUTPUT1")

    print(
        "INPUT0 ({}) + INPUT1 ({}) = OUTPUT0 ({})".format(
            input0_data, input1_data, output0_data
        )
    )
    print(
        "INPUT0 ({}) - INPUT1 ({}) = OUTPUT1 ({})".format(
            input0_data, input1_data, output1_data
        )
    )

    if not np.allclose(input0_data + input1_data, output0_data):
        print("add_sub example error: incorrect sum")
        sys.exit(1)

    if not np.allclose(input0_data - input1_data, output1_data):
        print("add_sub example error: incorrect difference")
        sys.exit(1)

    print("PASS: liupeng_python")
    sys.exit(0)

After installing the Triton client libraries, run this script to send a request to the liupeng_python model; on success it prints PASS: liupeng_python. For simplicity, you can run it inside the Triton client image.
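
One way to do that, as a sketch (client.py is a placeholder name for the script above; the SDK image tag mirrors the server tag used earlier, and the pip install is only needed outside the SDK image):

# on the same host as the server
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:23.12-py3-sdk
# outside the SDK image, first install the client library:
pip install "tritonclient[http]" numpy
python3 client.py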

5. Follow me

You are welcome to follow my WeChat official account:
