This article is shared from the Huawei Cloud Community post "Custom PRelu Operator in Ascend C" by jackwangcumt.
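PReLU (Parametric Rectified Linear Unit) was first proposed by Kaiming He's team. It is an improved version of ReLU and closely resembles LeakyReLU, except that the slope for negative inputs is a learnable parameter rather than a fixed constant. It adds almost no extra parameters, improves the model's fitting capacity, and carries little additional risk of overfitting. For its mathematical definition, see the "PReLU" page of the PyTorch 2.1 documentation.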
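Following that PyTorch page, PReLU is applied element-wise, where $a$ is the slope for negative inputs (learnable in PyTorch; in this article's operator it is the fixed `alpha` attribute):

$$
\mathrm{PReLU}(x) = \max(0, x) + a \cdot \min(0, x) =
\begin{cases}
x, & x \ge 0 \\
a\,x, & x < 0
\end{cases}
$$

The first form, a sum of a clamped-positive part and a scaled clamped-negative part, is the one the kernel's `Compute` function implements with `Maxs`, `Mins`, `Muls`, and `Add`.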
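Before developing custom operators with Ascend C, the driver, firmware, and developer kit (CANN toolkit) must be installed successfully on the Ascend device. The toolkit version I had installed earlier was too old, and building and running some of the official samples failed with errors, so I reinstalled the newer 8.0 release by running the following commands as root: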
```shell
# Uninstall cann-toolkit_7.0.RC1
root@atlas500ai:/home/kzroot/mysoft# ./Ascend-cann-toolkit_7.0.RC1_linux-aarch64.run --uninstall
# Remove leftover files
rm -rf /usr/local/Ascend/ascend-toolkit/*
# Install cann-toolkit_8.0.RC1.alpha002
./Ascend-cann-toolkit_8.0.RC1.alpha002_linux-aarch64.run --install --install-for-all --quiet
# Install the protobuf dependency
pip3 install protobuf==3.20.0
```
In a working directory, create the single-operator project description file PReluCustom.json with content like the following:
```json
[
    {
        "op": "PReluCustom",
        "language": "cpp",
        "input_desc": [
            {
                "name": "x",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            }
        ],
        "output_desc": [
            {
                "name": "y",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            }
        ],
        "attr": [
            {
                "name": "alpha",
                "param_type": "optional",
                "type": "float",
                "default_value": "0.002"
            }
        ]
    }
]
```
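Use msopgen, the operator project generator bundled with the developer kit, to generate the single-operator project directory automatically from the description file: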
```shell
/usr/local/Ascend/ascend-toolkit/8.0.RC1.alpha002/python/site-packages/bin/msopgen gen -i ./PReluCustom.json \
    -c ai_core-Ascend310P3 -lan cpp -out ./PReluCustom
```
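On success, a C++ single-operator project directory named PReluCustom is generated. Its CMakePresets.json file contains several important settings. In particular, ASCEND_CANN_PACKAGE_PATH, the installation path of the developer kit, must be adjusted to your local setup (here it is /usr/local/Ascend/ascend-toolkit/latest), otherwise compilation fails. My modified configuration is as follows: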
```json
{
    "version": 1,
    "cmakeMinimumRequired": {
        "major": 3,
        "minor": 19,
        "patch": 0
    },
    "configurePresets": [
        {
            "name": "default",
            "displayName": "Default Config",
            "description": "Default build using Unix Makefiles generator",
            "generator": "Unix Makefiles",
            "binaryDir": "${sourceDir}/build_out",
            "cacheVariables": {
                "CMAKE_BUILD_TYPE": {
                    "type": "STRING",
                    "value": "Release"
                },
                "ENABLE_SOURCE_PACKAGE": {
                    "type": "BOOL",
                    "value": "True"
                },
                "ENABLE_BINARY_PACKAGE": {
                    "type": "BOOL",
                    "value": "True"
                },
                "ASCEND_COMPUTE_UNIT": {
                    "type": "STRING",
                    "value": "ascend310p"
                },
                "ENABLE_TEST": {
                    "type": "BOOL",
                    "value": "True"
                },
                "vendor_name": {
                    "type": "STRING",
                    "value": "customize"
                },
                "ASCEND_CANN_PACKAGE_PATH": {
                    "type": "PATH",
                    "value": "/usr/local/Ascend/ascend-toolkit/latest"
                },
                "ASCEND_PYTHON_EXECUTABLE": {
                    "type": "STRING",
                    "value": "python3"
                },
                "CMAKE_INSTALL_PREFIX": {
                    "type": "PATH",
                    "value": "${sourceDir}/build_out"
                },
                "ENABLE_CROSS_COMPILE": {
                    "type": "BOOL",
                    "value": "False"
                },
                "CMAKE_CROSS_PLATFORM_COMPILER": {
                    "type": "PATH",
                    "value": "/usr/bin/aarch64-linux-gnu-g++"
                }
            }
        }
    ]
}
```
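If you regenerate the project often, editing these cache variables by hand gets repetitive. The sketch below shows one way to script the change with Python's standard `json` module. `patch_presets` and `patch_presets_file` are hypothetical helpers, not part of the generated project or of the CANN toolkit, and they assume the preset layout shown above (a `configurePresets` entry named `default` with `cacheVariables`).

```python
import json


def patch_presets(presets, cann_path, vendor="customize", preset_name="default"):
    """Set ASCEND_CANN_PACKAGE_PATH and vendor_name in a parsed CMakePresets dict.

    Hypothetical helper: assumes the structure generated by msopgen as shown
    in this article. Raises KeyError if the named preset does not exist.
    """
    for preset in presets["configurePresets"]:
        if preset["name"] == preset_name:
            cache = preset["cacheVariables"]
            cache["ASCEND_CANN_PACKAGE_PATH"]["value"] = cann_path
            cache["vendor_name"]["value"] = vendor
            return presets
    raise KeyError(f"configure preset {preset_name!r} not found")


def patch_presets_file(path, cann_path, vendor="customize"):
    """Read CMakePresets.json, patch it in memory, and write it back."""
    with open(path, encoding="utf-8") as f:
        presets = json.load(f)
    patch_presets(presets, cann_path, vendor)
    with open(path, "w", encoding="utf-8") as f:
        # json.dump re-indents the file; the content itself is unchanged
        json.dump(presets, f, indent=4)
        f.write("\n")
```

For example, `patch_presets_file("PReluCustom/CMakePresets.json", "/usr/local/Ascend/ascend-toolkit/latest")` would apply the path used in this article. Working on the parsed dict rather than on raw text keeps the edit independent of how the file happens to be indented.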
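The vendor_name setting in CMakePresets.json can be changed to suit your needs. By default, the deployed operator is placed under the customize directory; you can rename it, for example to jackwangcumt. Note that deploying a single-operator project overwrites whatever was previously deployed under the same vendor name, so choose this value with care. In the generated op_kernel/p_relu_custom.cpp file, the key computation is: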
```cpp
__aicore__ inline void Compute(int32_t progress)
{
    // dequeue the input tile from the VECIN queue and allocate an output tile
    LocalTensor<float> xLocal = inQueueX.DeQue<float>();
    LocalTensor<float> yLocal = outQueueY.AllocTensor<float>();
    LocalTensor<float> tmpTensor1 = tmpBuffer1.Get<float>();
    float inputVal = 0.0f;
    // tmp = max(x, 0): keeps x where x >= 0, 0 elsewhere
    Maxs(tmpTensor1, xLocal, inputVal, this->tileLength);
    // x = min(x, 0) * alpha: keeps alpha * x where x < 0, 0 elsewhere
    Mins(xLocal, xLocal, inputVal, this->tileLength);
    Muls(xLocal, xLocal, this->alpha, this->tileLength);
    // y = alpha * min(x, 0) + max(x, 0)
    Add(yLocal, xLocal, tmpTensor1, this->tileLength);
    outQueueY.EnQue<float>(yLocal);
    // free the input tensor for reuse
    inQueueX.FreeTensor(xLocal);
}
```
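The decomposition in `Compute` can be checked on the CPU before touching the device. Below is a minimal plain-Python model (not Ascend C): `maxs`, `mins`, `muls`, and `add` are hypothetical stand-ins that only mimic the element-wise semantics of `Maxs`, `Mins`, `Muls`, and `Add` over one tile. Python floats are double precision, unlike the float32 tensors on the device, so this checks the algebra, not the numerics.

```python
def maxs(xs, scalar):
    """Model of Maxs: element-wise max(x, scalar)."""
    return [max(x, scalar) for x in xs]


def mins(xs, scalar):
    """Model of Mins: element-wise min(x, scalar)."""
    return [min(x, scalar) for x in xs]


def muls(xs, scalar):
    """Model of Muls: element-wise x * scalar."""
    return [x * scalar for x in xs]


def add(xs, ys):
    """Model of Add: element-wise x + y."""
    return [x + y for x, y in zip(xs, ys)]


def prelu_tile(x_tile, alpha):
    """Mirror of Compute(): alpha * min(x, 0) + max(x, 0)."""
    tmp = maxs(x_tile, 0.0)                 # x where x >= 0, else 0
    neg = muls(mins(x_tile, 0.0), alpha)    # alpha * x where x < 0, else 0
    return add(neg, tmp)


def prelu_reference(xs, alpha):
    """Piecewise definition, the same rule gen_data.py uses for the golden data."""
    return [x if x >= 0 else alpha * x for x in xs]
```

Because exactly one of the two partial results is nonzero for every element, adding them never mixes the branches, which is why a single `Add` is enough to merge the two halves.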
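Here the built-in vector APIs handle the x >= 0 and x < 0 parts separately: Maxs keeps the non-negative part, Mins followed by Muls produces alpha * x for the negative part, and Add merges the two into the final result. The p_relu_custom_tiling.h file under the op_host directory is shown below: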
```cpp
#include "register/tilingdata_base.h"

namespace optiling {
BEGIN_TILING_DATA_DEF(TilingData)
  TILING_DATA_FIELD_DEF(uint32_t, totalLength);  // total number of input elements
  TILING_DATA_FIELD_DEF(uint32_t, tileNum);      // number of tiles per core
  TILING_DATA_FIELD_DEF(float, alpha);           // PReLU slope, from the "alpha" attribute
END_TILING_DATA_DEF;

REGISTER_TILING_DATA_CLASS(PReluCustom, TilingData)
}
```
The core code of op_host/p_relu_custom.cpp is as follows:
```cpp
#include "p_relu_custom_tiling.h"
#include "register/op_def_registry.h"

namespace optiling {

const uint32_t BLOCK_DIM = 8;
const uint32_t TILE_NUM = 16;  // this value may affect whether the test passes

static ge::graphStatus TilingFunc(gert::TilingContext* context)
{
    TilingData tiling;
    uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize();

    // read the "alpha" attribute (index 0); guard against missing attributes
    const gert::RuntimeAttrs *attrs = context->GetAttrs();
    if (attrs == nullptr) {
        return ge::GRAPH_FAILED;
    }
    const float *alpha = attrs->GetAttrPointer<float>(0);
    if (alpha == nullptr) {
        return ge::GRAPH_FAILED;
    }

    context->SetBlockDim(BLOCK_DIM);  // number of cores the kernel runs on
    tiling.set_totalLength(totalLength);
    tiling.set_tileNum(TILE_NUM);
    tiling.set_alpha(*alpha);

    // serialize the tiling data into the raw buffer passed to the kernel
    tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
    context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());

    // request no extra workspace
    size_t *currentWorkspace = context->GetWorkspaceSizes(1);
    currentWorkspace[0] = 0;

    return ge::GRAPH_SUCCESS;
}
}

namespace ge {
// the output has the same shape as the input
static ge::graphStatus InferShape(gert::InferShapeContext* context)
{
    const gert::Shape* x1_shape = context->GetInputShape(0);
    gert::Shape* y_shape = context->GetOutputShape(0);
    *y_shape = *x1_shape;
    return GRAPH_SUCCESS;
}
}

namespace ops {
class PReluCustom : public OpDef {
public:
    explicit PReluCustom(const char* name) : OpDef(name)
    {
        // one float ND input x, one float ND output y
        this->Input("x")
            .ParamType(REQUIRED)
            .DataType({ge::DT_FLOAT})
            .Format({ge::FORMAT_ND})
            .UnknownShapeFormat({ge::FORMAT_ND});
        this->Output("y")
            .ParamType(REQUIRED)
            .DataType({ge::DT_FLOAT})
            .Format({ge::FORMAT_ND})
            .UnknownShapeFormat({ge::FORMAT_ND});
        // optional float attribute alpha, default 0.002 (matches PReluCustom.json)
        this->Attr("alpha").AttrType(OPTIONAL).Float(0.002);

        this->SetInferShape(ge::InferShape);

        this->AICore()
            .SetTiling(optiling::TilingFunc);
        this->AICore().AddConfig("ascend310p");
    }
};

OP_ADD(PReluCustom);
}
```
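The comment on `TILE_NUM` says its value may affect whether the test passes. The kernel's `Init` is not shown in this article; assuming it splits the data the way the official Ascend C samples (such as AddCustom) do, each core handles `totalLength / BLOCK_DIM` elements, cut into `tileNum` tiles that are double-buffered (`BUFFER_NUM = 2`), so one tile holds `totalLength / BLOCK_DIM / tileNum / BUFFER_NUM` elements with integer division. Any remainder would then be silently skipped. The sketch below checks that arithmetic; `tile_length`, `BUFFER_NUM`, and the 32-byte alignment rule for data copies are assumptions, not code from this project.

```python
BLOCK_DIM = 8      # cores, as set in TilingFunc
BUFFER_NUM = 2     # assumption: double buffering as in the official samples
FLOAT_BYTES = 4    # sizeof(float)
ALIGN_BYTES = 32   # assumption: each tile copy should be 32-byte aligned


def tile_length(total_length, tile_num, block_dim=BLOCK_DIM, buffer_num=BUFFER_NUM):
    """Return elements per tile, raising ValueError if the split is not clean.

    Hypothetical helper modelling the sample-style tiling:
    tileLength = totalLength / blockDim / tileNum / bufferNum.
    """
    parts = block_dim * tile_num * buffer_num
    if total_length % parts != 0:
        raise ValueError(
            f"{total_length} elements do not split evenly into {parts} tiles; "
            f"{total_length % parts} elements would be left unprocessed"
        )
    length = total_length // parts
    if (length * FLOAT_BYTES) % ALIGN_BYTES != 0:
        raise ValueError(f"a tile of {length} floats is not {ALIGN_BYTES}-byte aligned")
    return length
```

With the [8, 200, 1024] test input generated by gen_data.py (1,638,400 elements), `tile_length(8 * 200 * 1024, 16)` returns 6400 floats, or 25,600 bytes per tile, so that shape splits cleanly. Under these assumptions, an input whose size is not a multiple of 8 × 16 × 2 = 256 would not.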
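Run the following command to build the operator project:

```shell
root@atlas500ai:/home/kzroot/mysoft/myAscendC/PReluSample/PReluCustom# bash build.sh
```

If the output ends with `Self-extractable archive "custom_opp_ubuntu_aarch64.run" successfully created.`, the build succeeded. Deploy the operator with the following command:

```shell
PReluCustom# ./build_out/custom_opp_ubuntu_aarch64.run
```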
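A custom operator built with Ascend C must be verified for correctness. For this I created an AclNNInvocation directory, modeled on the corresponding example in the official samples; its directory structure is shown in the accompanying figure.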
In it, gen_data.py generates the test input and the golden output, and verify_result.py checks the accuracy of the operator's output. gen_data.py is as follows:
```python
import os

import numpy as np


def gen_golden_data_simple():
    # PReLU slope; must match the operator's alpha attribute (default 0.002)
    alpha = np.array(0.002, dtype=np.float32)
    # random input of shape [8, 200, 1024], uniform in [-100, 100)
    input_x = np.random.uniform(-100, 100, [8, 200, 1024]).astype(np.float32)
    # golden output: x where x >= 0, alpha * x where x < 0
    golden = np.where(input_x >= 0, input_x, input_x * alpha).astype(np.float32)
    os.makedirs("input", exist_ok=True)
    os.makedirs("output", exist_ok=True)
    input_x.tofile("./input/input_x.bin")
    golden.tofile("./output/golden.bin")


if __name__ == "__main__":
    gen_golden_data_simple()
```
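The article does not show verify_result.py. Below is a minimal sketch of such a check using only the standard library. It assumes the operator's output is written as raw float32 to a file such as `./output/output_y.bin` (a hypothetical name), and the 1e-4 tolerances are assumptions, not values from the original. Each element is tested with the same criterion `numpy.isclose` uses, and the run passes if the fraction of mismatches stays within a tolerated ratio.

```python
import array

# Assumed tolerances; the original verify_result.py is not shown.
RELATIVE_TOL = 1e-4
ABSOLUTE_TOL = 1e-4
ERROR_RATIO_TOL = 1e-4  # tolerated fraction of mismatching elements


def load_float32(path):
    """Read a raw native-endian float32 file, as written by ndarray.tofile()."""
    data = array.array("f")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data


def count_mismatches(output, golden, rtol=RELATIVE_TOL, atol=ABSOLUTE_TOL):
    """Count elements failing |y - g| <= atol + rtol * |g| (numpy.isclose rule)."""
    if len(output) != len(golden):
        raise ValueError(f"length mismatch: {len(output)} vs {len(golden)}")
    bad = 0
    for y, g in zip(output, golden):
        if abs(y - g) > atol + rtol * abs(g):
            bad += 1
    return bad


def verify(output, golden, ratio_tol=ERROR_RATIO_TOL):
    """Return True if the share of mismatching elements is within ratio_tol."""
    return count_mismatches(output, golden) <= ratio_tol * len(golden)
```

A call such as `verify(load_float32("./output/output_y.bin"), load_float32("./output/golden.bin"))` would then report pass or fail. Tolerating a small mismatch ratio, rather than requiring every element to match, is a common choice when device and host arithmetic round differently.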
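The CMakeLists.txt under the src directory contains one variable that may need to be changed: set(CUST_PKG_PATH "${INC_PATH}/opp/vendors/customize/op_api"). Normally it can be left as is, but the vendor directory in this path must match vendor_name; if you changed vendor_name (for example to jackwangcumt), replace customize accordingly. Run the following command to test: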
```shell
PReluSample/AclNNInvocation# bash run.sh
```
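Because CUST_PKG_PATH in src/CMakeLists.txt has to agree with vendor_name in the operator project's CMakePresets.json, a quick consistency check before running the test can save a confusing failure. The helpers below are hypothetical; they assume the `set(CUST_PKG_PATH ".../opp/vendors/<vendor>/op_api")` form shown above and the preset layout used earlier in this article.

```python
import json
import re

# Matches set(CUST_PKG_PATH ".../opp/vendors/<vendor>/op_api") and captures <vendor>.
CUST_PKG_RE = re.compile(
    r'set\(\s*CUST_PKG_PATH\s+"[^"]*/opp/vendors/([^/"]+)/op_api"\s*\)'
)


def vendor_in_cmakelists(text):
    """Return the vendor directory named in CUST_PKG_PATH."""
    match = CUST_PKG_RE.search(text)
    if match is None:
        raise ValueError("CUST_PKG_PATH not found")
    return match.group(1)


def vendor_in_presets(presets, preset_name="default"):
    """Return vendor_name from a parsed CMakePresets.json dict."""
    for preset in presets["configurePresets"]:
        if preset["name"] == preset_name:
            return preset["cacheVariables"]["vendor_name"]["value"]
    raise KeyError(f"configure preset {preset_name!r} not found")


def vendors_match(cmakelists_text, presets_text):
    """True if both files refer to the same vendor directory."""
    return vendor_in_cmakelists(cmakelists_text) == vendor_in_presets(
        json.loads(presets_text)
    )
```

Reading both files as text and passing them to `vendors_match` is enough. With the defaults shown in this article, both sides name `customize`.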