https://github.com/NVIDIA/FasterTransformer/tree/release/v1.0_tag
Environment setup
The steps below are a bit tedious; this link provides a ready-made Dockerfile: https://github.com/NVIDIA/TensorRT/tree/release/5.1/docker
sudo docker run -it --shm-size 8gb --rm --gpus=all -v ${PWD}:/test nvcr.io/nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 bash
apt-get update
apt-get install git zip
cd /test/FastT
# Clone the v1.0 tag directly:
git clone -b release/v1.0_tag --depth=1 https://github.com/NVIDIA/FasterTransformer.git
# or download FasterTransformer-release-v1.0_tag.zip and unpack it:
unzip FasterTransformer-release-v1.0_tag.zip
cd FasterTransformer
git submodule init && git submodule update
mkdir -p build && cd build
cmake -DSM=86 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON ..
# or, C++ only: cmake -DSM=xx -DCMAKE_BUILD_TYPE=Release ..
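The `-DSM` value must match the GPU's compute capability (86 corresponds to Ampere consumer GPUs such as the RTX 30-series). A minimal sketch of deriving it, assuming the capability string would come from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on recent drivers (hard-coded here for illustration):

```shell
# "8.6" is hard-coded for illustration; on a real machine it would come from:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader   (recent drivers)
cap="8.6"
sm="${cap/./}"   # drop the dot: "8.6" -> "86"
echo "cmake -DSM=${sm} -DCMAKE_BUILD_TYPE=Release .."
```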
CMake Error at CMakeLists.txt:109 (add_subdirectory):
The source directory
/test/FastT/code/FasterTransformer/tools
does not contain a CMakeLists.txt file.
(Note: in practice, "does not contain a CMakeLists.txt file" usually means the `tools` directory itself is missing or incomplete — e.g. a partial download or an uninitialized submodule — so re-fetching the source is worth trying first.)
This error is often attributed to specifying, via the `-ccbin` option, a compiler that uses a different C++ standard library.
Adding `-Xcompiler -stdlib=libstdc++` tells `nvcc` explicitly to use the `libstdc++` standard library. For example:
nvcc -ccbin g++-7 -Xcompiler -stdlib=libstdc++ your_file.cu -o your_executable
Specify the desired C++ compiler with `-ccbin` and the standard library via `-Xcompiler` to resolve the issue. If you are unsure which compiler and standard library to use, consult your CUDA installation documentation or ask the CUDA community.
./bin/gemm_fp16 100 12 32 64
# Generates gemm_config.in under the build directory; it records the best-performing GEMM algorithm.
C++ demo: ./build/bin/transformer_fp16 100 12 32 64
Arguments: <batch_size> <num_layers> <seq_len> <head_num> <size_per_head>
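For reference, the head_num=12, size_per_head=64 pair used in the demo arguments reproduces the BERT-base hidden size, since the hidden dimension is head_num × size_per_head; a quick shell check:

```shell
# BERT-base-like layer geometry used in the demo invocations above
head_num=12
size_per_head=64
hidden_size=$((head_num * size_per_head))
echo "hidden_size=${hidden_size}"   # 12 * 64 = 768, the BERT-base hidden dimension
```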
1. sample/tensorflow/transformer_fp32.py: transformer_layer TensorFlow FP32 OP call, time measurement, timeline generation
2. sample/tensorflow/transformer_fp16.py: transformer_layer TensorFlow FP16 OP call, time measurement, timeline generation
3. sample/tensorflow/error_check.py: how to catch custom OP runtime errors
4. sample/cpp/transformer_fp32.cc: transformer layer C++ FP32 sample
5. sample/cpp/transformer_fp16.cc: transformer layer C++ FP16 sample
6. sample/tensorRT/transformer_trt.cc: transformer layer TensorRT FP32/FP16 sample
7. tools/gemm_test/gemm_fp16.cu: loop over all cuBLAS FP16 GEMM algorithms and pick the best one
8. tools/gemm_test/gemm_fp32.cu: loop over all cuBLAS FP32 GEMM algorithms and pick the best one
https://developer.nvidia.com/cuda-example
https://on-demand.gputechconf.com/gtc-cn/2019/pdf/CN9468/presentation.pdf
https://github.com/prabhuomkar/pytorch-cpp/blob/master/tutorials/basics/pytorch_basics/CMakeLists.txt
Accelerate Transformer inference on GPU with Optimum and Better Transformer