This post uses the resnet50 model shipped with DNNDK as an example to show how to quantize a model with the decent tool.
freeze_graph \
--input_graph=./float_graph/resnet50v1.pb \
--input_checkpoint=./float_graph/resnet50v1.ckpt \
--input_binary=true \
--output_graph=./frozen_resnet50v1.pb \
--output_node_names=resnet_v1_50/predictions/Reshape_1
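The pb and ckpt files here are the ones shipped with DNNDK. The input_node and output_node names are fixed when the model is defined, so for a custom model they must be changed to match that definition.
The contents of evaluate_frozen_graph.sh are as follows: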
#!/bin/sh
set -e

# Please set your imagenet validation dataset path here,
IMAGE_DIR=/media/DATASET/imagenet2012/val/
IMAGE_LIST=/media/DATASET/imagenet2012/val.txt

EVAL_BATCHES=1000
BATCH_SIZE=50

python3 eval.py \
  --input_frozen_graph ./frozen_resnet50v1.pb \
  --input_node input \
  --output_node resnet_v1_50/predictions/Reshape_1 \
  --eval_batches $EVAL_BATCHES \
  --batch_size $BATCH_SIZE \
  --eval_image_dir $IMAGE_DIR \
  --eval_image_list $IMAGE_LIST \
  --gpu 0
The script calls eval.py, whose main part is shown below:
def eval(input_graph_def, input_node, output_node):
    """Evaluate classification network graph_def's accuracy, need evaluation dataset"""
    tf.import_graph_def(input_graph_def, name='')

    # Get input tensors
    input_tensor = tf.get_default_graph().get_tensor_by_name(input_node + ':0')
    input_labels = tf.placeholder(tf.float32, shape=[None, FLAGS.class_num])

    # Calculate accuracy
    output = tf.get_default_graph().get_tensor_by_name(output_node + ':0')
    prediction = tf.reshape(output, [FLAGS.batch_size, FLAGS.class_num])
    correct_labels = tf.argmax(input_labels, 1)
    top1_prediction = tf.nn.in_top_k(prediction, correct_labels, k=1)
    top5_prediction = tf.nn.in_top_k(prediction, correct_labels, k=5)
    top1_accuracy = tf.reduce_mean(tf.cast(top1_prediction, 'float'))
    top5_accuracy = tf.reduce_mean(tf.cast(top5_prediction, 'float'))

    # Start evaluation
    print("Start Evaluation for {} Batches...".format(FLAGS.eval_batches))
    with tf.Session() as sess:
        progress = ProgressBar()
        top1_sum_acc = 0
        top5_sum_acc = 0
        for iter in progress(range(0, FLAGS.eval_batches)):
            input_data = eval_input(iter, FLAGS.eval_image_dir, FLAGS.eval_image_list, FLAGS.class_num, FLAGS.batch_size)
            images = input_data['input']
            # img = input_data['input']
            # images = np.array(img)
            labels = input_data['labels']
            feed_dict = {input_tensor: images, input_labels: labels}
            top1_acc, top5_acc = sess.run([top1_accuracy, top5_accuracy], feed_dict)
            top1_sum_acc += top1_acc
            top5_sum_acc += top5_acc
    final_top1_acc = top1_sum_acc / FLAGS.eval_batches
    final_top5_acc = top5_sum_acc / FLAGS.eval_batches
    print("Accuracy: Top1: {}, Top5: {}".format(final_top1_acc, final_top5_acc))
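Most of this is fixed and needs no changes. Only the line
input_data = eval_input(iter, FLAGS.eval_image_dir, FLAGS.eval_image_list, FLAGS.class_num, FLAGS.batch_size)
has to be adapted. It reads one batch of images together with their labels, preprocesses the images into the tensor fed to the network, and returns the matching labels. eval_input is defined as follows: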
def eval_input(iter, eval_image_dir, eval_image_list, class_num, eval_batch_size):
    images = []
    labels = []
    # Each line of the list file is "<image_name> <label_id>"
    with open(eval_image_list) as f:
        line = f.readlines()
    for index in range(0, eval_batch_size):
        curline = line[iter * eval_batch_size + index]
        [image_name, label_id] = curline.split(' ')
        image = cv2.imread(eval_image_dir + image_name)
        image = preprocess(image)
        images.append(image)
        labels.append(int(label_id))
    # Turn the integer labels into one-hot rows
    lb = preprocessing.LabelBinarizer()
    lb.fit(range(0, class_num))
    labels = lb.transform(labels)
    return {"input": images, "labels": labels}
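To make the expected list format concrete, here is a minimal sketch of the batch slicing, line parsing and one-hot steps in plain Python. It assumes each line of the list file has the form `<image_name> <label_id>`, as the split(' ') call above implies. The helper names (parse_list_line, one_hot, batch_slice) and the sample file names are made up for illustration, and one_hot only mimics the rows LabelBinarizer produces when there are more than two classes.

```python
def parse_list_line(line):
    """Split one '<image_name> <label_id>' line into (name, int label)."""
    image_name, label_id = line.strip().split(' ')
    return image_name, int(label_id)


def one_hot(label, class_num):
    """Return a list of length class_num with a single 1 at index label."""
    row = [0] * class_num
    row[label] = 1
    return row


def batch_slice(lines, batch_index, batch_size):
    """Lines that belong to batch number batch_index (0-based)."""
    start = batch_index * batch_size
    return lines[start:start + batch_size]


# Hypothetical list contents, not taken from the real val.txt.
sample_lines = ["a.JPEG 3\n", "b.JPEG 0\n", "c.JPEG 1\n", "d.JPEG 2\n"]
second_batch = [parse_list_line(l) for l in batch_slice(sample_lines, 1, 2)]
labels = [one_hot(label, 4) for _, label in second_batch]
print(second_batch)  # [('c.JPEG', 1), ('d.JPEG', 2)]
print(labels)        # [[0, 1, 0, 0], [0, 0, 1, 0]]
```

With EVAL_BATCHES=1000 and BATCH_SIZE=50 the loop indexes 50000 lines, so the list file needs at least that many entries.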
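The most important piece here is preprocess(image). When validating and quantizing a model, the image preprocessing used here must be identical to the preprocessing used during training. That point is important enough to say three times.
Judging from the scripts shipped with DNNDK, the resnet50v1 preprocessing subtracts 103.939, 116.779 and 123.68 from the B, G and R channels respectively, and swaps the channel order from BGR (which is what cv2.imread returns) to RGB. The preprocess function called from eval_input should therefore do the same:
def preprocess(img):
    img = np.array(img, dtype=np.float32)
    # Resize so that the shorter side becomes 256
    height, width, _ = img.shape
    new_height = height * 256 // min(img.shape[:2])
    new_width = width * 256 // min(img.shape[:2])
    img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC)
    # Center-crop to 224x224
    height, width, _ = img.shape
    startx = width//2 - (224//2)
    starty = height//2 - (224//2)
    img = img[starty:starty+224, startx:startx+224]
    assert img.shape[0] == 224 and img.shape[1] == 224, (img.shape, height, width)
    # cv2.imread returns BGR; reverse the channel axis so that index 0 is R
    img = img[:, :, ::-1].copy()
    # Subtract the per-channel means (R, G, B)
    img[:,:,0] -= 123.68
    img[:,:,1] -= 116.779
    img[:,:,2] -= 103.939
    return img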
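The resize and crop arithmetic in preprocess is easy to check by hand. The sketch below repeats only the integer calculations, without OpenCV, so the geometry can be verified for any input size. The function name resize_and_crop_geometry and the 375x500 example size are illustrative assumptions, not part of the DNNDK scripts.

```python
def resize_and_crop_geometry(height, width, short_side=256, crop=224):
    """Return (new_height, new_width, starty, startx) as computed in preprocess.

    The shorter image side is scaled to short_side, then a crop x crop
    window is taken from the centre of the resized image.
    """
    shorter = min(height, width)
    new_height = height * short_side // shorter
    new_width = width * short_side // shorter
    startx = new_width // 2 - crop // 2
    starty = new_height // 2 - crop // 2
    return new_height, new_width, starty, startx


# A landscape image of 375 rows by 500 columns (made-up size).
print(resize_and_crop_geometry(375, 500))  # (256, 341, 16, 58)
```

For that image the crop covers rows 16 to 239 and columns 58 to 281 of the 256x341 resized image.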
Two points need attention here.
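Then run evaluate_frozen.sh. The accuracy is Top1: 0.7355, Top5: 0.9147, which indicates that the preprocessing is correct. (The console log reproduced below reports Top1 0.6144 and Top5 0.8406, so it appears to have been captured from a different run.)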
root@3f231e40c7cd:/mnt/nvidia/host_x86/models/tensorflow/resnet50# sh evaluate_frozen.sh
WARNING:tensorflow:From eval.py:55: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
Start Evaluation for 1000 Batches...
2020-03-08 17:18:32.296229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:04:00.0
totalMemory: 10.76GiB freeMemory: 10.60GiB
2020-03-08 17:18:32.296285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2020-03-08 17:18:32.668404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-08 17:18:32.668467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2020-03-08 17:18:32.668478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2020-03-08 17:18:32.668634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10232 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:04:00.0, compute capability: 7.5)
100% |################################################################################|
Accuracy: Top1: 0.6144000029563904, Top5: 0.8405999964475632
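The Top1 and Top5 figures printed by eval.py are per-batch hit rates averaged over all batches. As a rough illustration of what tf.nn.in_top_k followed by tf.reduce_mean computes, here is a plain-Python sketch. The helper names and the toy scores are made up, and tie handling may differ from TensorFlow's.

```python
def in_top_k(scores, label, k):
    """True if label is among the k highest-scoring class indices."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return label in ranked[:k]


def batch_accuracy(batch_scores, batch_labels, k):
    """Fraction of samples in the batch whose label is in the top k."""
    hits = [in_top_k(s, y, k) for s, y in zip(batch_scores, batch_labels)]
    return sum(hits) / len(hits)


# Toy batch: 4 samples, 3 classes (illustrative values only).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
labels = [1, 1, 0, 0]
top1 = batch_accuracy(scores, labels, 1)
top2 = batch_accuracy(scores, labels, 2)
print(top1, top2)  # 0.5 0.75
```

Because every batch has the same size, averaging the per-batch accuracies gives the same number as counting hits over the whole dataset.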
decent_q quantize \
--input_frozen_graph ./frozen_resnet50v1.pb \
--input_nodes input \
--input_shapes ?,224,224,3 \
--output_nodes resnet_v1_50/predictions/Reshape_1 \
--input_fn input_fn.calib_input \
--method 1 \
--gpu 0 \
--calib_iter 27 \
--output_dir ./quantize_results \
--weight_bit 8 \
--activation_bit 8
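The --input_fn input_fn.calib_input option points decent_q at a Python function that supplies calibration batches: it is called with the iteration index and returns a dict keyed by the input node name. The sketch below shows only the batching logic in plain Python. make_calib_input, the load_image parameter and the sample list are hypothetical. In a real input_fn.py, load_image would be cv2.imread followed by the same preprocess used for evaluation, and calib_input would be a module-level function.

```python
def make_calib_input(list_lines, load_image, batch_size):
    """Build a calib_input(iter) function over a list of image names.

    load_image is a stand-in for "read the file and run preprocess".
    """
    def calib_input(iter):
        start = iter * batch_size
        chunk = list_lines[start:start + batch_size]
        if len(chunk) < batch_size:
            raise ValueError("calibration list too short for iteration %d" % iter)
        names = [line.split()[0] for line in chunk]
        # The key must match the name passed to --input_nodes.
        return {"input": [load_image(name) for name in names]}
    return calib_input


# Fake list and fake loader, for illustration only.
lines = ["img%d.JPEG %d" % (i, i % 10) for i in range(8)]
calib_input = make_calib_input(lines, load_image=lambda name: name.upper(), batch_size=4)
batch = calib_input(1)
print(batch)  # {'input': ['IMG4.JPEG', 'IMG5.JPEG', 'IMG6.JPEG', 'IMG7.JPEG']}
```

With --calib_iter 27, the calibration list has to contain at least 27 times the batch size in images.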
dlet -f pynq_dpu.hwh
[DLet]Generate DPU DCF file dpu-11111530-111530-201911111530-1530-30.dcf successfully.
dnnc --parser=tensorflow \
--frozen_pb=./quantize_results/deploy_model.pb \
--output_dir=dnnc_output \
--dcf=pynqz2.dcf \
--mode=normal \
--cpu_arch=arm32 \
--net_name=resnet50v1
After a while, the following output appears:
[DNNC][Warning] layer [resnet_v1_50_SpatialSqueeze] (type: Squeeze) is not supported in DPU, deploy it in CPU instead.
[DNNC][Warning] layer [resnet_v1_50_predictions_Softmax] (type: Softmax) is not supported in DPU, deploy it in CPU instead.

DNNC Kernel topology "resnet50v1_kernel_graph.jpg" for network "resnet50v1"
DNNC kernel list info for network "resnet50v1"
Kernel ID : Name
0 : resnet50v1_0
1 : resnet50v1_1

Kernel Name : resnet50v1_0
--------------------------------------------------------------------------------
Kernel Type : DPUKernel
Code Size : 0.99MB
Param Size : 24.35MB
Workload MACs : 6964.51MOPS
IO Memory Space : 2.25MB
Mean Value : 0, 0, 0,
Node Count : 58
Tensor Count : 59
Input Node(s)(H*W*C)
resnet_v1_50_conv1_Conv2D(0) : 224*224*3
Output Node(s)(H*W*C)
resnet_v1_50_logits_Conv2D(0) : 1*1*1000

Kernel Name : resnet50v1_1
--------------------------------------------------------------------------------
Kernel Type : CPUKernel
Input Node(s)(H*W*C)
resnet_v1_50_SpatialSqueeze : 1*1*1000
Output Node(s)(H*W*C)
resnet_v1_50_predictions_Softmax : 1*1*1000
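It is worth explaining why two kernels are listed but only one elf file is generated. In the ResNet50v1 network, everything from the input node resnet_v1_50_conv1_Conv2D to the node resnet_v1_50_logits_Conv2D runs on the DPU. The squeeze and softmax operations that follow are not supported by the DPU, so we would have to fetch the data from resnet_v1_50_logits_Conv2D and implement squeeze and softmax ourselves on the CPU. Here, however, we only do classification and do not need the softmax values themselves: let the DPU compute up to resnet_v1_50_logits_Conv2D and sort that output directly to obtain the classification result.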
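Sorting the logits is enough because softmax is strictly increasing in each input, so it never changes the ranking of the classes. The following sketch checks this on made-up values. The logits list and the helper names are illustrative only, and fetching the real output tensor from the DPU is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(values, k):
    """Indices of the k largest values, best first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


# Made-up raw outputs for 5 classes (a real run would have 1000).
logits = [-3, 12, 5, 7, 0]
from_logits = top_k(logits, 3)
from_probs = top_k(softmax(logits), 3)
print(from_logits)  # [1, 3, 2]
print(from_probs)   # [1, 3, 2]
```

The same holds when the DPU output is a fixed-point value that would normally be multiplied by a positive scale factor, since a positive scale also preserves the ordering.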