
DPU on PYNQ-Z2 Series - 2.2 Using DNNDK - Quantizing a Model with the decent Tool






freeze_graph \
--input_graph=./float_graph/resnet50v1.pb \
--input_checkpoint=./float_graph/resnet50v1.ckpt \
--input_binary=true \
--output_graph=./frozen_resnet50v1.pb \
--output_node_names=resnet_v1_50/predictions/Reshape_1




set -e
# Please set your ImageNet validation dataset path here,
# i.e. define IMAGE_DIR, IMAGE_LIST, EVAL_BATCHES and BATCH_SIZE used below


python3 eval.py \
  --input_frozen_graph ./frozen_resnet50v1.pb \
  --input_node input \
  --output_node resnet_v1_50/predictions/Reshape_1 \
  --eval_batches $EVAL_BATCHES \
  --batch_size $BATCH_SIZE \
  --eval_image_dir $IMAGE_DIR \
  --eval_image_list $IMAGE_LIST \
  --gpu 0
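evaluate_frozen.sh hands its settings to eval.py as command-line options, and eval.py reads them through a FLAGS object (FLAGS.batch_size, FLAGS.eval_batches, FLAGS.class_num, and so on). The flag definitions are not shown in this article, so the following argparse version is only a hypothetical sketch of what they need to accept; the real script may use tf.app.flags instead, and the class_num default of 1000 (ImageNet-1k) is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical flag set mirroring the options passed by evaluate_frozen.sh."""
    p = argparse.ArgumentParser(description="Evaluate a frozen classification graph")
    p.add_argument("--input_frozen_graph", required=True)
    p.add_argument("--input_node", default="input")
    p.add_argument("--output_node", default="resnet_v1_50/predictions/Reshape_1")
    p.add_argument("--eval_batches", type=int, required=True)
    p.add_argument("--batch_size", type=int, required=True)
    p.add_argument("--eval_image_dir", required=True)
    p.add_argument("--eval_image_list", required=True)
    # Assumption: 1000 ImageNet classes; used for the one-hot labels and the reshape.
    p.add_argument("--class_num", type=int, default=1000)
    p.add_argument("--gpu", default="0")
    return p
```

Calling FLAGS = build_parser().parse_args() at startup would then provide the FLAGS object that eval() and eval_input() read.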


import tensorflow as tf               # TensorFlow 1.x API (tf.Session, tf.placeholder)
from progressbar import ProgressBar   # from the progressbar2 package

# FLAGS (class_num, batch_size, eval_batches, eval_image_dir, eval_image_list)
# is the parsed command-line flag object defined elsewhere in eval.py.

def eval(input_graph_def, input_node, output_node):
    """Evaluate the top-1/top-5 accuracy of a classification graph_def on the evaluation dataset."""
    tf.import_graph_def(input_graph_def, name='')

    # Get input tensors: the graph's image input plus a placeholder for one-hot labels
    input_tensor = tf.get_default_graph().get_tensor_by_name(input_node + ':0')
    input_labels = tf.placeholder(tf.float32, shape=[None, FLAGS.class_num])

    # Calculate accuracy
    output = tf.get_default_graph().get_tensor_by_name(output_node + ':0')
    prediction = tf.reshape(output, [FLAGS.batch_size, FLAGS.class_num])
    correct_labels = tf.argmax(input_labels, 1)
    top1_prediction = tf.nn.in_top_k(prediction, correct_labels, k=1)
    top5_prediction = tf.nn.in_top_k(prediction, correct_labels, k=5)
    top1_accuracy = tf.reduce_mean(tf.cast(top1_prediction, 'float'))
    top5_accuracy = tf.reduce_mean(tf.cast(top5_prediction, 'float'))

    # Start evaluation: average the per-batch accuracies over all batches
    print("Start Evaluation for {} Batches...".format(FLAGS.eval_batches))
    with tf.Session() as sess:
        progress = ProgressBar()
        top1_sum_acc = 0
        top5_sum_acc = 0
        for batch_idx in progress(range(0, FLAGS.eval_batches)):
            input_data = eval_input(batch_idx, FLAGS.eval_image_dir, FLAGS.eval_image_list,
                                    FLAGS.class_num, FLAGS.batch_size)
            images = input_data['input']
            labels = input_data['labels']
            feed_dict = {input_tensor: images, input_labels: labels}
            top1_acc, top5_acc = sess.run([top1_accuracy, top5_accuracy], feed_dict)
            top1_sum_acc += top1_acc
            top5_sum_acc += top5_acc
    final_top1_acc = top1_sum_acc / FLAGS.eval_batches
    final_top5_acc = top5_sum_acc / FLAGS.eval_batches
    print("Accuracy: Top1: {}, Top5: {}".format(final_top1_acc, final_top5_acc))
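The accuracy computation above relies on tf.nn.in_top_k, which marks a sample as correct when its true class is among the k highest scores; classes tied at the top-k boundary all count as inside the top k. The NumPy sketch below mirrors that top-1/top-5 bookkeeping so results can be cross-checked outside TensorFlow. It is an illustration, not the code eval.py runs, and the function names are hypothetical.

```python
import numpy as np


def in_top_k(predictions, targets, k):
    """NumPy analogue of tf.nn.in_top_k.

    predictions: (batch, classes) scores; targets: (batch,) integer labels.
    A target is a hit when fewer than k classes score strictly higher than it,
    which also counts ties at the boundary as hits.
    """
    predictions = np.asarray(predictions, dtype=np.float64)
    targets = np.asarray(targets)
    target_scores = predictions[np.arange(len(targets)), targets]
    n_higher = (predictions > target_scores[:, None]).sum(axis=1)
    return n_higher < k


def topk_accuracy(predictions, one_hot_labels, k):
    """Mean hit rate, like tf.reduce_mean(tf.cast(tf.nn.in_top_k(...), 'float'))."""
    targets = np.argmax(one_hot_labels, axis=1)  # same role as tf.argmax(input_labels, 1)
    return float(np.mean(in_top_k(predictions, targets, k)))
```

For example, with scores [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]] and true classes 2 and 0, the top-1 accuracy is 0.5 and the top-2 accuracy is 1.0.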

Each batch is loaded by eval_input, which the loop above calls once per batch. It reads the image list, preprocesses each image and one-hot encodes the labels:

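import os
import cv2
from sklearn import preprocessing

def eval_input(batch_idx, eval_image_dir, eval_image_list, class_num, eval_batch_size):
    """Load one batch of preprocessed images and one-hot labels from the evaluation list."""
    images = []
    labels = []
    with open(eval_image_list) as f:
        lines = f.readlines()
    for index in range(0, eval_batch_size):
        # Each line of the list is "<image_name> <label_id>"
        curline = lines[batch_idx * eval_batch_size + index]
        image_name, label_id = curline.split()
        image = cv2.imread(os.path.join(eval_image_dir, image_name))  # BGR, uint8
        image = preprocess(image)
        images.append(image)
        labels.append(int(label_id))
    # One-hot encode the integer labels over all class_num classes
    lb = preprocessing.LabelBinarizer()
    lb.fit(range(0, class_num))
    labels = lb.transform(labels)
    return {"input": images, "labels": labels}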
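eval_input expects each line of eval_image_list to hold an image file name and an integer class id separated by whitespace, and a LabelBinarizer fitted on range(0, class_num) turns the ids into one-hot rows. For class_num > 2 that is equivalent to the NumPy sketch below; parse_image_list and one_hot are hypothetical helper names, not part of the original script, and the file names in the example are made up.

```python
import numpy as np


def parse_image_list(lines):
    """Split '<image_name> <label_id>' lines into names and integer labels."""
    names, labels = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline at the end of the file
        image_name, label_id = line.split()
        names.append(image_name)
        labels.append(int(label_id))
    return names, labels


def one_hot(labels, class_num):
    """Same rows as LabelBinarizer().fit(range(class_num)).transform(labels) when class_num > 2.

    (For class_num == 2, LabelBinarizer returns a single column instead.)
    """
    labels = np.asarray(labels)
    out = np.zeros((len(labels), class_num), dtype=np.int64)
    out[np.arange(len(labels)), labels] = 1
    return out
```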


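import cv2
import numpy as np

def preprocess(img):
    # cv2.imread returns uint8; convert to float32 so the mean subtraction cannot overflow
    img = np.array(img, dtype=np.float32)

    # Resize so that the shorter side becomes 256 pixels, keeping the aspect ratio
    height, width, _ = img.shape
    new_height = height * 256 // min(img.shape[:2])
    new_width = width * 256 // min(img.shape[:2])
    # Note: cv2.resize takes dsize as (width, height)
    img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC)

    # Center-crop a 224x224 window
    height, width, _ = img.shape
    startx = width // 2 - (224 // 2)
    starty = height // 2 - (224 // 2)
    img = img[starty:starty + 224, startx:startx + 224]
    assert img.shape[0] == 224 and img.shape[1] == 224, (img.shape, height, width)

    # Subtract the per-channel mean
    img[:, :, 0] -= 123.68
    img[:, :, 1] -= 116.779
    img[:, :, 2] -= 103.939
    return img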
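preprocess first scales the image so that its shorter side becomes 256 pixels while keeping the aspect ratio, then cuts out the central 224x224 window. The integer arithmetic can be checked without OpenCV; the hypothetical helper below reproduces only the size and crop-offset computation from preprocess, not the resize itself.

```python
def resize_and_crop_box(height, width, short_side=256, crop=224):
    """Return (new_h, new_w, starty, startx) exactly as computed in preprocess()."""
    short = min(height, width)
    new_h = height * short_side // short  # floor division, as in preprocess()
    new_w = width * short_side // short
    starty = new_h // 2 - crop // 2
    startx = new_w // 2 - crop // 2
    return new_h, new_w, starty, startx
```

For a 375x500 (height x width) image this gives a 256x341 resized image and a crop starting at row 16, column 58, so the kept window is rows 16:240 and columns 58:282.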


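  • cv2.imread already returns images in BGR order, so no RGB2BGR conversion is needed.
  • cv2.imread returns uint8 data, so the image must be converted to float32 before the mean is subtracted.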
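The second point matters because NumPy keeps uint8 arithmetic in uint8: subtracting an integer mean wraps around instead of going negative, and an in-place subtraction of a float mean from a uint8 array is rejected outright. A small NumPy demonstration with made-up pixel values:

```python
import numpy as np

pixel_u8 = np.array([100, 200], dtype=np.uint8)

# uint8 - uint8 stays uint8, so 100 - 124 wraps modulo 256 instead of giving -24.
wrapped = pixel_u8 - np.uint8(124)

# In-place subtraction of a float mean from a uint8 array violates NumPy's
# default 'same_kind' casting rule and raises (UFuncTypeError is a TypeError).
try:
    tmp = pixel_u8.copy()
    tmp -= 123.68
    inplace_ok = True
except TypeError:
    inplace_ok = False

# Converting to float32 first, as preprocess() does, gives the intended values.
centered = pixel_u8.astype(np.float32) - 123.68
```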

Running evaluate_frozen.sh then gives the result below, with an accuracy of Top1: 0.7355 and Top5: 0.9147, which shows that our preprocessing is correct.

root@3f231e40c7cd:/mnt/nvidia/host_x86/models/tensorflow/resnet50# sh evaluate_frozen.sh
WARNING:tensorflow:From eval.py:55: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
Start Evaluation for 1000 Batches...
2020-03-08 17:18:32.296229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:04:00.0
totalMemory: 10.76GiB freeMemory: 10.60GiB
2020-03-08 17:18:32.296285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2020-03-08 17:18:32.668404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-08 17:18:32.668467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2020-03-08 17:18:32.668478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2020-03-08 17:18:32.668634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10232 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:04:00.0, compute capability: 7.5)
100% |###################################################################################################################################################################################################|
Accuracy: Top1: 0.6144000029563904, Top5: 0.8405999964475632


decent_q quantize \
  --input_frozen_graph ./frozen_resnet50v1.pb \
  --input_nodes input \
  --input_shapes ?,224,224,3 \
  --output_nodes resnet_v1_50/predictions/Reshape_1 \
  --input_fn input_fn.calib_input \
  --method 1 \
  --gpu 0 \
  --calib_iter 27 \
  --output_dir ./quantize_results \
  --weight_bit 8 \
  --activation_bit 8
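decent_q pulls calibration data through --input_fn input_fn.calib_input, i.e. a function calib_input in a module input_fn.py that receives the calibration step index and returns a dict mapping input node names to a batch of preprocessed images; with --calib_iter 27 it is called once per calibration iteration, 27 times in total. The module is not shown in this article, so the following is a hypothetical sketch: the file paths, CALIB_BATCH_SIZE, and importing preprocess() from eval.py are assumptions. Whatever the real module looks like, it should apply the same preprocessing that was validated above.

```python
# input_fn.py -- hypothetical calibration module for:
#   decent_q quantize ... --input_fn input_fn.calib_input --calib_iter 27
import os

import numpy as np

CALIB_IMAGE_DIR = "./calib_images"     # assumption: directory of calibration images
CALIB_IMAGE_LIST = "./calib_list.txt"  # assumption: "<image_name> [label]" per line
CALIB_BATCH_SIZE = 10                  # assumption: needs 27 * 10 images for --calib_iter 27
INPUT_NODE = "input"                   # must match --input_nodes


def load_and_preprocess(path):
    """Read one image and apply the evaluation preprocessing."""
    import cv2                    # imported lazily so make_batch() works without OpenCV
    from eval import preprocess   # assumption: preprocess() is importable from eval.py
    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    return preprocess(img)


def make_batch(image_names, step, batch_size, loader):
    """Stack the images belonging to calibration step `step` into an NHWC array."""
    start = step * batch_size
    chunk = image_names[start:start + batch_size]
    if len(chunk) < batch_size:
        raise ValueError("not enough calibration images for step %d" % step)
    return np.stack([loader(os.path.join(CALIB_IMAGE_DIR, n)) for n in chunk])


def calib_input(iter):
    """Called by decent_q with the calibration step index; returns {node_name: batch}."""
    with open(CALIB_IMAGE_LIST) as f:
        names = [line.split()[0] for line in f if line.strip()]
    return {INPUT_NODE: make_batch(names, iter, CALIB_BATCH_SIZE, load_and_preprocess)}
```

Raising on a short final batch is a deliberate choice here: it surfaces a mismatch between --calib_iter and the size of the calibration set instead of silently feeding a smaller batch.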


  • Prepare the DCF file
    In the section on integrating the DPU IP in Vivado, we mentioned saving the hwh file; in DNNDK, run dlet on it to generate the DCF file.
dlet -f pynq_dpu.hwh
[DLet]Generate DPU DCF file dpu-11111530-111530-201911111530-1530-30.dcf successfully.
  • Compile
dnnc --parser=tensorflow \
     --frozen_pb=./quantize_results/deploy_model.pb \
     --output_dir=dnnc_output \
     --dcf=pynqz2.dcf \
     --mode=normal \
     --cpu_arch=arm32 \
     --net_name=resnet50v1


[DNNC][Warning] layer [resnet_v1_50_SpatialSqueeze] (type: Squeeze) is not supported in DPU, deploy it in CPU instead.
[DNNC][Warning] layer [resnet_v1_50_predictions_Softmax] (type: Softmax) is not supported in DPU, deploy it in CPU instead.

DNNC Kernel topology "resnet50v1_kernel_graph.jpg" for network "resnet50v1"
DNNC kernel list info for network "resnet50v1"
                               Kernel ID : Name
                                       0 : resnet50v1_0
                                       1 : resnet50v1_1

                             Kernel Name : resnet50v1_0
                             Kernel Type : DPUKernel
                               Code Size : 0.99MB
                              Param Size : 24.35MB
                           Workload MACs : 6964.51MOPS
                         IO Memory Space : 2.25MB
                              Mean Value : 0, 0, 0,
                              Node Count : 58
                            Tensor Count : 59
                    Input Node(s)(H*W*C)
            resnet_v1_50_conv1_Conv2D(0) : 224*224*3
                   Output Node(s)(H*W*C)
           resnet_v1_50_logits_Conv2D(0) : 1*1*1000

                             Kernel Name : resnet50v1_1
                             Kernel Type : CPUKernel
                    Input Node(s)(H*W*C)
             resnet_v1_50_SpatialSqueeze : 1*1*1000
                   Output Node(s)(H*W*C)
        resnet_v1_50_predictions_Softmax : 1*1*1000
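dnnc splits the network into a DPU kernel (resnet50v1_0, ending at resnet_v1_50_logits_Conv2D with a 1*1*1000 output) and a CPU kernel (resnet50v1_1) holding the unsupported Squeeze and Softmax layers, so the application has to carry out those two steps itself on the ARM side. Below is a NumPy sketch of that CPU part. It assumes the DPU output has been read back as int8 values together with a float scale factor (assumed here to be a power of two such as 2**-fix_pos); how the buffer and scale are fetched from the DNNDK runtime is outside this sketch.

```python
import numpy as np


def cpu_kernel(dpu_output_int8, scale):
    """Squeeze + Softmax on the host for the DPU kernel's 1x1x1000 output.

    dpu_output_int8: raw int8 logits from the DPU, shape (1, 1, 1000) or (1000,)
    scale: float factor converting int8 values to real logits (assumption, e.g. 2**-fix_pos)
    """
    # Squeeze: flatten 1x1x1000 to a 1000-vector and dequantize (new float32 array)
    logits = np.asarray(dpu_output_int8, dtype=np.float32).reshape(-1) * scale
    logits -= logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()  # Softmax: probabilities summing to 1
```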


