
TensorFlow Model Quantization (2): Full Integer Quantization, Float16 Quantization, and Quantization-Aware Training


1 Full integer quantization

During model conversion, the weight tensors and activation tensors are quantized from 32-bit floating point to 8-bit integers.
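As a rough illustration of what "quantizing to 8-bit integers" means, the sketch below applies the affine mapping used for int8 tensors, where a real value is approximated by scale * (q - zero_point). The helper names and the scale and zero-point values are made up for the example and are not taken from the model in this article.

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map float values to int8 using q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate float values: x ~ scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, -0.3, 0.0, 0.3, 1.0], dtype=np.float32)
scale, zero_point = 1.0 / 128, 0  # illustrative values only (assumption)

q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
max_err = float(np.abs(x - x_hat).max())

print(q)        # int8 codes; 1.0 saturates at 127
print(x_hat)    # values after the round trip
print(max_err)  # worst-case round-trip error for this input
```

The round trip is lossy: values are snapped to a grid of width `scale`, and anything outside the representable range is clipped.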

1.1 Train a Keras model and convert it to TFLite format

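import tensorflow as tf
from tensorflow import keras

# Data preprocessing: scale pixel values to [0, 1] (the images are assumed to be loaded already)
train_images = train_images / 255.0
test_images = test_images / 255.0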
# Build the model
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Compile and train
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(
  train_images, train_labels,
  epochs=5, validation_split=0.1,
)
Epoch 1/5
1688/1688 [==============================] - 8s 2ms/step - loss: 0.5397 - accuracy: 0.8512 - val_loss: 0.1348 - val_accuracy: 0.9643
Epoch 2/5
1688/1688 [==============================] - 4s 2ms/step - loss: 0.1416 - accuracy: 0.9593 - val_loss: 0.0937 - val_accuracy: 0.9738
Epoch 3/5
1688/1688 [==============================] - 4s 2ms/step - loss: 0.0920 - accuracy: 0.9720 - val_loss: 0.0759 - val_accuracy: 0.9797
Epoch 4/5
1688/1688 [==============================] - 4s 2ms/step - loss: 0.0780 - accuracy: 0.9774 - val_loss: 0.0735 - val_accuracy: 0.9805
Epoch 5/5
1688/1688 [==============================] - 4s 3ms/step - loss: 0.0620 - accuracy: 0.9820 - val_loss: 0.0651 - val_accuracy: 0.9828
<tensorflow.python.keras.callbacks.History at 0x7fced5573490>
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
tflite_name = "tflite_model"
open(tflite_name, "wb").write(tflite_model)
83640

1.2 Float fallback quantization

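To quantize variable data (such as the model input, output, and intermediate activations), the converter needs a RepresentativeDataset that captures the distribution of that data, such as its minimum and maximum values.
Roughly 100 to 500 samples taken from the training or validation set are enough.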
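To make the role of the minimum and maximum values concrete, here is a simplified sketch of how a quantization scale and zero point could be derived from calibration samples. The `calibrate` helper and the synthetic samples are assumptions for illustration; the calibration performed inside the TFLite converter may differ in its details.

```python
import numpy as np

def calibrate(samples, qmin=-128, qmax=127):
    """Derive (scale, zero_point) from the observed range of calibration samples."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(0.0, float(np.min(samples)))
    hi = max(0.0, float(np.max(samples)))
    if hi == lo:
        return 1.0, 0  # degenerate range: all samples are zero
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

# Synthetic stand-in for 100 normalized 28x28 images with values in [0, 1].
samples = np.linspace(0.0, 1.0, 100 * 28 * 28, dtype=np.float32).reshape(100, 28, 28)
scale, zero_point = calibrate(samples)
print(scale, zero_point)
```

With inputs in [0, 1] the whole int8 range is spent on that interval, which is why samples that reflect the real data distribution matter: a range that is too wide wastes resolution, and one that is too narrow clips real inputs.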

def representative_data_gen():
    # Yield 100 training images one at a time, each with a batch dimension of 1
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)
23840

Run inference with the converted TFLite models to check the results:

import numpy as np

def evaluate(interpreter_path):
    # Load the model and allocate tensors
    interpreter = tf.lite.Interpreter(model_path=interpreter_path)
    interpreter.allocate_tensors()

    # Get the input and output tensor details
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    index = input_details[0]['index']
    shape = input_details[0]['shape']
    acc_count = 0
    image_count = test_images.shape[0]
    for i in range(image_count):
        # Feed one float32 image, run inference, and take the most probable class
        interpreter.set_tensor(index, test_images[i].reshape(shape).astype("float32"))
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        label = np.argmax(output_data)
        if label == test_labels[i]:
            acc_count += 1
    print("test_images accuracy is {:.2%}".format(acc_count / image_count))
evaluate(tflite_name)
evaluate(FullInt_name)
test_images accuracy is 98.02%
test_images accuracy is 97.94%

The model size drops from 83,640 bytes to 23,840 bytes, a reduction of roughly 3.5x, while accuracy falls by only 0.08 percentage points.

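At this point the weights and activations of the model are quantized to 8 bits, but to preserve compatibility the input and output tensors are still float32.
If TensorFlow Lite has no quantized implementation of some operation, the converter may leave that operation in floating point, which is where the name float fallback quantization comes from.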
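One way to see whether any tensors were left in floating point is to tally the dtypes reported by the interpreter. The sketch below is a hypothetical helper, `count_tensor_dtypes`, that works on the list of dictionaries returned by `Interpreter.get_tensor_details()`; the mock input is made up so the example runs without a model file.

```python
from collections import Counter

def count_tensor_dtypes(tensor_details):
    """Count tensors per dtype, given the list returned by Interpreter.get_tensor_details()."""
    counts = Counter()
    for detail in tensor_details:
        dtype = detail["dtype"]
        # Real entries hold NumPy types (numpy.int8, numpy.float32, ...), which have __name__.
        counts[getattr(dtype, "__name__", str(dtype))] += 1
    return counts

# Made-up stand-in for interpreter.get_tensor_details(), for illustration only.
mock_details = [
    {"name": "input", "dtype": float},
    {"name": "conv_weights", "dtype": int},
    {"name": "conv_output", "dtype": int},
]
dtype_counts = count_tensor_dtypes(mock_details)
print(dtype_counts)
```

With a real model, the helper could be called as `count_tensor_dtypes(tf.lite.Interpreter(model_path=FullInt_name).get_tensor_details())`; for the float fallback model, float32 entries would be expected at least for the input and output tensors.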

1.3 Integer-only quantization

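This method quantizes all tensors to 8 bits; if the conversion cannot be completed with integer-only operations, an exception is raised.
It takes only a few extra lines of code on top of 1.2.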

def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

#--------Newly added code----------------------------------------------------
# Make the converter raise an exception if an operation cannot be quantized
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
#----------------------------------------------------------------------------

tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)

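You can test the effect of this method yourself by following the steps in 1.2. Note that the input tensor is now uint8, so the test images must be quantized with the input tensor's scale and zero point before they are passed to set_tensor, rather than cast to float32.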
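A minimal sketch of such an evaluation loop is given below, assuming the interpreter is created by the caller and that the `quantization` entry of the input details holds the (scale, zero_point) pair. The function name `evaluate_quantized` is hypothetical and is not part of the original article.

```python
import numpy as np

def evaluate_quantized(interpreter, images, labels):
    """Accuracy of a TFLite model whose input may be uint8/int8 instead of float32."""
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    scale, zero_point = inp["quantization"]

    correct = 0
    for image, label in zip(images, labels):
        x = np.asarray(image, dtype=np.float32).reshape(inp["shape"])
        if inp["dtype"] in (np.uint8, np.int8):
            # Quantize the float image: q = round(x / scale) + zero_point
            info = np.iinfo(inp["dtype"])
            x = np.clip(np.round(x / scale) + zero_point, info.min, info.max)
        interpreter.set_tensor(inp["index"], x.astype(inp["dtype"]))
        interpreter.invoke()
        # argmax does not depend on the output's (positive) scale or zero point,
        # so the quantized output can be compared directly.
        if np.argmax(interpreter.get_tensor(out["index"])) == label:
            correct += 1
    return correct / len(labels)
```

It could be used as `evaluate_quantized(tf.lite.Interpreter(model_path=FullInt_name), test_images, test_labels)`. Because the dtype is read from the model itself, the same loop also handles the float32-input models from 1.1 and 1.2.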

1.4 Float16 quantization

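Switching to float16 quantization is simple: it takes one extra line of code on top of 1.2. Float16 quantization converts only the weights and performs no calibration, so the representative dataset is not required here; it is kept only for consistency with 1.2.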

def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

#--------Added code--------------------------------------------------------
converter.target_spec.supported_types = [tf.float16]
#----------------------------------------------------------------------------

tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)

The results compare as follows:

83640
43488
test_images accuracy is 98.02%
test_images accuracy is 98.02%

The model shrinks to about half of its original size, with no drop in accuracy.
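The halving follows directly from the storage format: each weight takes 2 bytes instead of 4. The sketch below casts a randomly generated tensor with the same shape as the Conv2D kernel in this article (3x3 kernel, 1 input channel, 12 filters) to float16; the weight values are synthetic, not the trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic weights with the shape of the Conv2D kernel above: (3, 3, 1, 12)
weights = rng.normal(0.0, 0.1, size=(3, 3, 1, 12)).astype(np.float32)
weights_fp16 = weights.astype(np.float16)

# Storage in bytes: 108 values * 4 bytes vs. 108 values * 2 bytes
print(weights.nbytes, weights_fp16.nbytes)  # 432 216

# Rounding error introduced by the cast
max_err = float(np.abs(weights - weights_fp16.astype(np.float32)).max())
print(max_err)
```

The rounding error is small relative to the weights themselves, which is consistent with the unchanged accuracy reported above.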

1.5 8-bit weights with 16-bit activations (integer quantization with int16 activations)

def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

#--------Added code--------------------------------------------------------
converter.target_spec.supported_ops = [tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8]
#----------------------------------------------------------------------------

tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)
84684
25008
test_images accuracy is 98.02%
test_images accuracy is 98.02%

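Note: this method is still experimental. If an error reports that the attribute EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8 does not exist, upgrade your TensorFlow version. The environment used here is tensorflow == 2.4.1.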

2 Quantization-aware training

The code flow is not very different from the one above; for details, refer to the quantization-aware training guide.
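For orientation, a minimal sketch of that flow is shown below. It assumes the separate `tensorflow-model-optimization` package is installed and uses its `quantize_model` function; the wrapper name `train_with_qat` and the single fine-tuning epoch are choices made for this example, not something taken from the original article.

```python
def train_with_qat(model, train_images, train_labels):
    """Wrap a Keras model with fake-quantization ops, fine-tune it, and convert it to TFLite."""
    # Imports are local so that defining the function does not require the packages.
    import tensorflow as tf
    import tensorflow_model_optimization as tfmot  # assumption: installed separately via pip

    # Insert fake-quantization nodes so that training sees the effect of 8-bit rounding.
    qat_model = tfmot.quantization.keras.quantize_model(model)
    qat_model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
    qat_model.fit(train_images, train_labels, epochs=1, validation_split=0.1)

    # The quantization ranges are learned during training,
    # so no representative dataset is passed to the converter.
    converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()
```

The returned bytes can be written to a file and evaluated in the same way as the models in section 1. Compatibility between a given TensorFlow version and the model optimization package should be checked before relying on this sketch.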

Chapter navigation

Previous: TensorFlow Model Quantization (1): Quantization Methods and Dynamic Range Quantization

Next: to be continued
