Introduction
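To make the code more reusable, we define functions inside functions (somewhat like object-oriented programming), which makes it easy to build a network with many layers.
This time we build a CNN with Batch Normalization (BN). We can treat BN as a functional unit, or layer, placed before the activation function; wrapping it in a function avoids redefining the same variables for every layer.
The BN operation is roughly as follows: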
- For each component $x^{(k)}$ of the input, apply the following transform:
$$\hat{x}^{(k)}=\frac{x^{(k)}-E\left[x^{(k)}\right]}{\sqrt{\operatorname{Var}\left[x^{(k)}\right]}}$$
- Then add learnable parameters to strengthen the network's representational power:
$$y^{(k)}=\gamma^{(k)} \hat{x}^{(k)}+\beta^{(k)}$$
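The two formulas above can be sketched in plain Python. This is only an illustration, assuming the expectation and variance are taken over a mini-batch and that a small `epsilon` is added under the square root (as `variance_epsilon` does in the TensorFlow API); the helper name `batch_norm` and the sample data are made up for this example.

```python
import math

def batch_norm(batch, gamma, beta, epsilon=1e-3):
    """Normalize each feature over the batch, then scale by gamma and shift by beta."""
    n = len(batch)
    num_features = len(batch[0])
    # Per-feature mean and (biased) variance across the batch dimension.
    means = [sum(row[k] for row in batch) / n for k in range(num_features)]
    variances = [sum((row[k] - means[k]) ** 2 for row in batch) / n
                 for k in range(num_features)]
    # x_hat = (x - mean) / sqrt(var + epsilon); y = gamma * x_hat + beta
    return [[gamma[k] * (row[k] - means[k]) / math.sqrt(variances[k] + epsilon) + beta[k]
             for k in range(num_features)]
            for row in batch]

# Hypothetical batch: 3 samples, 2 features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0], epsilon=0.0)
for row in normalized:
    print(row)
```

With `gamma` set to 1 and `beta` set to 0, both features end up on the same scale even though the raw values differ by a factor of ten.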
Importing the dataset:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)
Implementation
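First, consider which kinds of layers the network can be abstracted into. There are two broad categories:
- fully connected layers
- layers with a convolution operation
So we define one function for each category and define the related operations inside it.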
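Second, we need to know the BN API, which mainly consists of two functions:
- tf.nn.moments(x, axes, name=None, keep_dims=False)
This function computes the mean and variance of the input x over the given axes and returns two tensors: mean and variance.
- tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon, name=None)
This function takes more arguments, six of which are required. mean and variance are the mean and variance; offset and scale correspond to $\beta$ and $\gamma$ in the formula above, respectively. variance_epsilon is a small constant that keeps the divisor from being zero.
Note that BN uses different statistics in training and at inference. During training, each forward pass normalizes with the current batch's mean and variance and also updates running estimates of the population mean and variance with a moving average; inference then uses those running estimates, which makes the output robust to the composition of any single batch.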
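The moving-average update can be illustrated without TensorFlow. The sketch below assumes a decay of 0.99 and a made-up stream of batch means that all equal 5.0; `update_population_stat` is a hypothetical helper, not a library function.

```python
def update_population_stat(pop_value, batch_value, decay=0.99):
    """One moving-average step: keep most of the old estimate, blend in the new batch."""
    return pop_value * decay + batch_value * (1 - decay)

pop_mean = 0.0  # the running estimate starts at zero
for batch_mean in [5.0] * 500:  # hypothetical: every batch has mean 5.0
    pop_mean = update_population_stat(pop_mean, batch_mean)

print(pop_mean)
```

Because the estimate starts at zero and each step moves it only 1% of the way toward the batch value, it needs a few hundred batches to get close to the true mean. This is one reason a BN network should not be evaluated after only a handful of training steps.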
The fully connected layer is defined as follows:
def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with num_units neurons, using prev_layer as its input.

    :param prev_layer: Tensor
        Input to this layer.
    :param num_units: int
        Number of neurons in this layer.
    :param is_training: bool or Tensor
        Whether the network is currently training. Tells the Batch Normalization
        layer whether to update the population statistics or to use them.
    :returns Tensor
        A new fully connected layer.
    """
    # No bias term here: beta in batch normalization plays that role.
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)

    # gamma (scale) is usually initialized to 1
    gamma = tf.Variable(tf.ones([num_units]))
    # beta (offset) is usually initialized to 0
    beta = tf.Variable(tf.zeros([num_units]))

    # BN uses different means and variances in training and at test time,
    # so the population statistics are kept in non-trainable variables.
    pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)
    pop_variance = tf.Variable(tf.ones([num_units]), trainable=False)

    epsilon = 1e-3

    def batch_norm_training():
        # Statistics of the current batch, one value per neuron.
        batch_mean, batch_variance = tf.nn.moments(layer, [0])

        # Update the population mean and variance with a moving average.
        decay = 0.99  # moving-average decay
        # train_mean is the op that writes the new moving average into pop_mean.
        train_mean = tf.assign(pop_mean, pop_mean*decay + batch_mean*(1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance*decay + batch_variance*(1 - decay))

        # Make sure the updates run whenever the normalized output is computed.
        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    # Run batch_norm_training when training, otherwise batch_norm_inference.
    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)
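The changes to conv_layer are almost the same as those to fully_connected, with one important difference: a convolutional layer has multiple feature maps, and each feature map shares its weights across the input. So Batch Normalization must be applied per feature map rather than per node of the convolutional layer.
To achieve this, we do the same as in fully_connected, with two exceptions:
- gamma, beta, pop_mean and pop_variance are sized by the number of feature maps (output channels) rather than the number of output nodes.
- We change the axes passed to tf.nn.moments so that the mean and variance are computed over the correct dimensions.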
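To make "per feature map" concrete, here is a small pure-Python sketch of what reducing over axes [0, 1, 2] of a [batch, height, width, channels] tensor means. The helper `channel_moments` and the tiny input are made up for illustration and stand in for the real `tf.nn.moments` call.

```python
def channel_moments(feature_maps):
    """Mean and variance per channel of a nested list shaped [batch, height, width, channels]."""
    num_channels = len(feature_maps[0][0][0])
    means, variances = [], []
    for c in range(num_channels):
        # Pool every value of channel c across all images, rows and columns.
        values = [pixel[c] for image in feature_maps for row in image for pixel in row]
        mean = sum(values) / len(values)
        means.append(mean)
        variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    return means, variances

# Hypothetical input: 2 images, each 1x2 pixels with 2 channels.
feature_maps = [
    [[[1.0, 10.0], [3.0, 10.0]]],
    [[[5.0, 10.0], [7.0, 10.0]]],
]
means, variances = channel_moments(feature_maps)
print(means, variances)
```

The result has one mean and one variance per channel, which is why gamma, beta and the population statistics have shape [out_channels].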
The convolutional layer is defined as follows:
def conv_layer(prev_layer, layer_depth, is_training):
    """
    Create a convolutional layer with the given layer as input.

    :param prev_layer: Tensor
        Input to this layer.
    :param layer_depth: int
        The layer's depth in the network, which determines the stride and the
        number of feature maps. This is not good practice for real CNNs, but it
        keeps this example short.
    :param is_training: bool or Tensor
        Whether the network is currently training. Tells the Batch Normalization
        layer whether to update the population statistics or to use them.
    :returns Tensor
        A new convolutional layer.
    """
    # Every third layer halves the spatial size; channels grow with depth.
    strides = 2 if layer_depth % 3 == 0 else 1
    in_channels = prev_layer.get_shape().as_list()[3]
    out_channels = layer_depth*4

    weights = tf.Variable(tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.05))
    layer = tf.nn.conv2d(prev_layer, weights, strides=[1, strides, strides, 1], padding='SAME')

    # One gamma, beta and pair of population statistics per feature map.
    gamma = tf.Variable(tf.ones([out_channels]))
    beta = tf.Variable(tf.zeros([out_channels]))
    pop_mean = tf.Variable(tf.zeros([out_channels]), trainable=False)
    pop_variance = tf.Variable(tf.ones([out_channels]), trainable=False)

    epsilon = 1e-3

    def batch_norm_training():
        # Reduce over batch, height and width so the mean and variance are
        # computed per feature map, not per node of the layer.
        batch_mean, batch_variance = tf.nn.moments(layer, [0, 1, 2], keep_dims=False)

        decay = 0.99  # moving-average decay
        train_mean = tf.assign(pop_mean, pop_mean*decay + batch_mean*(1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance*decay + batch_variance*(1 - decay))

        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)
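To see what the stride and channel rule in conv_layer produces, the sketch below traces the layer shapes in plain Python. It assumes a 28x28 input and layers stacked with layer_depth running from 1 to 19, and it relies on the fact that 'SAME' padding gives an output size of ceil(input / stride). The helper `conv_schedule` is made up for this illustration.

```python
import math

def conv_schedule(input_size=28, num_layers=19):
    """Trace (layer_depth, stride, spatial_size, out_channels) for the stacked conv layers."""
    size = input_size
    schedule = []
    for layer_depth in range(1, num_layers + 1):
        stride = 2 if layer_depth % 3 == 0 else 1
        out_channels = layer_depth * 4
        size = math.ceil(size / stride)  # 'SAME' padding: ceil(input / stride)
        schedule.append((layer_depth, stride, size, out_channels))
    return schedule

schedule = conv_schedule()
for entry in schedule:
    print(entry)

_, _, final_size, final_channels = schedule[-1]
flattened_units = final_size * final_size * final_channels
print("flattened units:", flattened_units)
```

Under these assumptions the spatial size shrinks 28, 14, 7, 4, 2, 1 as the stride-2 layers are hit, so the last layer's feature maps are 1x1 and the flattened vector has as many entries as the last layer has channels.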
The training and testing function is as follows:
def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Add placeholder to indicate whether or not we're training the model
    is_training = tf.placeholder(tf.bool)

    # Feed the inputs into a series of 19 convolutional layers
    # (range(1, 20) yields layer depths 1 through 19)
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1]*orig_shape[2]*orig_shape[3]])

    # Add one fully connected layer with 100 neurons
    layer = fully_connected(layer, 100, is_training)
    print(layer)

    # Create the output layer with 1 node for each class
    logits = tf.layers.dense(layer, 10, use_bias=False, activation=None)
    print(logits)

    # Define loss and training operations
    # Sigmoid cross-entropy treats each of the 10 outputs as an independent binary problem.
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)

    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # Train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})

            # Periodically check the validation or training loss and accuracy
            if batch_i%100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print(
                    'Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i%25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels,
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))

        # Score the first 100 test images individually, just to make sure batch normalization really worked
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                     labels: [mnist.test.labels[i]],
                                                     is_training: False})

        print("Accuracy on 100 samples:", correct/100)
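Scoring test images one at a time is a useful check because a batch of one has no meaningful batch statistics. The sketch below, with made-up numbers and hypothetical helper names, shows what would happen to a single activation if inference normalized with batch statistics instead of the population statistics.

```python
import math

def batch_moments(values):
    """Mean and (biased) variance of one feature across a batch."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

def normalize(values, mean, variance, epsilon=1e-3):
    return [(v - mean) / math.sqrt(variance + epsilon) for v in values]

single_sample = [0.8]  # a "batch" holding one activation of one feature

# Using batch statistics: the mean equals the sample itself, so the output collapses to 0.
batch_mean, batch_variance = batch_moments(single_sample)
with_batch_stats = normalize(single_sample, batch_mean, batch_variance)

# Using population statistics (hypothetical values learned during training).
pop_mean, pop_variance = 0.5, 0.04
with_population_stats = normalize(single_sample, pop_mean, pop_variance)

print(with_batch_stats, with_population_stats)
```

With batch statistics every single-sample input would be normalized to zero, so all inputs would look identical to the following layers. A reasonable accuracy on individually scored images therefore suggests the inference branch is using the population statistics as intended.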
Then run the program:
num_batches = 800      # number of training batches
batch_size = 64        # samples per batch
learning_rate = 0.002  # learning rate
tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)
The final output is as follows:
Batch: 0: Validation loss: 0.69160, Validation accuracy: 0.11260
Batch: 25: Training loss: 0.59628, Training accuracy: 0.12500
Batch: 50: Training loss: 0.47628, Training accuracy: 0.07812
Batch: 75: Training loss: 0.40237, Training accuracy: 0.10938
Batch: 100: Validation loss: 0.36219, Validation accuracy: 0.09860
Batch: 125: Training loss: 0.34905, Training accuracy: 0.01562
Batch: 150: Training loss: 0.33883, Training accuracy: 0.10938
Batch: 175: Training loss: 0.33278, Training accuracy: 0.10938
Batch: 200: Validation loss: 0.34536, Validation accuracy: 0.08680
Batch: 225: Training loss: 0.34430, Training accuracy: 0.07812
Batch: 250: Training loss: 0.41880, Training accuracy: 0.07812
Batch: 275: Training loss: 0.54379, Training accuracy: 0.06250
Batch: 300: Validation loss: 0.46161, Validation accuracy: 0.09320
Batch: 325: Training loss: 0.44883, Training accuracy: 0.10938
Batch: 350: Training loss: 0.33158, Training accuracy: 0.32812
Batch: 375: Training loss: 0.34692, Training accuracy: 0.43750
Batch: 400: Validation loss: 0.24509, Validation accuracy: 0.55980
Batch: 425: Training loss: 0.27850, Training accuracy: 0.53125
Batch: 450: Training loss: 0.05963, Training accuracy: 0.87500
Batch: 475: Training loss: 0.12804, Training accuracy: 0.81250
Batch: 500: Validation loss: 0.12635, Validation accuracy: 0.82320
Batch: 525: Training loss: 0.00641, Training accuracy: 0.98438
Batch: 550: Training loss: 0.09193, Training accuracy: 0.87500
Batch: 575: Training loss: 0.02719, Training accuracy: 0.95312
Batch: 600: Validation loss: 0.02826, Validation accuracy: 0.96120
Batch: 625: Training loss: 0.04134, Training accuracy: 0.93750
Batch: 650: Training loss: 0.12194, Training accuracy: 0.87500
Batch: 675: Training loss: 0.01194, Training accuracy: 0.98438
Batch: 700: Validation loss: 0.02989, Validation accuracy: 0.95560
Batch: 725: Training loss: 0.03655, Training accuracy: 0.95312
Batch: 750: Training loss: 0.01662, Training accuracy: 0.96875
Batch: 775: Training loss: 0.02223, Training accuracy: 0.96875
Final validation accuracy: 0.95920
Final test accuracy: 0.95700
Accuracy on 100 samples: 0.97
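The network converges quickly: after a slow start, validation accuracy climbs from roughly 10% at batch 300 to about 96% by batch 600.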