Batch Normalization in Practice
06 Apr 2018 | deep-learning | Batch Normalization – in practice
Batch normalization is most useful in deep neural networks. To demonstrate this, we'll build a convolutional neural network with 20 convolutional layers followed by a single fully connected layer, trained on the MNIST dataset.
Note that this is not a good network for classifying MNIST; you could design a much simpler network that performs far better. We made some deliberate choices to show off the benefits of batch normalization:
- The network is complicated enough that batch normalization provides a clear benefit.
- The network is simple enough to train quickly, making it easy to experiment with batch normalization.
- The architecture is simple and easy to implement.
We'll use two versions of batch normalization: one that wraps up most of the work for us, and one that exposes only the low-level primitives. You can read about the differences between them in the previous article.
- Batch Normalization with tf.layers.batch_normalization
- Batch Normalization with tf.nn.batch_normalization
First, we need to download the dataset.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
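Note the reshape=False argument: it keeps each image as a 28x28x1 array rather than flattening it to a 784-vector, which matches the [None, 28, 28, 1] input placeholder the convolutional layers below expect. A quick way to confirm the shapes (assuming the default 55,000/5,000/10,000 split):

print(mnist.train.images.shape)       # (55000, 28, 28, 1)
print(mnist.validation.images.shape)  # (5000, 28, 28, 1)
print(mnist.test.images.shape)        # (10000, 28, 28, 1)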
Batch Normalization with tf.layers.batch_normalization
The tf.layers version is the one you'll use most often: tf.layers.batch_normalization handles most of the details for you. We'll use the following function to build fully connected layers; it does not apply batch normalization yet.
"""
DO NOT MODIFY THIS CELL
"""
def fully_connected(prev_layer, num_units):
"""
Create a fully connectd layer with the given layer as input and the given number of neurons.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param num_units: int
The size of the layer. That is, the number of units, nodes, or neurons.
:returns Tensor
A new fully connected layer
"""
layer = tf.layers.dense(prev_layer, num_units, activation=tf.nn.relu)
return layer
We'll use the following function to build convolutional layers; it does not apply batch normalization yet.
"""
DO NOT MODIFY THIS CELL
"""
def conv_layer(prev_layer, layer_depth):
"""
Create a convolutional layer with the given layer as input.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param layer_depth: int
We'll set the strides and number of feature maps based on the layer's depth in the network.
This is *not* a good way to make a CNN, but it helps us create this example with very little code.
:returns Tensor
A new convolutional layer
"""
strides = 2 if layer_depth % 3 == 0 else 1
conv_layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides, 'same', activation=tf.nn.relu)
return conv_layer
The following function builds the network and trains it, still without batch normalization.
"""
DO NOT MODIFY THIS CELL
"""
def train(num_batches, batch_size, learning_rate):
# Build placeholders for the input samples and labels
inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
labels = tf.placeholder(tf.float32, [None, 10])
# Feed the inputs into a series of 20 convolutional layers
layer = inputs
for layer_i in range(1, 20):
layer = conv_layer(layer, layer_i)
# Flatten the output from the convolutional layers
orig_shape = layer.get_shape().as_list()
layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])
# Add one fully connected layer
layer = fully_connected(layer, 100)
# Create the output layer with 1 node for each
logits = tf.layers.dense(layer, 10)
# Define loss and training operations
model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
# Create operations to test accuracy
correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Train and test the network
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for batch_i in range(num_batches):
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
# train this batch
sess.run(train_opt, {inputs: batch_xs, labels: batch_ys})
# Periodically check the validation or training loss and accuracy
if batch_i % 100 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
labels: mnist.validation.labels})
print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
elif batch_i % 25 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys})
print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))
# At the end, score the final accuracy for both the validation and test sets
acc = sess.run(accuracy, {inputs: mnist.validation.images,
labels: mnist.validation.labels})
print('Final validation accuracy: {:>3.5f}'.format(acc))
acc = sess.run(accuracy, {inputs: mnist.test.images,
labels: mnist.test.labels})
print('Final test accuracy: {:>3.5f}'.format(acc))
# Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
correct = 0
for i in range(100):
correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
labels: [mnist.test.labels[i]]})
print("Accuracy on 100 samples:", correct/100)
num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)
Batch: 0: Validation loss: 0.69062, Validation accuracy: 0.09860
Batch: 25: Training loss: 0.35044, Training accuracy: 0.07812
Batch: 50: Training loss: 0.32532, Training accuracy: 0.10938
Batch: 75: Training loss: 0.32628, Training accuracy: 0.06250
Batch: 100: Validation loss: 0.32518, Validation accuracy: 0.10700
Batch: 125: Training loss: 0.32709, Training accuracy: 0.03125
Batch: 150: Training loss: 0.32582, Training accuracy: 0.09375
Batch: 175: Training loss: 0.32305, Training accuracy: 0.21875
Batch: 200: Validation loss: 0.32584, Validation accuracy: 0.09860
Batch: 225: Training loss: 0.32501, Training accuracy: 0.15625
Batch: 250: Training loss: 0.32705, Training accuracy: 0.07812
Batch: 275: Training loss: 0.32302, Training accuracy: 0.12500
Batch: 300: Validation loss: 0.32591, Validation accuracy: 0.08680
Batch: 325: Training loss: 0.32264, Training accuracy: 0.15625
Batch: 350: Training loss: 0.32369, Training accuracy: 0.12500
Batch: 375: Training loss: 0.32658, Training accuracy: 0.09375
Batch: 400: Validation loss: 0.32506, Validation accuracy: 0.11260
Batch: 425: Training loss: 0.32533, Training accuracy: 0.09375
Batch: 450: Training loss: 0.32705, Training accuracy: 0.10938
Batch: 475: Training loss: 0.32655, Training accuracy: 0.04688
Batch: 500: Validation loss: 0.32510, Validation accuracy: 0.10020
Batch: 525: Training loss: 0.33035, Training accuracy: 0.01562
Batch: 550: Training loss: 0.32439, Training accuracy: 0.09375
Batch: 575: Training loss: 0.32113, Training accuracy: 0.20312
Batch: 600: Validation loss: 0.32499, Validation accuracy: 0.11260
Batch: 625: Training loss: 0.32403, Training accuracy: 0.14062
Batch: 650: Training loss: 0.33020, Training accuracy: 0.03125
Batch: 675: Training loss: 0.31972, Training accuracy: 0.12500
Batch: 700: Validation loss: 0.29751, Validation accuracy: 0.20460
Batch: 725: Training loss: 0.34273, Training accuracy: 0.06250
Batch: 750: Training loss: 0.31551, Training accuracy: 0.28125
Batch: 775: Training loss: 0.28567, Training accuracy: 0.25000
Final validation accuracy: 0.30420
Final test accuracy: 0.29490
Accuracy on 100 samples: 0.28
Because the network has so many layers, it takes many batches before it learns anything. By the time training ends, the accuracy should still be quite low. Next, let's add batch normalization and see what happens.
Adding batch normalization
The following code is almost identical to the code above, except that we add batch normalization with tf.layers.batch_normalization.
def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Tells the batch normalization layer whether to use batch or population statistics.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)
    layer = tf.layers.batch_normalization(layer, training=is_training)  # batch normalization
    layer = tf.nn.relu(layer)
    return layer
def conv_layer(prev_layer, layer_depth, is_training):
    """
    Create a convolutional layer with the given layer as input.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_depth: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :param is_training: bool or Tensor
        Tells the batch normalization layer whether to use batch or population statistics.
    :returns Tensor
        A new convolutional layer
    """
    strides = 2 if layer_depth % 3 == 0 else 1
    conv_layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides, 'same', use_bias=False, activation=None)
    conv_layer = tf.layers.batch_normalization(conv_layer, training=is_training)  # batch normalization
    conv_layer = tf.nn.relu(conv_layer)
    return conv_layer
def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Placeholder that tells the batch normalization layers whether we are training
    is_training = tf.placeholder(tf.bool)

    # Feed the inputs into a series of 20 convolutional layers
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    # Add one fully connected layer
    layer = fully_connected(layer, 100, is_training)

    # Create the output layer with 1 node for each class
    logits = tf.layers.dense(layer, 10)

    # Define loss and training operations
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))

    # Tell TensorFlow to update the population statistics while training
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)

    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})

            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels,
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))

        # Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                     labels: [mnist.test.labels[i]],
                                                     is_training: False})
        print("Accuracy on 100 samples:", correct/100)
num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)
Batch: 0: Validation loss: 0.69050, Validation accuracy: 0.09900
Batch: 25: Training loss: 0.56782, Training accuracy: 0.04688
Batch: 50: Training loss: 0.45753, Training accuracy: 0.09375
Batch: 75: Training loss: 0.38981, Training accuracy: 0.10938
Batch: 100: Validation loss: 0.35713, Validation accuracy: 0.09900
Batch: 125: Training loss: 0.34297, Training accuracy: 0.15625
Batch: 150: Training loss: 0.33817, Training accuracy: 0.01562
Batch: 175: Training loss: 0.33227, Training accuracy: 0.06250
Batch: 200: Validation loss: 0.33167, Validation accuracy: 0.26160
Batch: 225: Training loss: 0.38631, Training accuracy: 0.18750
Batch: 250: Training loss: 0.33774, Training accuracy: 0.34375
Batch: 275: Training loss: 0.20149, Training accuracy: 0.59375
Batch: 300: Validation loss: 0.17220, Validation accuracy: 0.63080
Batch: 325: Training loss: 0.07430, Training accuracy: 0.89062
Batch: 350: Training loss: 0.09011, Training accuracy: 0.87500
Batch: 375: Training loss: 0.03864, Training accuracy: 0.89062
Batch: 400: Validation loss: 0.06403, Validation accuracy: 0.90200
Batch: 425: Training loss: 0.06060, Training accuracy: 0.92188
Batch: 450: Training loss: 0.03130, Training accuracy: 0.93750
Batch: 475: Training loss: 0.02583, Training accuracy: 0.96875
Batch: 500: Validation loss: 0.02855, Validation accuracy: 0.96100
Batch: 525: Training loss: 0.16131, Training accuracy: 0.76562
Batch: 550: Training loss: 0.00370, Training accuracy: 1.00000
Batch: 575: Training loss: 0.04717, Training accuracy: 0.92188
Batch: 600: Validation loss: 0.05843, Validation accuracy: 0.91880
Batch: 625: Training loss: 0.02037, Training accuracy: 0.95312
Batch: 650: Training loss: 0.04496, Training accuracy: 0.92188
Batch: 675: Training loss: 0.09602, Training accuracy: 0.85938
Batch: 700: Validation loss: 0.05082, Validation accuracy: 0.93260
Batch: 725: Training loss: 0.03261, Training accuracy: 0.95312
Batch: 750: Training loss: 0.00247, Training accuracy: 1.00000
Batch: 775: Training loss: 0.04725, Training accuracy: 0.93750
Final validation accuracy: 0.87940
Final test accuracy: 0.88330
Accuracy on 100 samples: 0.85
With batch normalization, accuracy improves dramatically. Note the is_training placeholder, which tells the network whether it is in the training phase or the inference phase.
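To recap the pattern that makes this work, here is a minimal, self-contained sketch. The names here (x, y, and the small dense network) are hypothetical stand-ins, not part of the example above; the three essential pieces are the is_training flag, the training argument, and the control dependency on tf.GraphKeys.UPDATE_OPS:

import tensorflow as tf

# Hypothetical flattened inputs and one-hot labels, just to make the sketch runnable.
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
is_training = tf.placeholder(tf.bool)  # 1) a flag the network can branch on

hidden = tf.layers.dense(x, 100, use_bias=False, activation=None)
hidden = tf.layers.batch_normalization(hidden, training=is_training)  # 2) pass the flag in
hidden = tf.nn.relu(hidden)
logits = tf.layers.dense(hidden, 10)
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=y))

# 3) Run the moving-average update ops with every training step. Without this
#    control dependency the population statistics are never updated, and
#    inference (is_training: False) will produce poor results.
with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# During training, feed {is_training: True}; during evaluation, feed {is_training: False}.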
Batch Normalization with tf.nn.batch_normalization
Most of the time you can simply use the wrapped-up version, but sometimes you'll want to implement something TensorFlow doesn't provide out of the box, such as batch normalization inside an LSTM. In those cases you have to implement the details of batch normalization yourself, and the function to use is tf.nn.batch_normalization.
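Before diving into the code, it helps to see the arithmetic we're about to reproduce. tf.nn.batch_normalization only computes y = gamma * (x - mean) / sqrt(variance + epsilon) + beta; everything else (computing batch statistics during training, and maintaining population statistics for inference) is up to us. Here is a minimal NumPy sketch of those two pieces, using the same decay of 0.99 as the code below:

import numpy as np

def batch_norm_training_step(x, gamma, beta, pop_mean, pop_var, epsilon=1e-3, decay=0.99):
    """One training-time step for a [batch, features] array (NumPy sketch)."""
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    # Exponential moving averages of the population statistics, used only at inference time.
    pop_mean = pop_mean * decay + batch_mean * (1 - decay)
    pop_var = pop_var * decay + batch_var * (1 - decay)
    # This line is exactly what tf.nn.batch_normalization computes.
    y = gamma * (x - batch_mean) / np.sqrt(batch_var + epsilon) + beta
    return y, pop_mean, pop_var

def batch_norm_inference(x, gamma, beta, pop_mean, pop_var, epsilon=1e-3):
    """Inference-time step: the same formula, but with the population statistics."""
    return gamma * (x - pop_mean) / np.sqrt(pop_var + epsilon) + beta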
def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Tells the layer whether to normalize with batch or population statistics.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)

    gamma = tf.Variable(tf.ones([num_units]))   # normalization scale parameter, trained
    beta = tf.Variable(tf.zeros([num_units]))   # normalization shift parameter, trained
    pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)
    pop_variance = tf.Variable(tf.ones([num_units]), trainable=False)
    epsilon = 1e-3

    # Function used during the training phase
    def batch_normal_training():
        batch_mean, batch_variance = tf.nn.moments(layer, [0])
        decay = 0.99
        # Track the running mean and variance, used during the inference phase
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))
        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    # Function used during the inference phase
    def batch_normal_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_normal_training, batch_normal_inference)
    return tf.nn.relu(batch_normalized_output)
def conv_layer(prev_layer, layer_depth, is_training):
    """
    Create a convolutional layer with the given layer as input.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_depth: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :param is_training: bool or Tensor
        Tells the layer whether to normalize with batch or population statistics.
    :returns Tensor
        A new convolutional layer
    """
    strides = 2 if layer_depth % 3 == 0 else 1

    in_channels = prev_layer.get_shape().as_list()[3]
    out_channels = layer_depth*4

    weights = tf.Variable(
        tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.05))
    # No bias term: batch normalization's beta parameter makes it redundant.
    layer = tf.nn.conv2d(prev_layer, weights, strides=[1, strides, strides, 1], padding='SAME')

    gamma = tf.Variable(tf.ones([out_channels]))
    beta = tf.Variable(tf.zeros([out_channels]))
    pop_mean = tf.Variable(tf.zeros([out_channels]), trainable=False)
    pop_variance = tf.Variable(tf.ones([out_channels]), trainable=False)
    epsilon = 1e-3

    def batch_norm_training():
        # Important to use the correct dimensions here to ensure the mean and variance are calculated
        # per feature map instead of for the entire layer
        batch_mean, batch_variance = tf.nn.moments(layer, [0, 1, 2], keep_dims=False)
        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))
        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)
def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Placeholder that tells the batch normalization code whether we are training
    is_training = tf.placeholder(tf.bool)

    # Feed the inputs into a series of 20 convolutional layers
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    # Add one fully connected layer
    layer = fully_connected(layer, 100, is_training)

    # Create the output layer with 1 node for each class
    logits = tf.layers.dense(layer, 10)

    # Define loss and training operations. No UPDATE_OPS dependency is needed here:
    # the manual batch norm code above already forces its moving-average updates
    # with its own control dependencies.
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)

    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})

            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels,
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))

        # Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                     labels: [mnist.test.labels[i]],
                                                     is_training: False})
        print("Accuracy on 100 samples:", correct/100)
num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)
Batch: 0: Validation loss: 0.69103, Validation accuracy: 0.09760
Batch: 25: Training loss: 0.57469, Training accuracy: 0.07812
Batch: 50: Training loss: 0.45555, Training accuracy: 0.09375
Batch: 75: Training loss: 0.39003, Training accuracy: 0.10938
Batch: 100: Validation loss: 0.35794, Validation accuracy: 0.09240
Batch: 125: Training loss: 0.34876, Training accuracy: 0.09375
Batch: 150: Training loss: 0.33988, Training accuracy: 0.14062
Batch: 175: Training loss: 0.34756, Training accuracy: 0.17188
Batch: 200: Validation loss: 0.33286, Validation accuracy: 0.13800
Batch: 225: Training loss: 0.33089, Training accuracy: 0.20312
Batch: 250: Training loss: 0.30912, Training accuracy: 0.25000
Batch: 275: Training loss: 0.40437, Training accuracy: 0.26562
Batch: 300: Validation loss: 0.29348, Validation accuracy: 0.43260
Batch: 325: Training loss: 0.30869, Training accuracy: 0.45312
Batch: 350: Training loss: 0.11182, Training accuracy: 0.78125
Batch: 375: Training loss: 0.12043, Training accuracy: 0.81250
Batch: 400: Validation loss: 0.10275, Validation accuracy: 0.84940
Batch: 425: Training loss: 0.00938, Training accuracy: 1.00000
Batch: 450: Training loss: 0.03079, Training accuracy: 0.93750
Batch: 475: Training loss: 0.03406, Training accuracy: 0.96875
Batch: 500: Validation loss: 0.07192, Validation accuracy: 0.89240
Batch: 525: Training loss: 0.07040, Training accuracy: 0.87500
Batch: 550: Training loss: 0.00455, Training accuracy: 1.00000
Batch: 575: Training loss: 0.01988, Training accuracy: 0.96875
Training again improves dramatically compared to the version without batch normalization, and this time we implemented the internal details of the normalization ourselves.
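If you want to convince yourself that the tf.cond branching and the moving-average updates behave as intended, a small standalone sketch like the following can help (a single feature, the same decay of 0.99, and no scale or offset, which tf.nn.batch_normalization allows): feeding constant data in training mode drags the population statistics toward the batch statistics, which inference mode then uses.

import numpy as np
import tensorflow as tf

tf.reset_default_graph()
is_training = tf.placeholder(tf.bool)
x = tf.placeholder(tf.float32, [None, 1])

pop_mean = tf.Variable(tf.zeros([1]), trainable=False)
pop_var = tf.Variable(tf.ones([1]), trainable=False)
decay = 0.99
batch_mean, batch_var = tf.nn.moments(x, [0])

def training():
    # Update the population statistics, then normalize with the batch statistics.
    with tf.control_dependencies([
            tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay)),
            tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))]):
        return tf.nn.batch_normalization(x, batch_mean, batch_var, None, None, 1e-3)

def inference():
    # Normalize with the population statistics; no updates happen here.
    return tf.nn.batch_normalization(x, pop_mean, pop_var, None, None, 1e-3)

out = tf.cond(is_training, training, inference)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    data = np.full((64, 1), 5.0, dtype=np.float32)  # every sample equals 5
    for _ in range(500):
        sess.run(out, {x: data, is_training: True})
    # The population mean should now be close to 5 and the variance close to 0.
    print(sess.run([pop_mean, pop_var]))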