Batch Normalization in Practice
06 Apr 2018 | deep-learning | Batch Normalization – in practice
Batch normalization is most useful in deep neural networks. To demonstrate this, we'll build a convolutional neural network with 20 convolutional layers followed by a single fully connected layer, trained on the MNIST dataset.
Note that this is not a good network for classifying MNIST; you could design a much simpler network that performs far better. We made some deliberate choices to show off the benefits of batch normalization:
- The network is complicated enough that batch normalization provides a clear benefit.
- The network is simple enough to train quickly, making it easy to experiment with batch normalization.
- The architecture is simple and easy to implement.
We'll use two versions of batch normalization: one that wraps up most of the work for us, and one that exposes only the low-level primitives. You can read about the differences between them in the previous article.
- Batch Normalization with tf.layers.batch_normalization
- Batch Normalization with tf.nn.batch_normalization
First, we need to download the dataset.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
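Note the reshape=False argument: it keeps each image as a 28x28x1 array rather than flattening it to a 784-vector, which matches the [None, 28, 28, 1] input placeholder the convolutional layers below expect. A quick way to confirm the shapes (assuming the default 55,000/5,000/10,000 split):

print(mnist.train.images.shape)       # (55000, 28, 28, 1)
print(mnist.validation.images.shape)  # (5000, 28, 28, 1)
print(mnist.test.images.shape)        # (10000, 28, 28, 1)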
Batch Normalization with tf.layers.batch_normalization
The tf.layers version is the one you'll use most often: tf.layers.batch_normalization handles most of the details for you. We'll use the following function to build fully connected layers; it does not apply batch normalization yet.
"""
DO NOT MODIFY THIS CELL
"""
def fully_connected(prev_layer, num_units):
"""
Create a fully connectd layer with the given layer as input and the given number of neurons.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param num_units: int
The size of the layer. That is, the number of units, nodes, or neurons.
:returns Tensor
A new fully connected layer
"""
layer = tf.layers.dense(prev_layer, num_units, activation=tf.nn.relu)
return layer
We'll use the following function to build convolutional layers; it does not apply batch normalization yet.
"""
DO NOT MODIFY THIS CELL
"""
def conv_layer(prev_layer, layer_depth):
"""
Create a convolutional layer with the given layer as input.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param layer_depth: int
We'll set the strides and number of feature maps based on the layer's depth in the network.
This is *not* a good way to make a CNN, but it helps us create this example with very little code.
:returns Tensor
A new convolutional layer
"""
strides = 2 if layer_depth % 3 == 0 else 1
conv_layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides, 'same', activation=tf.nn.relu)
return conv_layer
The following function builds the network and trains it, still without batch normalization.
"""
DO NOT MODIFY THIS CELL
"""
def train(num_batches, batch_size, learning_rate):
# Build placeholders for the input samples and labels
inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
labels = tf.placeholder(tf.float32, [None, 10])
# Feed the inputs into a series of 20 convolutional layers
layer = inputs
for layer_i in range(1, 20):
layer = conv_layer(layer, layer_i)
# Flatten the output from the convolutional layers
orig_shape = layer.get_shape().as_list()
layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])
# Add one fully connected layer
layer = fully_connected(layer, 100)
# Create the output layer with 1 node for each
logits = tf.layers.dense(layer, 10)
# Define loss and training operations
model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
# Create operations to test accuracy
correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Train and test the network
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for batch_i in range(num_batches):
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
# train this batch
sess.run(train_opt, {inputs: batch_xs, labels: batch_ys})
# Periodically check the validation or training loss and accuracy
if batch_i % 100 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
labels: mnist.validation.labels})
print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
elif batch_i % 25 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys})
print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))
# At the end, score the final accuracy for both the validation and test sets
acc = sess.run(accuracy, {inputs: mnist.validation.images,
labels: mnist.validation.labels})
print('Final validation accuracy: {:>3.5f}'.format(acc))
acc = sess.run(accuracy, {inputs: mnist.test.images,
labels: mnist.test.labels})
print('Final test accuracy: {:>3.5f}'.format(acc))
# Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
correct = 0
for i in range(100):
correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
labels: [mnist.test.labels[i]]})
print("Accuracy on 100 samples:", correct/100)
num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)
Batch: 0: Validation loss: 0.69062, Validation accuracy: 0.09860
Batch: 25: Training loss: 0.35044, Training accuracy: 0.07812
Batch: 50: Training loss: 0.32532, Training accuracy: 0.10938
Batch: 75: Training loss: 0.32628, Training accuracy: 0.06250
Batch: 100: Validation loss: 0.32518, Validation accuracy: 0.10700
Batch: 125: Training loss: 0.32709, Training accuracy: 0.03125
Batch: 150: Training loss: 0.32582, Training accuracy: 0.09375
Batch: 175: Training loss: 0.32305, Training accuracy: 0.21875
Batch: 200: Validation loss: 0.32584, Validation accuracy: 0.09860
Batch: 225: Training loss: 0.32501, Training accuracy: 0.15625
Batch: 250: Training loss: 0.32705, Training accuracy: 0.07812
Batch: 275: Training loss: 0.32302, Training accuracy: 0.12500
Batch: 300: Validation loss: 0.32591, Validation accuracy: 0.08680
Batch: 325: Training loss: 0.32264, Training accuracy: 0.15625
Batch: 350: Training loss: 0.32369, Training accuracy: 0.12500
Batch: 375: Training loss: 0.32658, Training accuracy: 0.09375
Batch: 400: Validation loss: 0.32506, Validation accuracy: 0.11260
Batch: 425: Training loss: 0.32533, Training accuracy: 0.09375
Batch: 450: Training loss: 0.32705, Training accuracy: 0.10938
Batch: 475: Training loss: 0.32655, Training accuracy: 0.04688
Batch: 500: Validation loss: 0.32510, Validation accuracy: 0.10020
Batch: 525: Training loss: 0.33035, Training accuracy: 0.01562
Batch: 550: Training loss: 0.32439, Training accuracy: 0.09375
Batch: 575: Training loss: 0.32113, Training accuracy: 0.20312
Batch: 600: Validation loss: 0.32499, Validation accuracy: 0.11260
Batch: 625: Training loss: 0.32403, Training accuracy: 0.14062
Batch: 650: Training loss: 0.33020, Training accuracy: 0.03125
Batch: 675: Training loss: 0.31972, Training accuracy: 0.12500
Batch: 700: Validation loss: 0.29751, Validation accuracy: 0.20460
Batch: 725: Training loss: 0.34273, Training accuracy: 0.06250
Batch: 750: Training loss: 0.31551, Training accuracy: 0.28125
Batch: 775: Training loss: 0.28567, Training accuracy: 0.25000
Final validation accuracy: 0.30420
Final test accuracy: 0.29490
Accuracy on 100 samples: 0.28
Because the network has so many layers, it takes many batches before it learns anything. By the time training ends, the accuracy should still be quite low. Next, let's add batch normalization and see what happens.
Adding batch normalization
The following code is almost identical to the code above, except that we add batch normalization with tf.layers.batch_normalization.
def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Tells the batch normalization layer whether to use batch or population statistics.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)
    layer = tf.layers.batch_normalization(layer, training=is_training)  # batch normalization
    layer = tf.nn.relu(layer)
    return layer
def conv_layer(prev_layer, layer_depth, is_training):
    """
    Create a convolutional layer with the given layer as input.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_depth: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :param is_training: bool or Tensor
        Tells the batch normalization layer whether to use batch or population statistics.
    :returns Tensor
        A new convolutional layer
    """
    strides = 2 if layer_depth % 3 == 0 else 1
    conv_layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides, 'same', use_bias=False, activation=None)
    conv_layer = tf.layers.batch_normalization(conv_layer, training=is_training)  # batch normalization
    conv_layer = tf.nn.relu(conv_layer)
    return conv_layer
def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Placeholder that tells the batch normalization layers whether we are training
    is_training = tf.placeholder(tf.bool)

    # Feed the inputs into a series of 20 convolutional layers
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    # Add one fully connected layer
    layer = fully_connected(layer, 100, is_training)

    # Create the output layer with 1 node for each class
    logits = tf.layers.dense(layer, 10)

    # Define loss and training operations
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))

    # Tell TensorFlow to update the population statistics while training
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)

    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})

            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels,
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))

        # Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                     labels: [mnist.test.labels[i]],
                                                     is_training: False})
        print("Accuracy on 100 samples:", correct/100)
num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)
Batch: 0: Validation loss: 0.69050, Validation accuracy: 0.09900
Batch: 25: Training loss: 0.56782, Training accuracy: 0.04688
Batch: 50: Training loss: 0.45753, Training accuracy: 0.09375
Batch: 75: Training loss: 0.38981, Training accuracy: 0.10938
Batch: 100: Validation loss: 0.35713, Validation accuracy: 0.09900
Batch: 125: Training loss: 0.34297, Training accuracy: 0.15625
Batch: 150: Training loss: 0.33817, Training accuracy: 0.01562
Batch: 175: Training loss: 0.33227, Training accuracy: 0.06250
Batch: 200: Validation loss: 0.33167, Validation accuracy: 0.26160
Batch: 225: Training loss: 0.38631, Training accuracy: 0.18750
Batch: 250: Training loss: 0.33774, Training accuracy: 0.34375
Batch: 275: Training loss: 0.20149, Training accuracy: 0.59375
Batch: 300: Validation loss: 0.17220, Validation accuracy: 0.63080
Batch: 325: Training loss: 0.07430, Training accuracy: 0.89062
Batch: 350: Training loss: 0.09011, Training accuracy: 0.87500
Batch: 375: Training loss: 0.03864, Training accuracy: 0.89062
Batch: 400: Validation loss: 0.06403, Validation accuracy: 0.90200
Batch: 425: Training loss: 0.06060, Training accuracy: 0.92188
Batch: 450: Training loss: 0.03130, Training accuracy: 0.93750
Batch: 475: Training loss: 0.02583, Training accuracy: 0.96875
Batch: 500: Validation loss: 0.02855, Validation accuracy: 0.96100
Batch: 525: Training loss: 0.16131, Training accuracy: 0.76562
Batch: 550: Training loss: 0.00370, Training accuracy: 1.00000
Batch: 575: Training loss: 0.04717, Training accuracy: 0.92188
Batch: 600: Validation loss: 0.05843, Validation accuracy: 0.91880
Batch: 625: Training loss: 0.02037, Training accuracy: 0.95312
Batch: 650: Training loss: 0.04496, Training accuracy: 0.92188
Batch: 675: Training loss: 0.09602, Training accuracy: 0.85938
Batch: 700: Validation loss: 0.05082, Validation accuracy: 0.93260
Batch: 725: Training loss: 0.03261, Training accuracy: 0.95312
Batch: 750: Training loss: 0.00247, Training accuracy: 1.00000
Batch: 775: Training loss: 0.04725, Training accuracy: 0.93750
Final validation accuracy: 0.87940
Final test accuracy: 0.88330
Accuracy on 100 samples: 0.85
With batch normalization, accuracy improves dramatically. Note the is_training placeholder, which tells the network whether it is in the training phase or the inference phase.
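To recap the pattern that makes this work, here is a minimal, self-contained sketch. The names here (x, y, and the small dense network) are hypothetical stand-ins, not part of the example above; the three essential pieces are the is_training flag, the training argument, and the control dependency on tf.GraphKeys.UPDATE_OPS:

import tensorflow as tf

# Hypothetical flattened inputs and one-hot labels, just to make the sketch runnable.
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
is_training = tf.placeholder(tf.bool)  # 1) a flag the network can branch on

hidden = tf.layers.dense(x, 100, use_bias=False, activation=None)
hidden = tf.layers.batch_normalization(hidden, training=is_training)  # 2) pass the flag in
hidden = tf.nn.relu(hidden)
logits = tf.layers.dense(hidden, 10)
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=y))

# 3) Run the moving-average update ops with every training step. Without this
#    control dependency the population statistics are never updated, and
#    inference (is_training: False) will produce poor results.
with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# During training, feed {is_training: True}; during evaluation, feed {is_training: False}.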
Batch Normalization with tf.nn.batch_normalization
Most of the time you can simply use the wrapped-up version, but sometimes you'll want to implement something TensorFlow doesn't provide out of the box, such as batch normalization inside an LSTM. In those cases you have to implement the details of batch normalization yourself, and the function to use is tf.nn.batch_normalization.
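Before diving into the code, it helps to see the arithmetic we're about to reproduce. tf.nn.batch_normalization only computes y = gamma * (x - mean) / sqrt(variance + epsilon) + beta; everything else (computing batch statistics during training, and maintaining population statistics for inference) is up to us. Here is a minimal NumPy sketch of those two pieces, using the same decay of 0.99 as the code below:

import numpy as np

def batch_norm_training_step(x, gamma, beta, pop_mean, pop_var, epsilon=1e-3, decay=0.99):
    """One training-time step for a [batch, features] array (NumPy sketch)."""
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    # Exponential moving averages of the population statistics, used only at inference time.
    pop_mean = pop_mean * decay + batch_mean * (1 - decay)
    pop_var = pop_var * decay + batch_var * (1 - decay)
    # This line is exactly what tf.nn.batch_normalization computes.
    y = gamma * (x - batch_mean) / np.sqrt(batch_var + epsilon) + beta
    return y, pop_mean, pop_var

def batch_norm_inference(x, gamma, beta, pop_mean, pop_var, epsilon=1e-3):
    """Inference-time step: the same formula, but with the population statistics."""
    return gamma * (x - pop_mean) / np.sqrt(pop_var + epsilon) + beta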
def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Tells the layer whether to normalize with batch or population statistics.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)

    gamma = tf.Variable(tf.ones([num_units]))   # normalization scale parameter, trained
    beta = tf.Variable(tf.zeros([num_units]))   # normalization shift parameter, trained
    pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)
    pop_variance = tf.Variable(tf.ones([num_units]), trainable=False)
    epsilon = 1e-3

    # Function used during the training phase
    def batch_normal_training():
        batch_mean, batch_variance = tf.nn.moments(layer, [0])
        decay = 0.99
        # Track the running mean and variance, used during the inference phase
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))
        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    # Function used during the inference phase
    def batch_normal_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_normal_training, batch_normal_inference)
    return tf.nn.relu(batch_normalized_output)
def conv_layer(prev_layer, layer_depth, is_training):
    """
    Create a convolutional layer with the given layer as input.

    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_depth: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :param is_training: bool or Tensor
        Tells the layer whether to normalize with batch or population statistics.
    :returns Tensor
        A new convolutional layer
    """
    strides = 2 if layer_depth % 3 == 0 else 1

    in_channels = prev_layer.get_shape().as_list()[3]
    out_channels = layer_depth*4

    weights = tf.Variable(
        tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.05))
    # No bias term: batch normalization's beta parameter makes it redundant.
    layer = tf.nn.conv2d(prev_layer, weights, strides=[1, strides, strides, 1], padding='SAME')

    gamma = tf.Variable(tf.ones([out_channels]))
    beta = tf.Variable(tf.zeros([out_channels]))
    pop_mean = tf.Variable(tf.zeros([out_channels]), trainable=False)
    pop_variance = tf.Variable(tf.ones([out_channels]), trainable=False)
    epsilon = 1e-3

    def batch_norm_training():
        # Important to use the correct dimensions here to ensure the mean and variance are calculated
        # per feature map instead of for the entire layer
        batch_mean, batch_variance = tf.nn.moments(layer, [0, 1, 2], keep_dims=False)
        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))
        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)
def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Placeholder that tells the batch normalization code whether we are training
    is_training = tf.placeholder(tf.bool)

    # Feed the inputs into a series of 20 convolutional layers
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    # Add one fully connected layer
    layer = fully_connected(layer, 100, is_training)

    # Create the output layer with 1 node for each class
    logits = tf.layers.dense(layer, 10)

    # Define loss and training operations. No UPDATE_OPS dependency is needed here:
    # the manual batch norm code above already forces its moving-average updates
    # with its own control dependencies.
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)

    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})

            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels,
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))

        # Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                     labels: [mnist.test.labels[i]],
                                                     is_training: False})
        print("Accuracy on 100 samples:", correct/100)
num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)
Batch: 0: Validation loss: 0.69103, Validation accuracy: 0.09760
Batch: 25: Training loss: 0.57469, Training accuracy: 0.07812
Batch: 50: Training loss: 0.45555, Training accuracy: 0.09375
Batch: 75: Training loss: 0.39003, Training accuracy: 0.10938
Batch: 100: Validation loss: 0.35794, Validation accuracy: 0.09240
Batch: 125: Training loss: 0.34876, Training accuracy: 0.09375
Batch: 150: Training loss: 0.33988, Training accuracy: 0.14062
Batch: 175: Training loss: 0.34756, Training accuracy: 0.17188
Batch: 200: Validation loss: 0.33286, Validation accuracy: 0.13800
Batch: 225: Training loss: 0.33089, Training accuracy: 0.20312
Batch: 250: Training loss: 0.30912, Training accuracy: 0.25000
Batch: 275: Training loss: 0.40437, Training accuracy: 0.26562
Batch: 300: Validation loss: 0.29348, Validation accuracy: 0.43260
Batch: 325: Training loss: 0.30869, Training accuracy: 0.45312
Batch: 350: Training loss: 0.11182, Training accuracy: 0.78125
Batch: 375: Training loss: 0.12043, Training accuracy: 0.81250
Batch: 400: Validation loss: 0.10275, Validation accuracy: 0.84940
Batch: 425: Training loss: 0.00938, Training accuracy: 1.00000
Batch: 450: Training loss: 0.03079, Training accuracy: 0.93750
Batch: 475: Training loss: 0.03406, Training accuracy: 0.96875
Batch: 500: Validation loss: 0.07192, Validation accuracy: 0.89240
Batch: 525: Training loss: 0.07040, Training accuracy: 0.87500
Batch: 550: Training loss: 0.00455, Training accuracy: 1.00000
Batch: 575: Training loss: 0.01988, Training accuracy: 0.96875
Training again improves dramatically compared to the version without batch normalization, and this time we implemented the internal details of the normalization ourselves.
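If you want to convince yourself that the tf.cond branching and the moving-average updates behave as intended, a small standalone sketch like the following can help (a single feature, the same decay of 0.99, and no scale or offset, which tf.nn.batch_normalization allows): feeding constant data in training mode drags the population statistics toward the batch statistics, which inference mode then uses.

import numpy as np
import tensorflow as tf

tf.reset_default_graph()
is_training = tf.placeholder(tf.bool)
x = tf.placeholder(tf.float32, [None, 1])

pop_mean = tf.Variable(tf.zeros([1]), trainable=False)
pop_var = tf.Variable(tf.ones([1]), trainable=False)
decay = 0.99
batch_mean, batch_var = tf.nn.moments(x, [0])

def training():
    # Update the population statistics, then normalize with the batch statistics.
    with tf.control_dependencies([
            tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay)),
            tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))]):
        return tf.nn.batch_normalization(x, batch_mean, batch_var, None, None, 1e-3)

def inference():
    # Normalize with the population statistics; no updates happen here.
    return tf.nn.batch_normalization(x, pop_mean, pop_var, None, None, 1e-3)

out = tf.cond(is_training, training, inference)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    data = np.full((64, 1), 5.0, dtype=np.float32)  # every sample equals 5
    for _ in range(500):
        sess.run(out, {x: data, is_training: True})
    # The population mean should now be close to 5 and the variance close to 0.
    print(sess.run([pop_mean, pop_var]))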