Batch Normalization
Batch normalization is a technique used in deep learning to improve the training of neural networks. It was introduced in a paper by Sergey Ioffe and Christian Szegedy in 2015. The authors observed that the distribution of inputs to each layer of a neural network changes during training, a phenomenon they called internal covariate shift, which can make it difficult for the network to learn. Batch normalization aims to address this problem by normalizing the inputs to each layer.
How Batch Normalization Works
Batch normalization works by normalizing the inputs to each layer of a neural network, so that they have zero mean and unit variance. This is done by subtracting the mean of the batch and dividing by the standard deviation of the batch:
$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

where $\hat{x}$ is the normalized input, $x$ is the input, $\mu_B$ is the mean of the batch, $\sigma_B^2$ is the variance of the batch, and $\epsilon$ is a small constant to prevent division by zero.
Batch normalization can be applied to the inputs of each layer or to the pre-activation outputs of each layer. After normalization, the values are scaled and shifted by learnable parameters $\gamma$ and $\beta$, so the network can still recover the original representation if that is what training favors:

$$y = \gamma \hat{x} + \beta$$

where $y$ is the output of the layer.
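To make the two steps above concrete, here is a minimal NumPy sketch of the batch normalization forward pass. The function name `batch_norm_forward`, the default `eps` value, and the example shapes are illustrative rather than part of any particular library.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (batch_size, features),
    then scale and shift with learnable parameters gamma and beta."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    y = gamma * x_hat + beta                # learnable scale and shift
    return y

# Example: a batch of 4 samples with 3 features each
x = np.random.randn(4, 3)
gamma = np.ones(3)   # with gamma=1 and beta=0 the layer just standardizes its input
beta = np.zeros(3)
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0), y.var(axis=0))  # approximately 0 and 1 for this choice of gamma, beta
```

During training, $\gamma$ and $\beta$ are updated by gradient descent like any other parameters.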
Benefits of Batch Normalization
Batch normalization has several benefits for training deep neural networks:
- Faster convergence: By normalizing the inputs to each layer, batch normalization helps to avoid the problem of vanishing or exploding gradients, which can slow down or prevent convergence.
- Improved generalization: Batch normalization can act as a form of regularization, because each example is normalized with statistics computed from the other examples in its mini-batch; this adds a small amount of noise during training and, in practice, can reduce the need for other regularizers such as dropout.
- Increased robustness: Batch normalization can make the network less sensitive to changes in the input distribution, which can help to improve performance on out-of-distribution examples.
Drawbacks of Batch Normalization
Although batch normalization is a powerful technique for improving the training of deep neural networks, it does have some drawbacks:
- Increased computational cost: Batch normalization requires additional computations to calculate the mean and variance of each batch, and to normalize the inputs. This can increase the computational cost of training, especially for large models.
- Dependency on batch size: Batch normalization is designed to work with mini-batches of data, and its effectiveness depends on the batch size: very small batches give noisy estimates of the mean and variance. In practice, batch sizes are often chosen based on computational constraints, which can limit how well batch normalization works. At inference time, this dependency is typically avoided by using running estimates of the statistics, as sketched below.
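As an illustration of how the batch-size dependency is handled at inference time, the sketch below uses PyTorch's `nn.BatchNorm1d`, which keeps running estimates of the mean and variance during training and switches to them in evaluation mode. The feature and batch sizes here are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=8)  # tracks running mean/variance by default

x = torch.randn(32, 8)  # mini-batch of 32 samples with 8 features each

bn.train()
y_train = bn(x)  # uses the statistics of this batch and updates the running estimates

bn.eval()
y_eval = bn(x)   # uses the running estimates, so the output no longer depends on the batch size
```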
Conclusion
Batch normalization is a powerful technique for improving the training of deep neural networks. By normalizing the inputs to each layer, it can help to avoid the problem of vanishing or exploding gradients, and can act as a form of regularization. However, it does have some drawbacks, including increased computational cost and dependency on batch size. Overall, batch normalization is a valuable tool for anyone working with deep learning, and is widely used in practice.