Weight Initialization Techniques in Neural Networks

Amal
2 min read · Mar 6, 2022


Key points to note:

  • Weights should be small
  • Weights should not all be the same
  • Weights should have sufficient variance
(Figure: a basic neural network)

Weights should be small, but why not very small? If the weights are very small, the gradients shrink as they flow backward through the layers, causing the vanishing gradient problem.

The vanishing gradient and exploding gradient problems occur mainly because of poor weight initialization and a poor choice of activation function.

Let’s now discuss the three main weight initialization techniques used:

1) Xavier/Glorot Initializer

Glorot Normal:

  • Also called the Xavier normal initializer.
  • Weights are drawn from a normal distribution with mean 0 and standard deviation sqrt(2 / (fan_in + fan_out)), where fan_in and fan_out are the number of input and output units of the layer.

Glorot Uniform:

  • Also called the Xavier uniform initializer.
  • TensorFlow uses glorot_uniform as the default kernel initializer.
  • Weights are drawn from a uniform distribution over [-limit, +limit], where limit = sqrt(6 / (fan_in + fan_out)).
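As a minimal sketch of what these formulas look like in code (plain NumPy, with a hypothetical 784 → 256 layer as the example), the two Glorot variants could be implemented like this:

```python
import numpy as np

rng = np.random.default_rng(42)

def glorot_normal(fan_in, fan_out):
    # Glorot/Xavier normal: mean 0, std = sqrt(2 / (fan_in + fan_out))
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

def glorot_uniform(fan_in, fan_out):
    # Glorot/Xavier uniform: limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(low=-limit, high=limit, size=(fan_in, fan_out))

W = glorot_uniform(784, 256)  # weight matrix for a hypothetical 784 -> 256 layer
```

In Keras the same initializers are available as tf.keras.initializers.GlorotNormal and tf.keras.initializers.GlorotUniform (or via the strings "glorot_normal" / "glorot_uniform").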

2) He Initializer

He Normal:

Weights are drawn from a normal distribution with mean 0 and standard deviation sqrt(2 / fan_in), where fan_in is the number of input units of the layer.

He Uniform:

Weights are drawn from a uniform distribution over [-limit, +limit], where limit = sqrt(6 / fan_in).
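A similar sketch for the He variants (again plain NumPy; the layer sizes are just an example):

```python
import numpy as np

rng = np.random.default_rng(42)

def he_normal(fan_in, fan_out):
    # He normal: mean 0, std = sqrt(2 / fan_in)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

def he_uniform(fan_in, fan_out):
    # He uniform: limit = sqrt(6 / fan_in)
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(low=-limit, high=limit, size=(fan_in, fan_out))

W = he_normal(256, 128)  # weight matrix for a hypothetical 256 -> 128 ReLU layer
```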

3) LeCun Initializer

LeCun Normal:

Weights are drawn from a normal distribution with mean 0 and standard deviation sqrt(1 / fan_in).

LeCun Uniform:

Weights are drawn from a uniform distribution over [-limit, +limit], i.e. W_ij ~ Uniform(-limit, +limit), where limit = sqrt(3 / fan_in).
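And the LeCun variants, following the same pattern (NumPy sketch, example sizes only):

```python
import numpy as np

rng = np.random.default_rng(42)

def lecun_normal(fan_in, fan_out):
    # LeCun normal: mean 0, std = sqrt(1 / fan_in)
    std = np.sqrt(1.0 / fan_in)
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

def lecun_uniform(fan_in, fan_out):
    # LeCun uniform: limit = sqrt(3 / fan_in)
    limit = np.sqrt(3.0 / fan_in)
    return rng.uniform(low=-limit, high=limit, size=(fan_in, fan_out))

W = lecun_normal(128, 64)  # weight matrix for a hypothetical 128 -> 64 SELU layer
```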

Standard Approach:

  1. Xavier/Glorot initialization works well with hyperbolic tangent (tanh) and logistic (sigmoid) activations.
  2. He initialization works well with the Rectified Linear Unit (ReLU) and its variants.
  3. LeCun initialization works well with the Scaled Exponential Linear Unit (SELU) and hyperbolic tangent (tanh).
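As a small illustration of these pairings (a hypothetical Keras model; the layer sizes and input shape are made up), each initializer can be set explicitly on the layer that uses the matching activation:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    # tanh layer -> Glorot/Xavier initialization
    tf.keras.layers.Dense(128, activation="tanh",
                          kernel_initializer="glorot_normal"),
    # ReLU layer -> He initialization
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_initializer="he_normal"),
    # SELU layer -> LeCun initialization
    tf.keras.layers.Dense(64, activation="selu",
                          kernel_initializer="lecun_normal"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```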

Hope you learned something new today. Happy learning!
