Key points to note about weight initialization:
- Weights should be small.
- Weights should not all be the same.
- Weights should have enough variance.
Why should weights not be very small? Very small weights shrink the signal at every layer, which causes the vanishing gradient problem.
The vanishing gradient and exploding gradient problems occur mainly because of poor weight initialization and a poor choice of activation function.
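To see the vanishing effect concretely, here is a minimal NumPy sketch (the 100-unit layers and the 0.01 standard deviation are arbitrary choices for illustration): pushing a signal through tanh layers with very small weights collapses the activations toward zero, and the gradients shrink along with them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Push a random signal through 10 tanh layers whose weights are
# deliberately tiny (hypothetical 100-unit layers, for illustration).
x = rng.normal(size=(1, 100))
for layer in range(10):
    W = rng.normal(0.0, 0.01, size=(100, 100))  # very small weights
    x = np.tanh(x @ W)
    print(f"layer {layer + 1}: activation std = {x.std():.2e}")
# The activation std drops by roughly 10x per layer, so gradients
# flowing back through these layers vanish in the same way.
```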
Let’s now discuss the three main weight initialization techniques in use:
1) Xavier/Glorot Initializer
Glorot Normal:
- Also called the Xavier Normal initializer.
- Weights are drawn from a normal distribution with mean 0 and standard deviation sqrt(2 / (fan_in + fan_out)), where fan_in and fan_out are the number of input and output units of the layer.
- Wij ~ Normal Distribution (0, std), where std = sqrt(2 / (fan_in + fan_out))
Glorot Uniform:
- Also called the Xavier Uniform initializer.
- TensorFlow uses glorot_uniform by default.
- Weights are drawn from a uniform distribution with limits [-limit, +limit].
- Wij ~ Uniform Distribution [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out))
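As a minimal sketch of both Glorot variants (NumPy assumed; the 784 -> 256 layer size is just an example):

```python
import numpy as np

rng = np.random.default_rng(42)

def glorot_normal(fan_in, fan_out):
    # Normal distribution with mean 0, std = sqrt(2 / (fan_in + fan_out))
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def glorot_uniform(fan_in, fan_out):
    # Uniform distribution in [-limit, +limit],
    # limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# e.g. a 784 -> 256 layer (sizes are arbitrary)
W = glorot_uniform(784, 256)
print(W.shape, W.min(), W.max())
```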
2) He Initializer
He Normal:
- Weights are drawn from a normal distribution with mean 0 and standard deviation sqrt(2 / fan_in).
- Wij ~ Normal Distribution (0, std), where std = sqrt(2 / fan_in)
He Uniform:
- Weights are drawn from a uniform distribution with limits [-limit, +limit].
- Wij ~ Uniform Distribution [-limit, limit], where limit = sqrt(6 / fan_in)
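A matching sketch for the two He variants (NumPy assumed, same conventions as above):

```python
import numpy as np

rng = np.random.default_rng(42)

def he_normal(fan_in, fan_out):
    # Normal distribution with mean 0, std = sqrt(2 / fan_in)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_uniform(fan_in, fan_out):
    # Uniform distribution in [-limit, +limit], limit = sqrt(6 / fan_in)
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```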
3) LeCun Initializer
LeCun Normal:
- Weights are drawn from a normal distribution with mean 0 and standard deviation sqrt(1 / fan_in).
- Wij ~ Normal Distribution (0, std), where std = sqrt(1 / fan_in)
LeCun Uniform:
- Weights are drawn from a uniform distribution with limits [-limit, +limit].
- Wij ~ Uniform Distribution [-limit, limit], where limit = sqrt(3 / fan_in)
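And the corresponding sketch for the two LeCun variants (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(42)

def lecun_normal(fan_in, fan_out):
    # Normal distribution with mean 0, std = sqrt(1 / fan_in)
    std = np.sqrt(1.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def lecun_uniform(fan_in, fan_out):
    # Uniform distribution in [-limit, +limit], limit = sqrt(3 / fan_in)
    limit = np.sqrt(3.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```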
Standard Approach:
- Xavier/Glorot initialization works well with hyperbolic tangent (tanh) and logistic (sigmoid) activations.
- He initialization works well with the Rectified Linear Unit (ReLU) and its variants.
- LeCun initialization works well with the Scaled Exponential Linear Unit (SELU) and hyperbolic tangent (tanh).
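In Keras, these pairings come down to choosing kernel_initializer per layer. A small sketch (the layer sizes and input shape are arbitrary):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),  # hypothetical input size
    # tanh pairs with Glorot (glorot_uniform is the Keras default)
    tf.keras.layers.Dense(128, activation="tanh",
                          kernel_initializer="glorot_uniform"),
    # ReLU pairs with He initialization
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_initializer="he_normal"),
    # SELU pairs with LeCun initialization
    tf.keras.layers.Dense(10, activation="selu",
                          kernel_initializer="lecun_normal"),
])
model.summary()
```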
Hope you learned something new today. Happy learning!