Activation Functions MCQ · test your knowledge
From ReLU to Swish – 15 questions covering non‑linearity, vanishing gradient, output ranges, and modern variants.
Activation functions: the heart of neural networks
Activation functions introduce non‑linearity, allowing neural networks to approximate complex functions. This MCQ test covers classical and modern activation functions, their derivatives, output ranges, and practical considerations like vanishing gradient and dying ReLU.
Why non‑linearity?
Without non‑linear activation, stacked linear layers collapse into a single linear transformation, so extra depth adds no representational power. Activation functions let the network learn complex, hierarchical features.
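The collapse of stacked linear layers can be checked numerically. A minimal sketch with two hypothetical random weight matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first linear layer
W2 = rng.normal(size=(2, 4))   # second linear layer
x = rng.normal(size=3)

deep = W2 @ (W1 @ x)           # two stacked linear layers, no activation
shallow = (W2 @ W1) @ x        # a single equivalent linear layer
assert np.allclose(deep, shallow)
```

Inserting any non‑linearity between `W1` and `W2` breaks this equivalence, which is exactly why activations matter.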
Activation glossary – key concepts
ReLU (Rectified Linear Unit)
f(x)=max(0,x) – most popular hidden layer activation. Computationally efficient, sparse, but can cause "dying ReLU".
Sigmoid
σ(x)=1/(1+e^(−x)) – outputs between 0 and 1. Used for binary classification outputs, but saturates and kills gradients.
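Saturation shows up directly in the derivative σ′(x)=σ(x)(1−σ(x)), which peaks at 0.25 and shrinks rapidly for large |x|. A small sketch:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum possible gradient
print(sigmoid_grad(10.0))  # ~4.5e-05, effectively vanished
```

Multiplying many such sub‑0.25 factors across layers is the vanishing‑gradient problem in miniature.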
Tanh
tanh(x) – zero‑centered, range (-1,1). Often preferred over sigmoid in hidden layers, but still saturates.
Softmax
Multi‑class output activation; converts logits to probabilities summing to 1.
Leaky ReLU / PReLU
Allow small negative slope (e.g., 0.01) to avoid dying ReLU. Parametric ReLU learns the slope.
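A one‑line NumPy sketch of Leaky ReLU (the `alpha` slope is the commonly used default; PReLU would learn it instead):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # small slope alpha for x < 0 keeps gradients flowing
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, 3.0])))  # [-0.02  3.  ]
```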
ELU / SELU
Exponential Linear Unit – smooth, saturating negative part that can speed up learning. SELU scales ELU so that activations self‑normalise toward zero mean and unit variance.
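A NumPy sketch of both variants; the SELU constants are the fixed values from the self‑normalising networks paper (Klambauer et al., 2017):

```python
import numpy as np

def elu(x, alpha=1.0):
    # smooth exponential curve for x < 0, identity for x > 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

# fixed constants derived for self-normalisation
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1))
```

Note that ELU saturates to −α for very negative inputs instead of going to −∞, which bounds the negative activations.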
Swish / SiLU
swish(x)=x·σ(x) – discovered via automated search (Ramachandran et al., 2017); often outperforms ReLU in deeper models.
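Swish is just the input gated by its own sigmoid; a minimal sketch (the `beta` parameter is 1 in the common SiLU form):

```python
import numpy as np

def swish(x, beta=1.0):
    # x scaled by sigmoid(beta * x); smooth, non-monotonic near zero
    return x / (1 + np.exp(-beta * x))

print(swish(0.0))   # 0.0
print(swish(20.0))  # ~20.0 (behaves like identity for large positive x)
```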
```python
# Common activation implementations (NumPy style)
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def softmax(x):
    e = np.exp(x - x.max())  # shift by the max for numerical stability
    return e / e.sum()
```
Common activation interview questions
- Why is ReLU non‑linear if it looks like two linear pieces?
- What is the "dying ReLU" problem and how can you fix it?
- Why does sigmoid saturate and kill gradients?
- Explain the output range of tanh and why zero‑centered activations help.
- When would you use softmax versus sigmoid?
- What are the advantages of Swish over ReLU?
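For the dying‑ReLU question above, a quick numerical sketch makes the failure mode concrete: a unit whose pre‑activations are always negative receives zero gradient under ReLU, so its weights never update, while Leaky ReLU keeps a small gradient alive.

```python
import numpy as np

def relu_grad(x):
    # derivative of ReLU: 1 for x > 0, else 0
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    # derivative of Leaky ReLU: 1 for x > 0, else alpha
    return np.where(x > 0, 1.0, alpha)

# a "dead" unit: every pre-activation is negative
z = np.array([-3.0, -1.5, -0.2])
print(relu_grad(z))        # [0. 0. 0.] -> no weight updates, ever
print(leaky_relu_grad(z))  # [0.01 0.01 0.01] -> unit can still learn
```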