Activation Functions MCQ · test your knowledge
From ReLU to Swish – 15 questions covering non‑linearity, vanishing gradient, output ranges, and modern variants.
Activation functions: the heart of neural networks
Activation functions introduce non‑linearity, allowing neural networks to approximate complex functions. This MCQ test covers classical and modern activation functions, their derivatives, output ranges, and practical considerations like vanishing gradient and dying ReLU.
Why non‑linearity?
Without non‑linear activation, stacked linear layers collapse into a single linear transformation, so extra depth adds no representational power. Activation functions let the network learn complex, hierarchical features.
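The collapse of stacked linear layers can be checked numerically. A minimal sketch with two hypothetical random weight matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first linear layer
W2 = rng.normal(size=(2, 4))   # second linear layer
x = rng.normal(size=3)

deep = W2 @ (W1 @ x)           # two stacked linear layers, no activation
shallow = (W2 @ W1) @ x        # a single equivalent linear layer
assert np.allclose(deep, shallow)
```

Inserting any non‑linearity between `W1` and `W2` breaks this equivalence, which is exactly why activations matter.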
Activation glossary – key concepts
ReLU (Rectified Linear Unit)
f(x)=max(0,x) – most popular hidden layer activation. Computationally efficient, sparse, but can cause "dying ReLU".
Sigmoid
σ(x)=1/(1+e^(−x)) – outputs between 0 and 1. Used for binary classification outputs, but saturates and kills gradients.
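Saturation shows up directly in the derivative σ′(x)=σ(x)(1−σ(x)), which peaks at 0.25 and shrinks rapidly for large |x|. A small sketch:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum possible gradient
print(sigmoid_grad(10.0))  # ~4.5e-05, effectively vanished
```

Multiplying many such sub‑0.25 factors across layers is the vanishing‑gradient problem in miniature.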
Tanh
tanh(x) – zero‑centered, range (-1,1). Often preferred over sigmoid in hidden layers, but still saturates.
Softmax
Multi‑class output activation; converts logits to probabilities summing to 1.
Leaky ReLU / PReLU
Allow small negative slope (e.g., 0.01) to avoid dying ReLU. Parametric ReLU learns the slope.
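A one‑line NumPy sketch of Leaky ReLU (the `alpha` slope is the commonly used default; PReLU would learn it instead):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # small slope alpha for x < 0 keeps gradients flowing
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, 3.0])))  # [-0.02  3.  ]
```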
ELU / SELU
Exponential Linear Unit – smooth, saturating negative part that can speed up learning. SELU scales ELU so that activations self‑normalise toward zero mean and unit variance.
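A NumPy sketch of both variants; the SELU constants are the fixed values from the self‑normalising networks paper (Klambauer et al., 2017):

```python
import numpy as np

def elu(x, alpha=1.0):
    # smooth exponential curve for x < 0, identity for x > 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

# fixed constants derived for self-normalisation
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1))
```

Note that ELU saturates to −α for very negative inputs instead of going to −∞, which bounds the negative activations.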
Swish / SiLU
swish(x)=x·σ(x) – discovered via automated search (Ramachandran et al., 2017); often outperforms ReLU in deeper models.
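Swish is just the input gated by its own sigmoid; a minimal sketch (the `beta` parameter is 1 in the common SiLU form):

```python
import numpy as np

def swish(x, beta=1.0):
    # x scaled by sigmoid(beta * x); smooth, non-monotonic near zero
    return x / (1 + np.exp(-beta * x))

print(swish(0.0))   # 0.0
print(swish(20.0))  # ~20.0 (behaves like identity for large positive x)
```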
```python
# Common activation implementations (NumPy style)
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def softmax(x):
    e = np.exp(x - x.max())  # shift by the max for numerical stability
    return e / e.sum()
```
Common activation interview questions
- Why is ReLU non‑linear if it looks like two linear pieces?
- What is the "dying ReLU" problem and how can you fix it?
- Why does sigmoid saturate and kill gradients?
- Explain the output range of tanh and why zero‑centered activations help.
- When would you use softmax versus sigmoid?
- What are the advantages of Swish over ReLU?
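For the dying‑ReLU question above, a quick numerical sketch makes the failure mode concrete: a unit whose pre‑activations are always negative receives zero gradient under ReLU, so its weights never update, while Leaky ReLU keeps a small gradient alive.

```python
import numpy as np

def relu_grad(x):
    # derivative of ReLU: 1 for x > 0, else 0
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    # derivative of Leaky ReLU: 1 for x > 0, else alpha
    return np.where(x > 0, 1.0, alpha)

# a "dead" unit: every pre-activation is negative
z = np.array([-3.0, -1.5, -0.2])
print(relu_grad(z))        # [0. 0. 0.] -> no weight updates, ever
print(leaky_relu_grad(z))  # [0.01 0.01 0.01] -> unit can still learn
```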