Recurrent Neural Networks — 15 Interview Questions
Hidden state over time, many-to-one vs seq2seq, BPTT truncation, and why LSTM/GRU replaced vanilla RNNs for long dependencies.
Tags: Time, Hidden, LSTM, Seq2seq
1. What is a recurrent neural network? (Easy)
Answer: A model with a hidden state h_t updated each step: h_t = f(h_{t−1}, x_t)—same weights applied across time, suited to sequences.
2. Vanilla RNN update (simple form). (Easy)
Answer: Often h_t = tanh(W_h h_{t−1} + W_x x_t + b): an affine map combines the previous state with the current input, followed by a tanh nonlinearity.
h_t = tanh(W_h h_{t−1} + W_x x_t + b)
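A minimal sketch of this recurrence in plain PyTorch (sizes and weight initialization are illustrative assumptions):

```python
import torch

input_size, hidden_size, seq_len = 8, 16, 5     # illustrative sizes

W_x = torch.randn(hidden_size, input_size) * 0.1
W_h = torch.randn(hidden_size, hidden_size) * 0.1
b = torch.zeros(hidden_size)

x = torch.randn(seq_len, input_size)   # one sequence, no batch dimension
h = torch.zeros(hidden_size)           # h_0

for t in range(seq_len):
    # h_t = tanh(W_h h_{t-1} + W_x x_t + b): the same weights at every step
    h = torch.tanh(W_h @ h + W_x @ x[t] + b)
```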
3. What is backpropagation through time (BPTT)? (Medium)
Answer: Unroll the network over T steps into a feed-forward DAG and run ordinary backprop; the gradient flows through every time step. Memory and compute grow with T.
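In an autograd framework, BPTT is just backprop through that unrolled loop; a sketch with the same illustrative sizes as above:

```python
import torch

hidden_size, input_size, T = 16, 8, 20
W_h = torch.randn(hidden_size, hidden_size, requires_grad=True)
W_x = torch.randn(hidden_size, input_size, requires_grad=True)
x = torch.randn(T, input_size)
h = torch.zeros(hidden_size)

for t in range(T):                       # unroll over T steps
    h = torch.tanh(W_h @ h + W_x @ x[t])

loss = h.pow(2).sum()                    # toy loss on the final state
loss.backward()                          # gradients flow back through all T links
print(W_h.grad.shape)                    # one gradient, accumulated over every step
```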
4. Truncated BPTT: why? (Medium)
Answer: Limit backprop depth in time to a window—cheaper and stabilizes training; trades off long-range credit assignment.
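A common way to truncate in PyTorch is to detach the hidden state between windows so the graph never grows past the window (module sizes and the loss here are placeholders):

```python
import torch
from torch import nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
opt = torch.optim.SGD(rnn.parameters(), lr=0.01)

long_seq = torch.randn(1, 1000, 8)       # (batch, time, features)
h = None
window = 50                              # backprop depth limit

for start in range(0, long_seq.size(1), window):
    chunk = long_seq[:, start:start + window]
    out, h = rnn(chunk, h)
    loss = out.pow(2).mean()             # placeholder loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    h = h.detach()                       # cut the graph: no gradient flows past this window
```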
5. Why do vanilla RNNs struggle with long sequences? (Medium)
Answer: Repeated Jacobian products over steps cause vanishing or exploding gradients—hard to learn long-range dependencies.
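A quick way to see the effect (illustrative numbers; a near-linear regime with spectral radius below 1):

```python
import torch

hidden_size, T = 16, 100
W_h = torch.eye(hidden_size) * 0.9        # spectral radius < 1 -> vanishing
h0 = torch.zeros(hidden_size, requires_grad=True)

state = h0
for _ in range(T):
    state = torch.tanh(W_h @ state)

state.sum().backward()
print(h0.grad.norm())                     # ~1e-4: repeated Jacobian products shrink the signal
                                          # with 1.1 * eye the gradient would explode instead
```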
6. LSTM gates: names and roles. (Medium)
Answer: Forget (what to erase from cell), input (what to write), output (what to expose from cell). Cell state carries information additively—better gradient paths.
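One LSTM step written out with explicit gates (a framework-free sketch; W, U, b stack the four gates, and the shapes are illustrative):

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    z = W @ x_t + U @ h_prev + b              # all four gate pre-activations at once
    i, f, o, g = z.chunk(4)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                         # candidate cell update
    c = f * c_prev + i * g                    # additive cell path: forget then write
    h = o * torch.tanh(c)                     # output gate decides what to expose
    return h, c

hidden, inp = 16, 8
W = torch.randn(4 * hidden, inp) * 0.1
U = torch.randn(4 * hidden, hidden) * 0.1
b = torch.zeros(4 * hidden)
h, c = lstm_step(torch.randn(inp), torch.zeros(hidden), torch.zeros(hidden), W, U, b)
```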
7. GRU vs LSTM: interview contrast. (Easy)
Answer: A GRU merges the forget and input gates into a single update gate and drops the separate cell state, so it has fewer parameters; quality is often similar at lower compute, while LSTM remains the more common historical baseline.
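The parameter gap is easy to check with PyTorch's built-in modules (dimensions are arbitrary):

```python
from torch import nn

lstm = nn.LSTM(input_size=128, hidden_size=256)
gru = nn.GRU(input_size=128, hidden_size=256)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(lstm), n_params(gru))   # LSTM has 4 gate blocks, GRU has 3 -> ~25% fewer params
```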
8. Bidirectional RNN. (Easy)
Answer: Two RNNs, one reading forward and one backward, with their hidden states concatenated; this exploits future context, which helps tagging-style NLP but rules out causal online prediction.
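In PyTorch this is one flag, and the concatenation shows up in the output size (sizes are illustrative):

```python
import torch
from torch import nn

birnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(4, 10, 8)                # (batch, time, features)
out, _ = birnn(x)
print(out.shape)                         # (4, 10, 32): forward and backward states concatenated
```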
9. Encoder–decoder (seq2seq) idea. (Medium)
Answer: Encoder RNN compresses input sequence to context vector; decoder RNN generates output sequence—basis of early NMT before attention dominated.
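A minimal pre-attention skeleton of the idea, assuming toy vocabularies and GRU encoder/decoder (all names and sizes are illustrative):

```python
import torch
from torch import nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=100, tgt_vocab=100, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        _, context = self.encoder(self.src_emb(src))       # context vector: (1, batch, hidden)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), context)
        return self.proj(dec_out)                          # per-step target-token logits

model = Seq2Seq()
logits = model(torch.randint(0, 100, (4, 7)), torch.randint(0, 100, (4, 9)))
print(logits.shape)                                        # (4, 9, 100)
```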
10. Teacher forcing. (Medium)
Answer: During training, the decoder receives the ground-truth previous token as input instead of its own prediction; this speeds convergence but introduces exposure bias, which scheduled sampling and similar tricks mitigate.
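A toy decoder loop showing where the choice happens (module names and sizes are made up for illustration; the loss computation is omitted):

```python
import random
import torch
from torch import nn

vocab, hidden_dim, batch, T = 50, 32, 4, 6
emb = nn.Embedding(vocab, hidden_dim)
cell = nn.GRUCell(hidden_dim, hidden_dim)
proj = nn.Linear(hidden_dim, vocab)

tgt = torch.randint(0, vocab, (batch, T))      # ground-truth target tokens
h = torch.zeros(batch, hidden_dim)
inp = tgt[:, 0]                                # start token
teacher_forcing_ratio = 0.5

for t in range(1, T):
    h = cell(emb(inp), h)
    logits = proj(h)
    if random.random() < teacher_forcing_ratio:
        inp = tgt[:, t]                        # teacher forcing: feed the ground truth
    else:
        inp = logits.argmax(dim=-1)            # free running: feed the model's own prediction
```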
11. Padding and pack_padded_sequence: why? (Hard)
Answer: Batched variable-length sequences are padded to a common length; packing skips compute on the pad positions and keeps each sequence's final hidden state at its true last token, in frameworks like PyTorch.
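A PyTorch sketch of the pack/unpack round trip (lengths must be sorted descending unless enforce_sorted=False; sizes are illustrative):

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(3, 5, 8)                 # padded batch: 3 sequences, max length 5
lengths = torch.tensor([5, 3, 2])        # true lengths before padding

packed = pack_padded_sequence(x, lengths, batch_first=True)
packed_out, (h_n, c_n) = rnn(packed)     # pad positions are never processed
out, _ = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape, h_n.shape)              # (3, 5, 16) and (1, 3, 16); h_n is taken at each true last step
```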
12. Many-to-one vs many-to-many examples. (Easy)
Answer: Many-to-one: sentiment from a sentence. Many-to-many: POS tagging per token; seq2seq: translation.
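The two patterns differ only in which RNN outputs you read off (heads and sizes are illustrative):

```python
import torch
from torch import nn

rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)
out, h_n = rnn(x)                              # out: (4, 10, 16) per step; h_n: (1, 4, 16) final state

many_to_one = nn.Linear(16, 2)(h_n[-1])        # e.g. sentiment: one label per sequence
many_to_many = nn.Linear(16, 5)(out)           # e.g. POS tagging: one label per token
print(many_to_one.shape, many_to_many.shape)   # (4, 2) and (4, 10, 5)
```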
13. When do Transformers replace RNNs? (Medium)
Answer: When you have the data and compute for self-attention: it parallelizes over sequence length and links any two positions within a single layer, whereas an RNN is sequential and slower on GPUs for long sequences.
14. 1D CNN for sequences vs RNN. (Medium)
Answer: Stacked 1D convolutions build context from local n-grams, growing the receptive field with depth; they are fast and parallel, while RNNs or attention handle very long or flexible dependencies better, depending on the design.
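A sketch of such a stack; each Conv1d layer widens the receptive field and every position is computed in parallel (sizes are illustrative):

```python
import torch
from torch import nn

x = torch.randn(4, 8, 50)                 # (batch, features/channels, time)
conv = nn.Sequential(
    nn.Conv1d(8, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
)
y = conv(x)
print(y.shape)                            # (4, 32, 50): a 5-step receptive field after two layers
```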
15. State one advantage of the RNN family today. (Easy)
Answer: Constant, small memory per step, which suits streaming and tiny devices; some tasks still use LSTM baselines, though LLMs are Transformer-first.
Tip: be ready to draw an unrolled RNN for BPTT; it is a classic whiteboard question.
Quick review checklist
- Recurrence; BPTT; truncate; vanishing in vanilla RNN.
- LSTM/GRU; bidirectional; encoder–decoder; teacher forcing.
- Packing padded batches; Transformer comparison.