Neural Networks Interview Prep

Computational Graphs & Autodiff — 15 Interview Questions

DAGs of ops, forward and backward passes, symbolic vs automatic differentiation, and how PyTorch/TensorFlow-style engines differ at a high level.

1. What is a computational graph? (Easy)
Answer: A directed acyclic graph (DAG) representing a function: nodes are variables or operations, edges show data flow. Used to evaluate the function and (with autodiff) derivatives systematically.
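A quick way to make this concrete in PyTorch (the tensor values here are arbitrary): evaluating an expression records the DAG, and each result remembers the op that produced it.

    import torch

    # Evaluating y = a*b + c records a tiny DAG: two op nodes (mul, add),
    # with edges from the leaf tensors a, b, c.
    a = torch.tensor(2.0, requires_grad=True)
    b = torch.tensor(3.0, requires_grad=True)
    c = torch.tensor(1.0, requires_grad=True)

    y = a * b + c
    print(y.grad_fn)                 # AddBackward0: the op node that produced y
    print(y.grad_fn.next_functions)  # edges back toward the mul node and leaf c
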
2. Nodes vs edges—typical assignment. (Easy)
Answer: Nodes: tensors after an op, or the op itself (depends on framework representation). Edges: which outputs feed which inputs. The graph encodes dependencies for topological order.
3. What is automatic differentiation? (Easy)
Answer: Computes exact derivatives (up to floating-point rounding) by applying the chain rule along the graph, rather than numerical finite differences or full symbolic algebra over the whole expression tree.
4. Forward-mode autodiff—when useful? (Medium)
Answer: Pushes directional derivatives forward; cost scales with the number of input directions. Useful when there are few inputs and many outputs (the opposite of standard NN training, where reverse mode wins).
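A sketch using torch.func.jvp (available in PyTorch 2.x; the function f is a made-up toy): one forward pass yields the Jacobian-vector product along a chosen input direction.

    import torch
    from torch.func import jvp

    # Toy function: one small input vector, three outputs.
    def f(x):
        return torch.stack([x.sum(), (x ** 2).sum(), x.prod()])

    x = torch.tensor([1.0, 2.0, 3.0])
    v = torch.tensor([1.0, 0.0, 0.0])  # direction: perturb only x[0]

    # One forward pass computes f(x) and J(x) @ v simultaneously.
    out, dout = jvp(f, (x,), (v,))
    print(out, dout)
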
5. Reverse-mode autodiff—why dominant in ML? (Medium)
Answer: One scalar loss, millions of parameters—reverse mode gets the full gradient in O(graph size) time, same order as one forward pass (roughly). This is backpropagation.
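In PyTorch terms (a made-up linear model): one backward pass from the scalar loss fills the gradient of every parameter at once.

    import torch

    # 1000 weights + 1 bias, but still just one scalar loss.
    model = torch.nn.Linear(1000, 1)
    x = torch.randn(8, 1000)
    loss = model(x).pow(2).mean()

    loss.backward()  # a single reverse pass produces all parameter gradients
    print(model.weight.grad.shape, model.bias.grad.shape)
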
6. Why must the graph be acyclic? (Medium)
Answer: For standard autodiff you need a clear topological order. RNNs unroll in time creating a DAG over steps; true cycles need special handling (implicit differentiation / BPTT structure).
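A minimal unrolling sketch: each step of a recurrence creates fresh nodes, so reverse mode (BPTT) just walks the unrolled DAG.

    import torch

    w = torch.tensor(0.9, requires_grad=True)
    h = torch.tensor(1.0)

    # Unrolling 3 steps: h_t depends on h_{t-1} via *new* graph nodes, no cycle.
    for t in range(3):
        h = torch.tanh(w * h)

    h.backward()   # backprop through time = reverse mode over the unrolled DAG
    print(w.grad)
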
7. Eager execution vs define-then-run. (Medium)
Answer: Eager: build graph as Python runs (PyTorch default). Static: trace or compile a full graph first (older TF graphs, torch.compile, XLA)—enables fusion and deployment optimizations.
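A rough sketch of the two modes, assuming PyTorch 2.x (the function is arbitrary): the eager call runs op by op as Python executes, while torch.compile traces a whole graph first so the backend can fuse and optimize.

    import torch

    def f(x):
        return torch.sin(x) * torch.cos(x)

    y_eager = f(torch.randn(4))          # eager: each op dispatches as Python runs

    f_static = torch.compile(f)          # define-then-run: trace, optimize, codegen
    y_static = f_static(torch.randn(4))  # first call compiles; later calls reuse it
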
8. Leaf vs non-leaf tensors (PyTorch mental model). (Medium)
Answer: Leaves are parameters or inputs you optimize; intermediates are non-leaf. .grad is populated on leaves by default (call .retain_grad() on a non-leaf to keep its gradient); retain_graph=True keeps the graph alive for repeated backward calls.
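A small sketch of the leaf/non-leaf distinction (numbers chosen so the gradients are easy to check by hand):

    import torch

    x = torch.tensor(2.0, requires_grad=True)  # leaf: created by the user
    y = x * 3                                  # non-leaf: produced by an op
    print(x.is_leaf, y.is_leaf)                # True False

    y.retain_grad()                            # opt in to keeping a non-leaf .grad
    z = y ** 2                                 # z = (3x)^2 = 9x^2
    z.backward()
    print(x.grad, y.grad)                      # dz/dx = 18x = 36, dz/dy = 2y = 12
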
9. What does detach() do? (Medium)
Answer: Breaks the graph from that tensor onward—no gradient flows through. Used to freeze parts of the model or treat values as constants.
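A sketch of detach() cutting one branch out of the graph (values arbitrary, gradient checkable by hand):

    import torch

    x = torch.tensor(2.0, requires_grad=True)
    y = x * 3
    frozen = y.detach()       # same value as y, but severed from the graph

    loss = (y + frozen) ** 2  # gradient flows through y only, not through frozen
    loss.backward()
    print(x.grad)             # 2*(y+frozen)*3 = 72; it would be 144 without detach
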
10. stop_gradient in TensorFlow—same idea? (Easy)
Answer: Yes—block gradients through that path; common in GANs, reinforcement learning tricks, or fixed targets.
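The TensorFlow counterpart of the previous sketch, assuming TF 2.x eager mode with GradientTape:

    import tensorflow as tf

    x = tf.Variable(2.0)
    with tf.GradientTape() as tape:
        y = x * 3.0
        frozen = tf.stop_gradient(y)  # same value, no gradient path through it
        loss = (y + frozen) ** 2

    print(tape.gradient(loss, x))     # 72.0: only the live branch contributes
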
11. Custom autograd Function—what must you implement? (Hard)
Answer: Forward computes outputs; backward receives gradient w.r.t. outputs and returns gradients w.r.t. each differentiable input—must be mathematically consistent with the forward op.
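A minimal custom Function in PyTorch (a hand-written square, purely illustrative), plus gradcheck to verify backward against finite differences:

    import torch

    class Square(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)  # stash whatever backward will need
            return x ** 2

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return grad_out * 2 * x   # chain rule: dL/dx = dL/dy * dy/dx

    x = torch.tensor(3.0, requires_grad=True)
    Square.apply(x).backward()
    print(x.grad)                     # 6.0

    # gradcheck compares backward to finite differences (use double precision).
    inp = torch.randn(4, dtype=torch.double, requires_grad=True)
    print(torch.autograd.gradcheck(Square.apply, inp))
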
12. Higher-order derivatives—does the graph recurse? (Hard)
Answer: Frameworks can build a graph over gradient computations (create_graph=True in PyTorch) for Hessian-vector products; memory and cost grow quickly.
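A Hessian-vector product sketch (a toy cubic loss so H is easy to verify by hand): differentiating the gradient requires create_graph=True.

    import torch

    x = torch.tensor([1.0, 2.0], requires_grad=True)
    loss = (x ** 3).sum()

    # Build a graph over the gradient itself so it can be differentiated again.
    (g,) = torch.autograd.grad(loss, x, create_graph=True)  # g = 3x^2

    v = torch.tensor([1.0, 0.0])
    (hvp,) = torch.autograd.grad(g @ v, x)  # H @ v without ever forming H
    print(hvp)                              # [6., 0.] since H = diag(6x)
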
13. Why are in-place ops dangerous with autograd? (Medium)
Answer: They can overwrite values still needed for backward. Frameworks error or warn when versions mismatch—prefer out-of-place when tensors require grad.
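A sketch of the failure mode: pow saves its input for backward, so mutating that input in place trips PyTorch's version check.

    import torch

    x = torch.tensor([1.0, 2.0], requires_grad=True)
    y = x * 2
    z = y ** 2              # pow saves y's value for its backward
    y.add_(1)               # in-place edit bumps y's version counter

    try:
        z.sum().backward()  # autograd notices the saved tensor went stale
    except RuntimeError as e:
        print(e)            # "...modified by an inplace operation..."
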
14. Autodiff vs symbolic differentiation vs numeric finite differences. (Medium)
Answer: Symbolic: algebra rules, expression swell. Finite diff: cheap to code, inaccurate and slow for high-D. Autodiff: exact, efficient for ML-scale graphs.
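A quick accuracy comparison on a made-up function: autodiff matches the analytic derivative to float precision, while central differences carry truncation and rounding error that depend on the step size h.

    import torch

    def f(x):
        return torch.sin(x).exp()

    # Double precision keeps the finite-difference rounding error small;
    # in float32 the subtraction below loses most of its significant digits.
    x = torch.tensor(1.0, dtype=torch.double, requires_grad=True)
    f(x).backward()
    exact = x.grad                     # autodiff: exact up to float rounding

    h = 1e-6
    with torch.no_grad():
        approx = (f(x + h) - f(x - h)) / (2 * h)  # central difference

    print(exact.item(), approx.item())  # analytic: cos(1)*exp(sin(1)) ≈ 1.2534
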
15. Inference graph vs training graph. (Easy)
Answer: Inference drops backward nodes and anything only needed for gradients—smaller, faster. Export formats (ONNX, TorchScript) target forward-only execution.
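A forward-only sketch: inference_mode skips graph recording entirely, and export (ONNX here, assuming the onnx package is installed) serializes just the forward pass.

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1)
    )
    model.eval()

    with torch.inference_mode():      # no autograd graph recorded at all
        y = model(torch.randn(1, 4))
    print(y.requires_grad)            # False: nothing to backprop through

    # Serialize the forward-only graph (requires the onnx package).
    torch.onnx.export(model, torch.randn(1, 4), "model.onnx")
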
Tip: draw a tiny graph (mul, add) and label ∂L/∂x on paper—classic interview sanity check.

Quick review checklist

  • DAG; autodiff; forward vs reverse mode and why reverse wins for scalar loss.
  • Eager vs static; detach / stop_gradient; in-place hazards.
  • Custom backward; inference-only graphs.