Neural Networks Interview Prep

Computational Graphs & Autodiff — 15 Interview Questions

DAGs of ops, forward and backward passes, symbolic vs automatic differentiation, and how PyTorch/TensorFlow-style engines differ at a high level.

1. What is a computational graph? (Easy)
Answer: A directed acyclic graph (DAG) representing a function: nodes are variables or operations, edges show data flow. Used to evaluate the function and (with autodiff) derivatives systematically.
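A quick way to make this concrete in PyTorch (the tensor values here are arbitrary): evaluating an expression records the DAG, and each result remembers the op that produced it.

    import torch

    # Evaluating y = a*b + c records a tiny DAG: two op nodes (mul, add),
    # with edges from the leaf tensors a, b, c.
    a = torch.tensor(2.0, requires_grad=True)
    b = torch.tensor(3.0, requires_grad=True)
    c = torch.tensor(1.0, requires_grad=True)

    y = a * b + c
    print(y.grad_fn)                 # AddBackward0: the op node that produced y
    print(y.grad_fn.next_functions)  # edges back toward the mul node and leaf c
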
2. Nodes vs edges—typical assignment. (Easy)
Answer: Nodes: tensors after an op, or the op itself (depends on framework representation). Edges: which outputs feed which inputs. The graph encodes dependencies for topological order.
3. What is automatic differentiation? (Easy)
Answer: Computes exact derivatives (up to floating-point rounding) by applying the chain rule along the graph, rather than numerical finite differences or full symbolic algebra over the whole expression tree.
4. Forward-mode autodiff—when useful? (Medium)
Answer: Pushes directional derivatives forward; cost scales with the number of input directions. Useful when there are few inputs and many outputs (the opposite of standard NN training, where reverse mode wins).
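A sketch using torch.func.jvp (available in PyTorch 2.x; the function f is a made-up toy): one forward pass yields the Jacobian-vector product along a chosen input direction.

    import torch
    from torch.func import jvp

    # Toy function: one small input vector, three outputs.
    def f(x):
        return torch.stack([x.sum(), (x ** 2).sum(), x.prod()])

    x = torch.tensor([1.0, 2.0, 3.0])
    v = torch.tensor([1.0, 0.0, 0.0])  # direction: perturb only x[0]

    # One forward pass computes f(x) and J(x) @ v simultaneously.
    out, dout = jvp(f, (x,), (v,))
    print(out, dout)
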
5. Reverse-mode autodiff—why dominant in ML? (Medium)
Answer: One scalar loss, millions of parameters—reverse mode gets the full gradient in O(graph size) time, same order as one forward pass (roughly). This is backpropagation.
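In PyTorch terms (a made-up linear model): one backward pass from the scalar loss fills the gradient of every parameter at once.

    import torch

    # 1000 weights + 1 bias, but still just one scalar loss.
    model = torch.nn.Linear(1000, 1)
    x = torch.randn(8, 1000)
    loss = model(x).pow(2).mean()

    loss.backward()  # a single reverse pass produces all parameter gradients
    print(model.weight.grad.shape, model.bias.grad.shape)
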
6. Why must the graph be acyclic? (Medium)
Answer: For standard autodiff you need a clear topological order. RNNs unroll in time creating a DAG over steps; true cycles need special handling (implicit differentiation / BPTT structure).
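A minimal unrolling sketch: each step of a recurrence creates fresh nodes, so reverse mode (BPTT) just walks the unrolled DAG.

    import torch

    w = torch.tensor(0.9, requires_grad=True)
    h = torch.tensor(1.0)

    # Unrolling 3 steps: h_t depends on h_{t-1} via *new* graph nodes, no cycle.
    for t in range(3):
        h = torch.tanh(w * h)

    h.backward()   # backprop through time = reverse mode over the unrolled DAG
    print(w.grad)
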
7. Eager execution vs define-then-run. (Medium)
Answer: Eager: build graph as Python runs (PyTorch default). Static: trace or compile a full graph first (older TF graphs, torch.compile, XLA)—enables fusion and deployment optimizations.
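A rough sketch of the two modes, assuming PyTorch 2.x (the function is arbitrary): the eager call runs op by op as Python executes, while torch.compile traces a whole graph first so the backend can fuse and optimize.

    import torch

    def f(x):
        return torch.sin(x) * torch.cos(x)

    y_eager = f(torch.randn(4))          # eager: each op dispatches as Python runs

    f_static = torch.compile(f)          # define-then-run: trace, optimize, codegen
    y_static = f_static(torch.randn(4))  # first call compiles; later calls reuse it
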
8. Leaf vs non-leaf tensors (PyTorch mental model). (Medium)
Answer: Leaves are parameters or inputs you optimize; intermediates are non-leaf. .grad is populated on leaves by default (call .retain_grad() on a non-leaf to keep its gradient); retain_graph=True keeps the graph alive for repeated backward calls.
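A small sketch of the leaf/non-leaf distinction (numbers chosen so the gradients are easy to check by hand):

    import torch

    x = torch.tensor(2.0, requires_grad=True)  # leaf: created by the user
    y = x * 3                                  # non-leaf: produced by an op
    print(x.is_leaf, y.is_leaf)                # True False

    y.retain_grad()                            # opt in to keeping a non-leaf .grad
    z = y ** 2                                 # z = (3x)^2 = 9x^2
    z.backward()
    print(x.grad, y.grad)                      # dz/dx = 18x = 36, dz/dy = 2y = 12
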
9. What does detach() do? (Medium)
Answer: Breaks the graph from that tensor onward—no gradient flows through. Used to freeze parts of the model or treat values as constants.
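A sketch of detach() cutting one branch out of the graph (values arbitrary, gradient checkable by hand):

    import torch

    x = torch.tensor(2.0, requires_grad=True)
    y = x * 3
    frozen = y.detach()       # same value as y, but severed from the graph

    loss = (y + frozen) ** 2  # gradient flows through y only, not through frozen
    loss.backward()
    print(x.grad)             # 2*(y+frozen)*3 = 72; it would be 144 without detach
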
10. stop_gradient in TensorFlow—same idea? (Easy)
Answer: Yes—block gradients through that path; common in GANs, reinforcement learning tricks, or fixed targets.
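The TensorFlow counterpart of the previous sketch, assuming TF 2.x eager mode with GradientTape:

    import tensorflow as tf

    x = tf.Variable(2.0)
    with tf.GradientTape() as tape:
        y = x * 3.0
        frozen = tf.stop_gradient(y)  # same value, no gradient path through it
        loss = (y + frozen) ** 2

    print(tape.gradient(loss, x))     # 72.0: only the live branch contributes
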
11. Custom autograd Function—what must you implement? (Hard)
Answer: Forward computes outputs; backward receives gradient w.r.t. outputs and returns gradients w.r.t. each differentiable input—must be mathematically consistent with the forward op.
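A minimal custom Function in PyTorch (a hand-written square, purely illustrative), plus gradcheck to verify backward against finite differences:

    import torch

    class Square(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)  # stash whatever backward will need
            return x ** 2

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return grad_out * 2 * x   # chain rule: dL/dx = dL/dy * dy/dx

    x = torch.tensor(3.0, requires_grad=True)
    Square.apply(x).backward()
    print(x.grad)                     # 6.0

    # gradcheck compares backward to finite differences (use double precision).
    inp = torch.randn(4, dtype=torch.double, requires_grad=True)
    print(torch.autograd.gradcheck(Square.apply, inp))
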
12. Higher-order derivatives—does the graph recurse? (Hard)
Answer: Frameworks can build a graph over gradient computations (create_graph=True in PyTorch) for Hessian-vector products; memory and cost grow quickly.
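A Hessian-vector product sketch (a toy cubic loss so H is easy to verify by hand): differentiating the gradient requires create_graph=True.

    import torch

    x = torch.tensor([1.0, 2.0], requires_grad=True)
    loss = (x ** 3).sum()

    # Build a graph over the gradient itself so it can be differentiated again.
    (g,) = torch.autograd.grad(loss, x, create_graph=True)  # g = 3x^2

    v = torch.tensor([1.0, 0.0])
    (hvp,) = torch.autograd.grad(g @ v, x)  # H @ v without ever forming H
    print(hvp)                              # [6., 0.] since H = diag(6x)
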
13. Why are in-place ops dangerous with autograd? (Medium)
Answer: They can overwrite values still needed for backward. Frameworks error or warn when versions mismatch—prefer out-of-place when tensors require grad.
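A sketch of the failure mode: pow saves its input for backward, so mutating that input in place trips PyTorch's version check.

    import torch

    x = torch.tensor([1.0, 2.0], requires_grad=True)
    y = x * 2
    z = y ** 2              # pow saves y's value for its backward
    y.add_(1)               # in-place edit bumps y's version counter

    try:
        z.sum().backward()  # autograd notices the saved tensor went stale
    except RuntimeError as e:
        print(e)            # "...modified by an inplace operation..."
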
14. Autodiff vs symbolic differentiation vs numeric finite differences. (Medium)
Answer: Symbolic: algebra rules, expression swell. Finite diff: cheap to code, inaccurate and slow for high-D. Autodiff: exact, efficient for ML-scale graphs.
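A quick accuracy comparison on a made-up function: autodiff matches the analytic derivative to float precision, while central differences carry truncation and rounding error that depend on the step size h.

    import torch

    def f(x):
        return torch.sin(x).exp()

    # Double precision keeps the finite-difference rounding error small;
    # in float32 the subtraction below loses most of its significant digits.
    x = torch.tensor(1.0, dtype=torch.double, requires_grad=True)
    f(x).backward()
    exact = x.grad                     # autodiff: exact up to float rounding

    h = 1e-6
    with torch.no_grad():
        approx = (f(x + h) - f(x - h)) / (2 * h)  # central difference

    print(exact.item(), approx.item())  # analytic: cos(1)*exp(sin(1)) ≈ 1.2534
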
15. Inference graph vs training graph. (Easy)
Answer: Inference drops backward nodes and anything only needed for gradients—smaller, faster. Export formats (ONNX, TorchScript) target forward-only execution.
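A forward-only sketch: inference_mode skips graph recording entirely, and export (ONNX here, assuming the onnx package is installed) serializes just the forward pass.

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1)
    )
    model.eval()

    with torch.inference_mode():      # no autograd graph recorded at all
        y = model(torch.randn(1, 4))
    print(y.requires_grad)            # False: nothing to backprop through

    # Serialize the forward-only graph (requires the onnx package).
    torch.onnx.export(model, torch.randn(1, 4), "model.onnx")
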
Tip: draw a tiny graph (mul, add) and label ∂L/∂x on paper—classic interview sanity check.

Quick review checklist

  • DAG; autodiff; forward vs reverse mode and why reverse wins for scalar loss.
  • Eager vs static; detach / stop_gradient; in-place hazards.
  • Custom backward; inference-only graphs.