Generative AI 20 Essential Q/A
FAANG GenAI Prep

Generative AI: 20 Interview Questions

Master LLMs, GANs, VAEs, Diffusion Models, GPT, prompt engineering, RLHF, RAG, fine-tuning (LoRA), and evaluation metrics. Interview-ready answers with formulas.

Topics: LLM, Diffusion, GPT, LoRA, Prompt Engineering, RAG
1 What is Generative AI? How does it differ from discriminative models? ⚡ Easy
Answer: Generative AI models learn the joint probability distribution P(X, Y) or P(X) to generate new samples from the same distribution. Discriminative models learn decision boundary P(Y|X). Generative models create; discriminative models classify.
Generative: learn P(X). Discriminative: learn P(Y|X).
2 Major families of generative models? 📊 Medium
Answer: GANs (adversarial), VAEs (variational), Flow-based (invertible), Diffusion (denoising), Autoregressive (GPT), Energy-based, Hybrids. Each with different trade-offs in sample quality, likelihood, speed.
3 Explain GAN. Generator and discriminator loss? 🔥 Hard
Answer: GAN: Generator (G) creates fake samples; Discriminator (D) distinguishes real vs fake. Min-max game: D tries to maximize log(D(x)) + log(1-D(G(z))); G minimizes log(1-D(G(z))) (or -log(D(G(z))) for better gradients).
min_G max_D V(D,G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
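A minimal numeric sketch of the objective above (pure Python, illustrative only; `d_real` and `d_fake` stand for scalar discriminator outputs D(x) and D(G(z))):

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator maximizes log D(x) + log(1 - D(G(z)));
    written as a loss to minimize, we negate it."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss_saturating(d_fake):
    """Original generator objective: minimize log(1 - D(G(z))).
    Gradient vanishes when D confidently rejects fakes."""
    return math.log(1.0 - d_fake)

def g_loss_nonsaturating(d_fake):
    """Non-saturating variant: minimize -log D(G(z)), which gives
    stronger gradients early in training."""
    return -math.log(d_fake)
```

Note how `g_loss_nonsaturating` grows without bound as D(G(z)) → 0, whereas `g_loss_saturating` flattens out; that is exactly the "better gradients" point in the answer.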
4 How does VAE work? ELBO explained. 🔥 Hard
Answer: VAE is a latent-variable model: the encoder approximates the posterior q(z|x) (outputs μ, σ); the decoder models p(x|z). ELBO = E_q[log p(x|z)] - KL(q(z|x)||p(z)), a lower bound on log p(x); training maximizes it (reconstruction term plus regularization toward the prior). Reparameterization trick (z = μ + σ·ε, ε ~ N(0, I)) enables backprop through sampling.
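The KL term and the reparameterization trick both have simple closed forms; a sketch for a diagonal Gaussian posterior and standard-normal prior (pure Python, illustrative):

```python
import math

def kl_diag_gaussian(mu, sigma):
    """Closed-form KL(q(z|x) || p(z)) for diagonal Gaussian q and
    standard-normal prior: 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

def reparameterize(mu, sigma, eps):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    so gradients flow through mu and sigma rather than a sampler."""
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]
```

Sanity check: when q equals the prior (μ = 0, σ = 1) the KL term is exactly zero.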
5 How do denoising diffusion models work? 🔥 Hard
Answer: Forward: gradually add Gaussian noise to data over T steps (q(x_t|x_{t-1})). Reverse: learn to denoise (p_θ(x_{t-1}|x_t)). Trained with variational bound; simplified loss: predict added noise ε. Stable Diffusion adds text conditioning via cross-attention.
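The forward process admits a closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε with ᾱ_t = ∏_{s≤t}(1-β_s), which is what makes training efficient (any timestep can be sampled directly). A scalar sketch (pure Python, illustrative):

```python
import math

def alpha_bar(betas, t):
    """Cumulative product alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    prod = 1.0
    for s in range(t):
        prod *= 1.0 - betas[s]
    return prod

def q_sample(x0, betas, t, eps):
    """Closed-form forward process: jump straight from x_0 to x_t
    without iterating, given noise eps ~ N(0, 1)."""
    ab = alpha_bar(betas, t)
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps
```

The network is then trained to predict `eps` from `(x_t, t)`; the reverse chain reuses that prediction to step back toward x_0.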
6 What are autoregressive generative models? 📊 Medium
Answer: Generate a sequence token by token, conditioning on previous tokens: P(x) = ∏ P(x_t | x_<t). Examples: GPT, PixelCNN, WaveNet. Training parallelizes via teacher forcing and is stable, but inference is sequential (one token at a time).
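The chain-rule factorization above in one line (pure Python; `cond_probs[t]` is assumed to be the model's probability of the observed token x_t given its prefix):

```python
import math

def sequence_log_prob(cond_probs):
    """Chain rule: log P(x) = sum_t log P(x_t | x_<t).
    Working in log space avoids underflow on long sequences."""
    return sum(math.log(p) for p in cond_probs)
```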
7 What is prompt engineering? Common techniques? 📊 Medium
Answer: Designing input prompts to steer LLM output. Techniques: zero-shot, few-shot, chain-of-thought (CoT), role prompting, system prompts, instruction tuning, delimiter usage. Critical for production LLM applications.
8 Full fine-tuning vs parameter-efficient methods (PEFT)? 🔥 Hard
Answer: Full fine-tuning updates all weights (costly; risks catastrophic forgetting). PEFT: adapters, prefix tuning, LoRA (low-rank matrices). LoRA: W' = W + BA; freeze W, train B and A. Saves memory and enables multi-task serving (swap small adapters per task).
9 Explain LoRA mathematically. Why no inference latency? 🔥 Hard
Answer: LoRA: ΔW = BA (B∈R^{d×r}, A∈R^{r×k}), r << d. Update: h = Wx + BAx. Merge after training: W_merged = W + BA. No extra compute at inference. Reduces trainable params by ~10,000x.
10 How does RLHF work? (ChatGPT) 🔥 Hard
Answer: 1) Supervised fine-tuning (SFT). 2) Train reward model from human preference comparisons. 3) Optimize policy with PPO using reward model. Aligns models with human preferences, reduces harm, improves helpfulness.
11 What is RAG? When to use? 📊 Medium
Answer: RAG = retrieve relevant documents from external corpus, add to context, then generate. Reduces hallucination, handles knowledge updates without retraining, improves factual accuracy. Used in chatbots, QA systems.
12 Why do LLMs hallucinate? How to reduce? 🔥 Hard
Answer: Causes: model trained to be helpful (may guess), lack of knowledge, memorized misconceptions, sampling randomness. Mitigation: RAG, prompt engineering (cite sources), temperature reduction, reinforcement learning from feedback, controlled decoding.
13 Temperature scaling vs top-k vs nucleus (top-p) sampling? 📊 Medium
Answer: Temperature T: softmax(logits/T). T→0 greedy, T=1 standard, T>1 more random. Top-k: sample from k highest probability tokens. Top-p (nucleus): sample from smallest set with cumulative probability ≥ p. Often combined.
14 Metrics for evaluating generative models? 🔥 Hard
Answer: Image: Inception Score (IS), FID (Fréchet Inception Distance), Precision/Recall. Text: perplexity, BLEU, ROUGE, METEOR, BERTScore, human evaluation. LLM-as-a-judge (GPT-4). No single metric captures all.
15 What is mode collapse? Solutions? 🔥 Hard
Answer: Generator produces limited varieties (collapses to few modes). Solutions: WGAN (Wasserstein loss), gradient penalty, mini-batch discrimination, unrolled GANs, diverse batch sampling, regularization.
16 Architecture of Stable Diffusion? 🔥 Hard
Answer: 1) VAE encoder/decoder (compress to latent space). 2) U-Net with cross-attention (denoise). 3) Text encoder (CLIP/OpenCLIP). Diffusion in latent space (efficient). Conditioned on text embeddings. Classifier-free guidance.
17 What is classifier-free guidance (CFG)? 🔥 Hard
Answer: Diffusion model trained with and without condition (randomly drop). Sampling: ϵ = ϵ_cond + w(ϵ_cond - ϵ_uncond). w > 1 increases condition adherence (at cost of diversity). Common in DALL·E 2, Stable Diffusion.
18 What is VQ-VAE? Why used? 🔥 Hard
Answer: VQ-VAE uses discrete latent codes (learned codebook). Encoder output mapped to nearest embedding, decoder reconstructs. Avoids posterior collapse, good for high-fidelity generation. Used in VQ-GAN, DALL·E.
19 Chain-of-Thought prompting – how does it help? 📊 Medium
Answer: CoT encourages step-by-step reasoning before final answer. Improves performance on arithmetic, commonsense, symbolic reasoning tasks. Zero-shot CoT: "Let's think step by step". Emergent ability of large models.
20 What is Constitutional AI (Anthropic)? 🔥 Hard
Answer: Train models to critique and revise own outputs based on constitution principles. RLHF from AI feedback (RLAIF). Reduces need for human labeling, improves harmlessness, scalable oversight.

Generative AI – Interview Cheat Sheet

Model Families
  • GAN Adversarial, sharp images, mode collapse
  • Diffusion SOTA images, slow inference
  • VAE Stable, likelihood, blurry
  • Autoregressive GPT, sequential
LLM Techniques
  • Fine-tuning: Full vs LoRA/adapters
  • RLHF: Reward model + PPO
  • RAG: Retrieve + generate
  • Prompt: CoT, few-shot, system
Evaluation
  • Image FID, IS, Precision/Recall
  • Text Perplexity, BLEU, BERTScore
  • LLM GPT-4 as judge
Challenges
  • Hallucination (RAG, prompt)
  • Mode collapse (WGAN, diversity)
  • Posterior collapse (β-VAE, VQ)
  • Inference speed (distillation)

Verdict: "GANs and VAEs are foundations; Diffusion and LLMs dominate today. Know your fine-tuning and prompting."