1 Analyze — Research & experimentation
Problem framing
Business objective
Success metrics
Baseline definition
Feasibility analysis
Data availability
Data exploration
EDA notebooks
Distribution analysis
Feature correlation
Missing data audit
Bias detection
Model selection
Foundation model eval
Architecture search
Literature review
Benchmark comparison
Cost/perf tradeoff
Prototyping
Jupyter experiments
Quick eval harness
Prompt engineering
Few-shot testing
POC demo
Approval gate
Research review
Cost estimate
Timeline proposal
Risk assessment
Go / No-go decision
2 Data preparation & feature engineering
Data sourcing
Transaction history
Credit bureau data
User interactions
Market data feeds
Synthetic generation
Cleaning & validation
Schema validation
Outlier handling
Deduplication
Null imputation
Consistency checks
Feature engineering
Time-series features
Aggregation windows
Embedding generation
Cross features
Feature store (Feast)
LLM data prep
Instruction tuning pairs
RLHF preference data
Document chunking
Quality filtering
Decontamination
Data governance
PII removal
Consent verification
Bias audit
Lineage tracking
Version control (DVC)
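The "Time-series features" and "Aggregation windows" items above can be illustrated with a trailing-window sum over transaction history. This is only a minimal sketch: it assumes transactions arrive as (timestamp, amount) pairs already sorted by time, and the function name and the 7-day default window are hypothetical choices, not something the outline specifies.

```python
from collections import deque
from datetime import datetime, timedelta


def trailing_window_sums(transactions, window=timedelta(days=7)):
    """For each transaction, return the sum of amounts whose timestamp
    falls in the half-open interval (ts - window, ts]."""
    buffer = deque()   # events currently inside the window
    running = 0.0      # running sum of the amounts in `buffer`
    sums = []
    for ts, amount in transactions:
        buffer.append((ts, amount))
        running += amount
        # Evict events that have aged out of the trailing window.
        while buffer and buffer[0][0] <= ts - window:
            _, old_amount = buffer.popleft()
            running -= old_amount
        sums.append(running)
    return sums


if __name__ == "__main__":
    history = [
        (datetime(2024, 1, 1), 10.0),
        (datetime(2024, 1, 3), 20.0),
        (datetime(2024, 1, 9), 5.0),
    ]
    print(trailing_window_sums(history))
```

The deque keeps each update amortized constant time, which matters when the same feature is computed over long histories; a feature store would typically own the scheduling and storage around a function like this.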
3 Train — Model training & fine-tuning
Foundation fine-tuning
Claude (Anthropic)
GPT-4 (OpenAI)
Gemini (Vertex)
LoRA / QLoRA
Full fine-tune (rare)
Custom models
Credit risk model
Fraud detection
FX prediction
Churn prediction
NER (PII detection)
Training infra
SageMaker (AWS)
GPU: A100 / H100
Distributed training
Spot instances
Checkpoint saving
Experiment tracking
MLflow experiments
Weights & Biases
Hyperparameter logs
Loss curves
Resource utilization
RLHF pipeline
Reward model train
PPO optimization
DPO (direct preference optimization)
KL divergence guard
Safety alignment
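For the DPO item in the RLHF pipeline above, the per-example loss is small enough to write out. The sketch below assumes the four inputs are summed token log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model; the function name, argument names, and the beta default of 0.1 are illustrative assumptions. The reference-model terms play the role of the KL guard: they penalize the policy for drifting away from the reference.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)), where margin is
    the policy's log-ratio advantage for the chosen response over the
    rejected one, both measured relative to the reference model."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch for numerical stability.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2).
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy favors the chosen response more than the reference does.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In practice this would be computed over batches with an autodiff framework; the scalar version is only meant to show the shape of the objective.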
4 Evaluate — Benchmarking & validation
LLM evaluation
Accuracy (held-out)
Hallucination rate
Instruction following
Domain knowledge
Human eval (Elo)
ML model eval
AUC-ROC
Precision / Recall
F1 score
Calibration
Feature importance
Safety testing
Red-team attacks
Bias benchmarks
Toxicity eval
Prompt injection test
Edge case coverage
Business validation
A/B test design
Offline replay
Cost projection
Latency benchmark
Stakeholder review
Model card
Capabilities doc
Limitations noted
Intended use cases
Ethical considerations
Sign-off for deploy
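The "Precision / Recall" and "F1 score" items under ML model eval can be computed directly from binary labels. A minimal sketch, assuming labels are encoded as 0 and 1 and that an undefined ratio (zero denominator) is reported as 0.0; the function name is hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```

For imbalanced tasks such as fraud detection, these metrics are usually more informative than raw accuracy, which is one reason the outline lists them alongside AUC-ROC and calibration.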
5 Deploy — Production serving & scaling
Deployment strategy
Shadow mode (silent)
Canary (5% traffic)
Blue-green switch
Progressive rollout
Instant rollback
Serving infra
vLLM (LLM serving)
TorchServe (ML)
SageMaker endpoints
Auto-scaling
GPU optimization
Model registry
Version control
Stage promotion
Artifact storage
Lineage tracking
Deprecation policy
Optimization
Quantization (INT8)
Distillation
KV cache tuning
Batch inference
Speculative decoding
Cost management
Spot/reserved mix
Right-sizing
Request routing
Cache layers
Budget alerts
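The "Canary (5% traffic)" item in the deployment strategy above is often implemented by hashing a stable request key into a bucket, so that the same caller consistently lands on the same variant. The sketch below is one possible approach under that assumption; the function name, the use of a user ID as the key, and the "canary" / "stable" labels are hypothetical.

```python
import hashlib


def route_request(user_id: str, canary_fraction: float = 0.05) -> str:
    """Deterministically route a user to 'canary' or 'stable'.

    The user ID is hashed to a number in [0, 1); users whose bucket falls
    below `canary_fraction` are sent to the canary deployment.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "canary" if bucket < canary_fraction else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route_request(u) == "canary" for u in users) / len(users)
    print(f"canary share: {share:.3f}")
```

Raising `canary_fraction` step by step gives a progressive rollout, and setting it to 0.0 acts as an instant rollback, because users already in the canary bucket stay there as the fraction grows.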
6 Monitor — Drift detection & continuous improvement
Performance monitoring
Accuracy tracking
Latency p50/p95/p99
Throughput (RPS)
Error rate
GPU utilization
Drift detection
Data drift (KL divergence)
Concept drift
Prediction drift
Feature drift
Auto-retrain trigger
Alerting
Accuracy < threshold
Latency spike
Cost overrun
Safety violation
PagerDuty integration
Feedback loop
User signals → retrain
A/B results → promote
Drift → auto-fix
Error → investigate
Quarterly review
Active models
Retrain cycle: 2 to 4 weeks on average
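The "Data drift (KL div)" and "Auto-retrain trigger" items above can be connected with a small check: compare a histogram of a feature in live traffic against the same histogram from training data, and flag a retrain when the divergence crosses a threshold. This is a sketch under stated assumptions: both histograms use the same bins, the direction KL(live || reference) is a choice rather than a rule, and the function names, smoothing constant, and 0.1 threshold are hypothetical.

```python
import math


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) between two histograms over the same bins.

    A small `eps` is added to every bin so that empty bins do not
    produce a division by zero or log(0).
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same bins")
    n = len(p_counts)
    p_total = sum(p_counts) + eps * n
    q_total = sum(q_counts) + eps * n
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def should_retrain(live_counts, reference_counts, threshold=0.1):
    """Flag a retrain when live data has drifted past the threshold."""
    return kl_divergence(live_counts, reference_counts) > threshold


if __name__ == "__main__":
    reference = [90, 10]   # bin counts from training data
    live = [50, 50]        # bin counts from recent production traffic
    print(kl_divergence(live, reference), should_retrain(live, reference))
```

A real trigger would usually require the threshold to be exceeded over several consecutive windows before retraining, to avoid reacting to short-lived noise.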