Research Paper ML Hub

AI research atlas / v2

Learn AI papers in the right order.

Start with landmark ideas, move through foundations, then branch into LLMs, GenAI, agents, systems, and safety with a reading path that keeps the field from feeling random.

10 learning tracks · Full-paper reader · ChatGPT handoff
Recommended first: Landmark papers

Build the mental timeline before going deep.

Then specialize: LLMs, GenAI, safety

Move from foundations to modern systems.

Read mode: PDF + resources
Path-first: No more random paper hopping
Research-native: arXiv links, PDFs, resources
Study loop: Track reading and discuss in ChatGPT

Learning path

Where to start, and what to read next

Start with landmarks
01

Orientation / 1-2 weeks

Start Here

Read the papers everyone keeps referencing so the rest of the map has anchors.

Know the landmark names · Build historical context · Pick a direction
Open papers
02

Foundations / 2-4 weeks

Classical ML

Learn the statistical and probabilistic ideas that still sit under modern models.

Bayesian thinking · Model evaluation · Uncertainty
Open papers
03

Foundations / 1-2 weeks

Optimization

Understand the training mechanics behind gradient-based learning.

Gradient descent · Generalization · Training stability
Open papers
04

Builder / 3-5 weeks

Deep Learning Core

Move through representation learning, CNNs, residual networks, and scaling patterns.

CNN intuition · Representation learning · Benchmark culture
Open papers
05

Builder / 3-6 weeks

Sequence Models and LLMs

Study attention, transformers, language modeling, instruction tuning, and evaluation.

Attention · Pretraining · Instruction following
Open papers
06

Specialist / 3-6 weeks

Generative AI

Compare GANs, diffusion, autoregressive generation, and modern GenAI workflows.

Diffusion · GANs · Generation tradeoffs
Open papers
07

Specialist / 2-4 weeks

Multimodal and Retrieval

Connect language with images, retrieval, embeddings, and real-world knowledge access.

Vision-language · Embeddings · Retrieval
Open papers
08

Specialist / 3-5 weeks

RL and Agents

Learn decision making, feedback, policy learning, and agent-style systems.

Policies · Rewards · Exploration
Open papers
09

Practitioner / 2-4 weeks

Systems and Scaling

Understand the infrastructure and engineering papers behind large-scale training.

Distributed training · Serving · Efficiency
Open papers
10

Practitioner / 2-4 weeks

Safety and Interpretability

Study robustness, alignment, transparency, and how to reason about model behavior.

Alignment · Robustness · Interpretability
Open papers

Research library

ML Systems

Showing papers for this learning path. Open any paper card to read the full paper and related resources.

40 papers shown
Unread · 2015

Distilling the Knowledge in a Neural Network

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.

Geoffrey E. Hinton, Oriol Vinyals, Jeff Dean 13,925
ML Systems
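
As a concrete illustration of the recipe in the abstract above, here is a minimal PyTorch-style sketch of a temperature-scaled distillation loss. It is a toy under our own assumptions (the temperature, mixing weight, and function name are illustrative, not values from the paper), but it shows the two ingredients: soft targets from the teacher and the usual hard-label loss.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend soft-target KL divergence (at temperature T) with hard-label cross-entropy.
    T and alpha are illustrative hyperparameters, not the paper's values."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_preds = F.log_softmax(student_logits / T, dim=-1)
    # Scaling by T**2 keeps the soft-target gradients comparable as T changes.
    soft_loss = F.kl_div(soft_preds, soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```
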
Unread · 2016

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

Martín Abadi, Ashish Agarwal, P. Barham 11,666
ML Systems
Unread · 2021

LoRA: Low-Rank Adaptation of Large Language Models

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.

Edward J. Hu, Yelong Shen, Phillip Wallis 2,410
ML Systems
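
The low-rank update described in the abstract can be sketched in a few lines of PyTorch: the pretrained weight stays frozen and only two small factors are trained. This is a minimal sketch under our own naming and scaling conventions, not the released microsoft/LoRA implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable rank-r update: y = x W^T + (x A^T B^T) * scale."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Because only A and B receive gradients, the trainable parameter count drops from in_features × out_features to r × (in_features + out_features), which is the source of the memory savings the abstract quotes.
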
Unread · 2018

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.

Jonathan Frankle, Michael Carbin 1,323
ML Systems
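
The iterative procedure implied by the abstract (train, prune the smallest-magnitude weights, rewind the survivors to their original initialization, repeat) can be summarized in a short sketch. The training callback and pruning schedule below are placeholders of ours, not the paper's exact experimental setup.

```python
import copy
import torch

def find_winning_ticket(model, train_fn, prune_fraction=0.2, rounds=5):
    """Iterative magnitude pruning with weight rewinding (simplified sketch)."""
    init_state = copy.deepcopy(model.state_dict())        # remember the original initialization
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train_fn(model, masks)                             # hypothetical: train with masks applied
        for name, param in model.named_parameters():
            alive = param[masks[name].bool()].abs()
            threshold = alive.quantile(prune_fraction)     # prune smallest fraction of surviving weights
            masks[name] = masks[name] * (param.abs() > threshold).float()
        model.load_state_dict(init_state)                  # rewind survivors to their initial values
    return masks
```
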
Unread · 2019

A Survey on Distributed Machine Learning

The demand for artificial intelligence has grown significantly over the past decade, and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges: first and foremost, the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.

Joost Verbraeken, Matthijs Wolting, J. Katzy 851
ML Systems
Unread · 2019

Scaling Distributed Machine Learning with In-Network Aggregation

Training complex machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the training process. Our approach, SwitchML, reduces the volume of exchanged data by aggregating the model updates from multiple workers in the network. We co-design the switch processing with the end-host protocols and ML frameworks to provide a robust, efficient solution that speeds up training by up to 300%, and at least by 20% for a number of real-world benchmark models.

Amedeo Sapio, Marco Canini, Chen-Yu Ho 526
ML Systems
Unread · 2023

The applications of machine learning techniques in medical data processing based on distributed computing and the Internet of Things

Medical data processing has grown into a prominent topic in the latest decades with the primary goal of maintaining patient data via new information technologies, including the Internet of Things (IoT) and sensor technologies, which generate patient indexes in hospital data networks. Innovations like distributed computing, Machine Learning (ML), blockchain, chatbots, wearables, and pattern recognition can adequately enable the collection and processing of medical data for decision-making in the healthcare era. Particularly, to assist experts in the disease diagnostic process, distributed computing is beneficial by digesting huge volumes of data swiftly and producing personalized smart suggestions. On the other side, the current globe is confronting an outbreak of COVID-19, so an early diagnosis technique is crucial to lowering the fatality rate. ML systems are beneficial in aiding radiologists in examining the incredible amount of medical images. Nevertheless, they demand a huge quantity of training data that must be unified for processing. Hence, developing Deep Learning (DL) confronts multiple issues, such as conventional data collection, quality assurance, knowledge exchange, privacy preservation, administrative laws, and ethical considerations. In this research, we intend to convey an inclusive analysis of the most recent studies in distributed computing platform applications based on five categorized platforms, including cloud computing, edge, fog, IoT, and hybrid platforms. So, we evaluated 27 articles regarding the usage of the proposed framework, deployed methods, and applications, noting the advantages, drawbacks, and the applied dataset and screening the security mechanism and the presence of the Transfer Learning (TL) method. As a result, it was proved that most recent research (about 43%) used the IoT platform as the environment for the proposed architecture, and most of the studies (about 46%) were done in 2021. In addition, the most popular utilized DL algorithm was the Convolutional Neural Network (CNN), with a percentage of 19.4%. Hence, despite how technology changes, delivering appropriate therapy for patients is the primary aim of healthcare-associated departments. Therefore, further studies are recommended to develop more functional architectures based on DL and distributed environments and better evaluate the present healthcare data analysis models.

Sarina Aminizadeh, Arash Heidari, Shiva Toumaj 151
ML Systems
Unread · 2021

Towards Demystifying Serverless Machine Learning Training

The appeal of serverless (FaaS) has triggered a growing interest on how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML). Several systems exist for training large-scale ML models on top of serverless infrastructures (e.g., AWS Lambda) but with inconclusive results in terms of their performance and relative advantage over "serverful" infrastructures (IaaS). In this paper we present a systematic, comparative study of distributed ML training over FaaS and IaaS. We present a design space covering design choices such as optimization algorithms and synchronization protocols, and implement a platform, LambdaML, that enables a fair comparison between FaaS and IaaS. We present experimental results using LambdaML, and further develop an analytic model to capture cost/performance tradeoffs that must be considered when opting for a serverless infrastructure. Our results indicate that ML training pays off in serverless only for models with efficient (i.e., reduced) communication and that quickly converge. In general, FaaS can be much faster but it is never significantly cheaper than IaaS.

Jiawei Jiang, Shaoduo Gan, Yue Liu 147
ML Systems
Unread · 2021

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

The recent many-fold increase in the size of deep neural networks makes efficient distributed training challenging. Many proposals exploit the compressibility of the gradients and propose lossy compression techniques to speed up the communication stage of distributed training. Nevertheless, compression comes at the cost of reduced model quality and extra computation overhead. In this work, we design an efficient compressor with minimal overhead. Noting the sparsity of the gradients, we propose to model the gradients as random variables distributed according to some sparsity-inducing distributions (SIDs). We empirically validate our assumption by studying the statistical characteristics of the evolution of gradient vectors over the training process. We then propose Sparsity-Inducing Distribution-based Compression (SIDCo), a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC) while being faster by imposing lower compression overhead. Our extensive evaluation of popular machine learning benchmarks involving both recurrent neural network (RNN) and convolutional neural network (CNN) models shows that SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.

A. M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini 99
ML Systems
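
For orientation, the sparsification operation that SIDCo accelerates looks roughly like the toy sketch below: keep only the largest-magnitude gradient entries and ship them as (indices, values). Here the threshold comes from an exact top-k for clarity; the paper's contribution is estimating it cheaply from a fitted sparsity-inducing distribution.

```python
import torch

def sparsify(grad: torch.Tensor, ratio: float = 0.01):
    """Keep the top `ratio` fraction of entries by magnitude (toy version, exact top-k)."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]            # send these instead of the dense tensor

def desparsify(indices, values, shape):
    """Rebuild a dense gradient from the sparse message on the receiving side."""
    out = torch.zeros(shape).flatten()
    out[indices] = values
    return out.reshape(shape)
```
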
Unread · 2021

Breaking the computation and communication abstraction barrier in distributed machine learning workloads

Recent trends towards large machine learning models require both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain best performance. However, the current logical separation between computation and communication kernels in machine learning frameworks misses optimization opportunities across this barrier. Breaking this abstraction can provide many optimizations to improve the performance of distributed workloads. However, manually applying these optimizations requires modifying the underlying computation and communication libraries for each scenario, which is both time consuming and error-prone. Therefore, we present CoCoNet, which contains (i) a domain specific language to express a distributed machine learning program in the form of computation and communication operations, (ii) a set of semantics preserving transformations to optimize the program, and (iii) a compiler to generate jointly optimized communication and computation GPU kernels. Providing both computation and communication as first class constructs allows users to work on a high-level abstraction and apply powerful optimizations, such as fusion or overlapping of communication and computation. CoCoNet enabled us to optimize data-, model- and pipeline-parallel workloads in large language models with only a few lines of code. Our experiments show that CoCoNet significantly outperforms state-of-the-art distributed machine learning implementations.

Abhinav Jangda, Jun Huang, Guodong Liu 95
ML Systems
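
The kind of computation/communication overlap that CoCoNet generates automatically can be approximated by hand with asynchronous collectives: start reducing one gradient bucket while the next one is still being computed. The sketch below assumes an initialized torch.distributed process group; `buckets`, `compute_grad`, and `apply_update` are hypothetical placeholders, not CoCoNet's API.

```python
import torch.distributed as dist

def overlapped_backward(buckets):
    """Overlap all-reduce of bucket i with local computation of bucket i+1 (sketch)."""
    pending = []
    for bucket in buckets:
        grad = bucket.compute_grad()                    # local computation for this bucket
        work = dist.all_reduce(grad, async_op=True)     # communication starts immediately
        pending.append((work, bucket, grad))            # keep computing the next bucket meanwhile
    for work, bucket, grad in pending:
        work.wait()                                     # reduction finished
        bucket.apply_update(grad)                       # hypothetical per-bucket optimizer step
```
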
Unread · 2020

KungFu: Making Training in Distributed Machine Learning Adaptive

No abstract available yet.

Luo Mai, Guo Li, Marcel Wagenländer 91
ML Systems
Unread · 2024

[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster

No abstract available yet.

Qizhe Zhang, Aosong Cheng, Ming Lu 85
ML Systems
Unread · 2021

GRACE: A Compressed Communication Framework for Distributed Machine Learning

Powerful computer clusters are used nowadays to train complex deep neural networks (DNN) on large datasets. Distributed training increasingly becomes communication bound. For this reason, many lossy compression techniques have been proposed to reduce the volume of transferred data. Unfortunately, it is difficult to argue about the behavior of compression methods, because existing work relies on inconsistent evaluation testbeds and largely ignores the performance impact of practical system configurations. In this paper, we present a comprehensive survey of the most influential compressed communication methods for DNN training, together with an intuitive classification (i.e., quantization, sparsification, hybrid and low-rank). Next, we propose GRACE, a unified framework and API that allows for consistent and easy implementation of compressed communication on popular machine learning toolkits. We instantiate GRACE on TensorFlow and PyTorch, and implement 16 such methods. Finally, we present a thorough quantitative evaluation with a variety of DNNs (convolutional and recurrent), datasets and system configurations. We show that the DNN architecture affects the relative performance among methods. Interestingly, depending on the underlying communication library and computational cost of compression / decompression, we demonstrate that some methods may be impractical. GRACE and the entire benchmarking suite are available as open-source.

Hang Xu, Chen-Yu Ho, A. M. Abdelmoniem 85
ML Systems
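
The "unified framework and API" idea can be pictured with a small interface sketch: every compression method reduces a gradient to a compact message and can restore it on the receiving side. The class and method names below are ours for illustration, not the GRACE API itself.

```python
import torch

class Compressor:
    """Generic interface: compress a gradient into a payload, decompress it later."""
    def compress(self, tensor):
        raise NotImplementedError
    def decompress(self, payload, context):
        raise NotImplementedError

class SignCompressor(Compressor):
    """1-bit style quantization: send signs plus a single per-tensor scale."""
    def compress(self, tensor):
        scale = tensor.abs().mean()
        return torch.sign(tensor), scale
    def decompress(self, signs, scale):
        return signs * scale
```
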
Unread · 2024

Privacy-Preserving in Blockchain-based Federated Learning Systems

Federated Learning (FL) has recently arisen as a revolutionary approach to the collaborative training of Machine Learning models. According to this novel framework, multiple participants train a global model collaboratively, coordinating with a central aggregator without sharing their local data. As FL gains popularity in diverse domains, security and privacy concerns arise due to the distributed nature of this solution. Therefore, integrating this strategy with Blockchain technology has been consolidated as a preferred choice to ensure the privacy and security of participants. This paper explores the research efforts carried out by the scientific community to define privacy solutions in scenarios adopting Blockchain-Enabled FL. It comprehensively summarizes the background related to FL and Blockchain, evaluates existing architectures for their integration, and examines the primary attacks and possible countermeasures to guarantee privacy in this setting. Finally, it reviews the main application scenarios where Blockchain-Enabled FL approaches have been proficiently applied. This survey can help academia and industry practitioners understand which theories and techniques exist to improve the performance of FL through Blockchain to preserve privacy and which are the main challenges and future directions in this novel and still under-explored context. We believe this work provides a novel contribution with respect to previous surveys and is a valuable tool to explore the current landscape, understand perspectives, and pave the way for advancements or improvements in this amalgamation of Blockchain and Federated Learning.

Sameera K.M., S. Nicolazzo, Marco Arazzi 75
ML Systems
Unread · 2021

Clairvoyant Prefetching for Distributed Machine Learning I/O

I/O is emerging as a major bottleneck for machine learning training, especially in distributed environments. Indeed, at large scale, I/O takes as much as 85% of training time. Addressing this I/O bottleneck necessitates careful optimization, as optimal data ingestion pipelines differ between systems, and require a delicate balance between access to local storage, external filesystems, and remote nodes. We introduce NoPFS, a machine learning I/O middleware, which provides a scalable, flexible, and easy-to-use solution to the I/O bottleneck. NoPFS uses clairvoyance: Given the seed generating the random access pattern for training with SGD, it can exactly predict when and where a sample will be accessed. We combine this with an analysis of access patterns and a performance model to provide distributed caching policies that adapt to different datasets and storage hierarchies. NoPFS reduces I/O times and improves end-to-end training by up to 5.4× on the ImageNet-1k, ImageNet-22k, and CosmoFlow datasets.

Roman Böhringer, Nikoli Dryden, Tal Ben-Nun 72
ML Systems
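
The "clairvoyance" in the abstract rests on a simple observation, sketched below: if every node knows the shuffling seed, the entire epoch-by-epoch access order is computable in advance, so future sample accesses can be staged ahead of time. This NumPy sketch is our own illustration of the idea, not NoPFS code.

```python
import numpy as np

def access_order(seed: int, num_samples: int, epochs: int):
    """Recreate the full shuffled access schedule from the seed alone."""
    rng = np.random.default_rng(seed)
    # Same seed on every node => identical, fully predictable schedule.
    return [rng.permutation(num_samples) for _ in range(epochs)]

schedule = access_order(seed=42, num_samples=8, epochs=2)
# A prefetcher can now stage sample schedule[e][i] before step i of epoch e.
```
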
Unread · 2023

Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models

Scaling models to large sizes to improve performance has led to a trend in deep learning, and sparsely activated Mixture-of-Expert (MoE) is a promising architecture to scale models. However, training MoE models in existing systems is expensive, mainly due to the All-to-All communication between layers. All-to-All communication originates from the expert-centric paradigm: keeping experts in-place and exchanging intermediate data to feed experts. We propose the novel data-centric paradigm: keeping data in-place and moving experts between GPUs. Since experts' size can be smaller than the size of data, the data-centric paradigm can reduce communication workload. Based on this insight, we develop Janus. First, Janus supports fine-grained asynchronous communication, which can overlap computation and communication. Janus implements a hierarchical communication to further reduce cross-node traffic by sharing the fetched experts in the same machine. Second, when scheduling the "fetching expert" requests, Janus implements a topology-aware priority strategy to utilize intra-node and inter-node links efficiently. Finally, Janus allows experts to be prefetched, which allows the downstream computation to start immediately once the previous step completes. Evaluated on a 32-A100 cluster, Janus can reduce the traffic up to 16× and achieves up to 2.06× speedup compared with the current MoE training system.

Juncai Liu, Jessie Hui Wang, Yimin Jiang 69
ML Systems
Unread · 2019

Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning

Deep learning is a popular machine learning technique and has been applied to many real-world problems, ranging from computer vision to natural language processing. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to train a large model over large datasets. A popular solution is to distribute and parallelize the training process across multiple machines using the parameter server framework. In this paper, we present a distributed paradigm on the parameter server framework called Dynamic Stale Synchronous Parallel (DSSP) which improves the state-of-the-art Stale Synchronous Parallel (SSP) paradigm by dynamically determining the staleness threshold at run time. Conventionally, to run distributed training in SSP, the user needs to specify a particular staleness threshold as a hyper-parameter. However, a user does not usually know how to set the threshold and thus often finds a threshold value through trial and error, which is time-consuming. Based on workers' recent processing time, our approach DSSP adaptively adjusts the threshold per iteration at running time to reduce the waiting time of faster workers for synchronization of the globally shared parameters (the weights of the model), and consequently increases the frequency of parameter updates (increases iteration throughput), which speeds up the convergence rate. We compare DSSP with other paradigms such as Bulk Synchronous Parallel (BSP), Asynchronous Parallel (ASP), and SSP by running deep neural network (DNN) models over GPU clusters in both homogeneous and heterogeneous environments. The results show that in a heterogeneous environment where the cluster consists of mixed models of GPUs, DSSP converges to a higher accuracy much earlier than SSP and BSP and performs similarly to ASP. In a homogeneous distributed cluster, DSSP has more stable and slightly better performance than SSP and ASP, and converges much faster than BSP.

Xing Zhao, Aijun An, Junfeng Liu 66
ML Systems
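
The stale-synchronous rule that DSSP builds on fits in one line, sketched below: a worker may run ahead of the slowest worker by at most a bounded number of iterations before it must wait. This is our own toy illustration; DSSP's contribution is adjusting that bound at run time from workers' recent processing times.

```python
def may_proceed(my_iteration: int, slowest_iteration: int, staleness: int) -> bool:
    """Stale Synchronous Parallel gate: proceed only while within the staleness bound."""
    return my_iteration - slowest_iteration <= staleness

# staleness = 0 recovers BSP, staleness = "infinity" recovers ASP;
# DSSP picks and adapts a value in between per iteration.
```
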
Unread · 2024

Deep Learning or Classical Machine Learning? An Empirical Study on Log-Based Anomaly Detection

While deep learning (DL) has emerged as a powerful technique, its benefits must be carefully considered in relation to computational costs. Specifically, although DL methods have achieved strong performance in log anomaly detection, they often require extended time for log preprocessing, model training, and model inference, hindering their adoption in online distributed cloud systems that require rapid deployment of log anomaly detection service. This paper investigates the superiority of DL methods compared to simpler techniques in log anomaly detection. We evaluate basic algorithms (e.g., KNN, SLFN) and DL approaches (e.g., CNN) on five public log anomaly detection datasets (e.g., HDFS). Our findings demonstrate that simple algorithms outperform DL methods in both time efficiency and accuracy. For instance, on the Thunderbird dataset, the K-nearest neighbor algorithm trains 1,000 times faster than NeuralLog while achieving a higher F1-Score by 0.0625. We also identify three factors contributing to this phenomenon, which are: (1) redundant log preprocessing strategies, (2) dataset simplicity, and (3) the nature of binary classification in log anomaly detection. To assess the necessity of DL, we propose LightAD, an architecture that optimizes training time, inference time, and performance score. With automated hyper-parameter tuning, LightAD allows fair comparisons among log anomaly detection models, enabling engineers to evaluate the suitability of complex DL methods. Our findings serve as a cautionary tale for the log anomaly detection community, highlighting the need to critically analyze datasets and research tasks before adopting DL approaches. Researchers proposing computationally expensive models should benchmark their work against lightweight algorithms to ensure a comprehensive evaluation.

Boxi Yu, Jiayi Yao, Qiuai Fu 63
ML Systems
Unread · 2020

Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

Distributed training techniques have been widely deployed in large-scale deep neural network (DNN) training on dense-GPU clusters. However, on public cloud clusters, due to the moderate inter-connection bandwidth between instances, traditional state-of-the-art distributed training systems cannot scale well in training large-scale models. In this paper, we propose a new computing and communication efficient top-k sparsification communication library for distributed training. To further improve the system scalability, we optimize I/O by proposing a simple yet efficient multi-level data caching mechanism and optimize the update operation by introducing a novel parallel tensor operator. Experimental results on a 16-node Tencent Cloud cluster (each node with 8 Nvidia Tesla V100 GPUs) show that our system trains 25%-40% faster than existing state-of-the-art systems on CNNs and Transformer. We finally break the record on DAWNBench on training ResNet-50 to 93% top-5 accuracy on ImageNet.

S. Shi, Xianhao Zhou, Shutao Song 60
ML Systems
Unread · 2022

A Distributed Intrusion Detection System using Machine Learning for IoT based on ToN-IoT Dataset

The internet of things (IoT) is a collection of common physical things which can communicate and synthesize data utilizing network infrastructure by connecting to the internet. IoT networks are increasingly vulnerable to security breaches as their popularity grows. Cyber security attacks are among the most popular severe dangers to IoT security. Many academics are increasingly interested in enhancing the security of IoT systems. Machine learning (ML) approaches were employed to function as intrusion detection systems (IDSs) to provide better security capabilities. This work proposed a novel distributed detection system based on ML approaches to detect attacks in IoT and mitigate malicious occurrences. Furthermore, NSL-KDD or KDD-CUP99 datasets are used in the great majority of current studies. These datasets are not updated with new attacks. As a consequence, the ToN-IoT dataset was used for training and testing. It was created from a large-scale, diverse IoT network. The ToN-IoT dataset reflects data from each layer of the IoT system, such as cloud, fog, and edge layers. Various ML methods were tested on each specific partition of the ToN-IoT dataset. The proposed model is the first suggested model based on the collected data from the same IoT system from all layers. The Chi2 technique was used to pick features in the network dataset. It reduced the number of features to 20. Another feature selection tool employed on the Windows dataset was the correlation matrix, which was used to extract the most relevant features from the whole dataset. To balance the classes, the SMOTE method was used. This paper tests numerous ML approaches on both binary and multi-class classification problems. According to the findings, the XGBoost approach is superior to other ML algorithms for each node in the suggested model.

Abdallah R. Gad, M. Haggag, Ahmed A. Nashat 57
ML Systems
Unread · 2021

On the Utility of Gradient Compression in Distributed Training Systems

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent work proposes gradient and model compression methods. In this work, we evaluate the efficacy of gradient compression methods and compare their scalability with optimized implementations of synchronous data-parallel SGD across more than 200 different setups. Surprisingly, we observe that only in 6 cases out of more than 200, gradient compression methods provide speedup over optimized synchronous data-parallel training in the typical data-center setting. We conduct an extensive investigation to identify the root causes of this phenomenon, and offer a performance model that can be used to identify the benefits of gradient compression for a variety of system setups. Based on our analysis, we propose a list of desirable properties that gradient compression methods should satisfy, in order for them to provide a meaningful end-to-end speedup.

Saurabh Agarwal, Hongyi Wang, S. Venkataraman 57
ML Systems
Unread · 2022

Dynamic GPU Energy Optimization for Machine Learning Training Workloads

GPUs are widely used to accelerate the training of machine learning workloads. As the machine learning models become increasingly larger, they require a longer time to train, which in turn leads to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing a set of novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload behavior, GPOEO utilizes GPU performance counters. To reduce the performance counter profiling overhead, it uses an analytical model to detect the change of training iteration and only reprofile the performance counter when an iteration shift is detected. Then we use multi-objective models, based on the gradient boosting method, and a local search algorithm, to find a trade-off between execution time and energy consumption. We evaluate the GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites on an NVIDIA RTX3080Ti GPU. Compared with the NVIDIA default scheduling strategy, GPOEO delivers a mean energy saving of 16.2% with an average execution time increase of 5.1%.

Farui Wang, Weizhe Zhang, Shichao Lai 48
ML Systems
Unread · 2019

Distributed Machine Learning through Heterogeneous Edge Systems

Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large volumes and/or security/privacy concerns. Edge devices are intrinsically heterogeneous in computing capacity, posing significant challenges to parameter synchronization for parallel training with the parameter server (PS) architecture. This paper proposes ADSP, a parameter synchronization model for distributed machine learning (ML) with heterogeneous edge systems. Eliminating the significant waiting time occurring with existing parameter synchronization models, the core idea of ADSP is to let faster edge devices continue training, while committing their model updates at strategically decided intervals. We design algorithms that decide time points for each worker to commit its model update, and ensure not only global model convergence but also faster convergence. Our testbed implementation and experiments show that ADSP outperforms existing parameter synchronization models significantly in terms of ML model convergence time, scalability and adaptability to large heterogeneity.

Han Hu, Dan Wang, Chuan Wu 48
ML Systems
Unread · 2021

DC2: Delay-aware Compression Control for Distributed Machine Learning

Distributed training performs data-parallel training of DNN models which is a necessity for increasingly complex models and large datasets. Recent works are identifying major communication bottlenecks in distributed training. These works seek possible opportunities to speed-up the training in systems supporting distributed ML workloads. As communication reduction, compression techniques are proposed to speed up this communication phase. However, compression comes at the cost of reduced model accuracy, especially when compression is applied arbitrarily. Instead, we advocate a more controlled use of compression and propose DC2, a delay-aware compression control mechanism. DC2 couples compression control and network delays in applying compression adaptively. DC2 not only compensates for network variations but can also strike a better trade-off between training speed and accuracy. DC2 is implemented as a drop-in module to the communication library used by the ML toolkit and can operate in a variety of network settings. We empirically evaluate DC2 in network environments exhibiting low and high delay variations. Our evaluation of different popular CNN models and datasets shows that DC2 improves training speed-ups of up to 41× and 5.3 × over baselines with no-compression and uniform compression, respectively.

A. M. Abdelmoniem, Marco Canini 44
ML Systems
Unread · 2019

Attention Is All You Need for Chinese Word Segmentation

Taking greedy decoding algorithm as it should be, this work focuses on further strengthening the model itself for Chinese word segmentation (CWS), which results in an even more fast and more accurate CWS model. Our model consists of an attention only stacked encoder and a light enough decoder for the greedy segmentation plus two highway connections for smoother training, in which the encoder is composed of a newly proposed Transformer variant, Gaussian-masked Directional (GD) Transformer, and a biaffine attention scorer. With the effective encoder design, our model only needs to take unigram features for scoring. Our model is evaluated on SIGHAN Bakeoff benchmark datasets. The experimental results show that with the highest segmentation speed, the proposed model achieves new state-of-the-art or comparable performance against strong baselines in terms of strict closed test setting.

Sufeng Duan, Hai Zhao 40
ML Systems
Unread · 2019

Model poisoning attacks against distributed machine learning systems

Future military coalition operations will increasingly rely on machine learning (ML) methods to improve situational awareness. The coalition context presents unique challenges for ML: the tactical environment creates significant computing and communications limitations while also having to deal with an adversarial presence. Further, coalition operations must operate in a distributed manner, while coping with the constraints posed by the operational environment. Envisioned ML deployments in military assets must be resilient to these challenges. Here, we focus on the susceptibility of ML models to be poisoned (during training) or fooled (after training) by adversarial inputs. We review recent work on distributed adversarial ML, and present new results from our own investigations into model poisoning attacks on distributed learning systems without a central parameter aggregation node.

Richard J. Tomsett, K. Chan, Supriyo Chakraborty 35
ML Systems
Unread · 2019

Performance Analysis and Comparison of Distributed Machine Learning Systems

Deep learning has permeated through many aspects of computing/processing systems in recent years. While distributed training architectures/frameworks are adopted for training large deep learning models quickly, there has not been a systematic study of the communication bottlenecks of these architectures and their effects on the computation cycle time and scalability. In order to analyze this problem for synchronous Stochastic Gradient Descent (SGD) training of deep learning models, we developed a performance model of computation time and communication latency under three different system architectures: Parameter Server (PS), peer-to-peer (P2P), and Ring allreduce (RA). To complement and corroborate our analytical models with quantitative results, we evaluated the computation and communication performance of these system architectures via experiments performed with the TensorFlow and Horovod frameworks. We found that the system architecture has a very significant effect on the performance of training. RA-based systems achieve scalable performance as they successfully decouple network usage from the number of workers in the system. In contrast, 1PS systems suffer from low performance due to network congestion at the parameter server side. While P2P systems fare better than 1PS systems, they still suffer from a significant network bottleneck. Finally, RA systems also excel by virtue of overlapping computation time and communication time, which PS and P2P architectures fail to achieve.

S. Alqahtani, Murat Demirbas 32
ML Systems
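
A back-of-the-envelope version of the comparison in the abstract: per training step, a single parameter server must move traffic proportional to the number of workers, while ring allreduce keeps each worker's traffic roughly constant. The cost model below is an illustrative simplification of ours (bandwidth terms only, no latency or overlap), not the paper's full performance model.

```python
def parameter_server_bottleneck(model_bytes: float, workers: int) -> float:
    """Bytes through the single server per step: it receives N gradients and sends N models."""
    return workers * model_bytes

def ring_allreduce_per_worker(model_bytes: float, workers: int) -> float:
    """Bytes each worker sends in ring allreduce: ~2M regardless of N, hence scalable."""
    return 2 * (workers - 1) / workers * model_bytes

M = 100e6  # 100 MB of gradients (illustrative)
for n in (4, 16, 64):
    print(n, parameter_server_bottleneck(M, n) / 1e6, ring_allreduce_per_worker(M, n) / 1e6)
```

Running this shows the server-side load growing linearly with the worker count while the ring's per-worker traffic stays near 2M, which is the intuition behind the scalability result reported above.
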
Unread · 2024

Rule-Based With Machine Learning IDS for DDoS Attack Detection in Cyber-Physical Production Systems (CPPS)

Recent advancements in communication technology have transformed the way the industrial system works. This digitalization has improved the way of communication between different actors involved in cyber physical production systems (CPPS), such as users, suppliers, and manufacturers, thus making the whole process transparent. The utilization of emerging new technologies in CPPS can cause vulnerable spots that can be exploited by attackers to launch sophisticated distributed denial of service (DDoS) attacks, hence threatening the availability of the production systems. Existing machine learning based intrusion detection systems (IDS) often rely on unrealistic datasets for training and validation, thus missing the crucial testing phase with real-time scenarios. The results generated by the ML models are based on predictions at each flow level and cannot provide summarized information about malicious entities. To address this limitation, this study proposed an efficient IDS system that uses both rule-based detection and ML-based approaches to detect DDoS attacks damaging the infrastructure of CPPS. For training and validation of the system, we use real-time network traffic extracted from a real industrial scenario, referred to as Farm-to-Fork (F2F) supply chain system. Both, attacks and normal traffic were captured, and bidirectional features were extracted through CIC-FLOWMETER. We make use of 8 ML supervised and unsupervised approaches to detect the malicious flows; and then a rule-based detection mechanism is used to calculate the frequency of the malicious flows and to assign different severity levels based on the computed frequency. The overall results show that supervised models outperform unsupervised approaches and achieve an accuracy 99.97% and TPR 99.96%. Overall, the weighted accuracy when tested and deployed in a real-time scenario is around 98.71%. The results prove that the system works better when considering real-time scenarios and provides comprehensive information about the detected results that can be used to take different mitigation actions.

Ayaz Hussain, Eva Marín Tordera, X. Masip-Bruin 30
ML Systems
Unread · 2023

AMPeD: An Analytical Model for Performance in Distributed Training of Transformers

Transformers are a class of machine learning models that have piqued high interest recently due to a multitude of reasons. They can process multiple modalities efficiently and have excellent scalability. Despite these obvious advantages, training these large models is very time-consuming. Hence, there have been efforts to speed up the training process using efficient distributed implementations. Many different types of parallelism have been identified that can be employed standalone or in combination. However, naively combining different parallelization schemes can incur significant communication overheads, thereby potentially defeating the purpose of distributed training. Thus, it becomes vital to predict the right mapping of different parallelisms to the underlying system architecture. In this work, we propose AMPeD, an analytical model for performance in distributed training of transformers. It exposes all the transformer model parameters, potential parallelism choices (along with their mapping onto the system), the accelerator as well as system architecture specifications as tunable knobs, thereby enabling hardware-software co-design. With the help of 3 case studies, we show that the combinations of parallelisms predicted to be efficient by AMPeD conform with the results from the state-of-the-art literature. Using AMPeD, we also show that future distributed systems consisting of optical communication substrates can train large models up to 4× faster as compared to the current state-of-the-art systems without modifying the peak computational power of the accelerators. Finally, we validate AMPeD with in-house experiments on real systems and via published literature. The maximum observed error is limited to 12%. The model is available here: https://github.com/CSA-infra/AMPeD

Diksha Moolchandani, Joyjit Kundu, F. Ruelens 30
ML Systems
Unread · 2020

Apache Mahout: Machine Learning on Distributed Dataflow Systems

No abstract available yet.

Robin Anil, Gökhan Çapan, Isabel Drost-Fromm 27
ML Systems
Unread · 2021

Mixing Activations and Labels in Distributed Training for Split Learning

Split Learning (SL) is a distributed machine learning setting that allows several nodes to train neural networks based on model parallelism. Since SL avoids sharing raw data among training nodes, it can protect data privacy by nature. However, recent studies show that, raw data may be reconstructed from activations in training, which may cause data privacy leakage. Besides raw data, label sharing in SL may also cause privacy problems. In order to address these issues, we propose a novel mechanism called multiple activations and labels mix (MALM). By taking advantage of the diversity of sample categories, MALM generates mixed activations that preserve a low distance correlation with the raw data so as to reduce the risk of reconstruction attacks. To protect label information, MALM creates obfuscated labels associated with the raw data so as to prevent adversaries from inferring ground-truth labels. Since clients with few sample categories may not effectively generate mixed activations and obfuscated labels, we propose a bipartite graph based assistant client match technique for MALM, which lets clients with a large number of categories provide mixed activations and obfuscated labels for clients with few categories. Those clients with few categories can mix the obtained mixed activations and obfuscated labels with their own activations and labels. Experimental results show that, compared with baselines, MALM can reduce the risk of raw data and label information leakage with lower cost, while achieving comparable even better model performance.

Danyang Xiao, Chengang Yang, Weigang Wu 26
ML Systems
Unread · 2024

Objective metrics for ethical AI: a systematic literature review

The field of AI Ethics has recently gained considerable attention, yet much of the existing academic research lacks practical and objective contributions for the development of ethical AI systems. This systematic literature review aims to identify and map objective metrics documented in literature between January 2018 and June 2023, specifically focusing on the ethical principles outlined in the Ethics Guidelines for Trustworthy AI. The review was based on 66 articles retrieved from the Scopus and Web of Science databases. The articles were categorized based on their alignment with seven ethical principles: Human Agency and Oversight, Technical Robustness and Safety, Privacy and Data Governance, Transparency, Diversity, Non-Discrimination and Fairness, Societal and Environmental Well-being, and Accountability. Of the identified articles, only a minority presented objective metrics to assess AI ethics, with the majority being purely theoretical works. Moreover, existing metrics are primarily concentrated on Diversity, Non-Discrimination and Fairness, with a clear under-representation of the remaining principles. This lack of practical contributions makes it difficult for Data Scientists to devise systems that can be deemed Ethical, or to monitor the alignment of existing systems with current guidelines and legislation. With this work, we lay out the current panorama concerning objective metrics to quantify AI Ethics in Data Science and highlight the areas in which future developments are needed to align Data Science projects with the human values widely posited in the literature.

Guilherme Palumbo, Davide Carneiro, Victor Alves 21
ML Systems
Unread · 2023

DC-SHAP Method for Consistent Explainability in Privacy-Preserving Distributed Machine Learning

Ensuring the transparency of machine learning models is vital for their ethical application in various industries. There has been a concurrent trend of distributed machine learning designed to limit access to training data for privacy concerns. Such models, trained over horizontally or vertically partitioned data, present a challenge for explainable AI because the explaining party may have a biased view of background data or a partial view of the feature space. As a result, explanations obtained from different participants of distributed machine learning might not be consistent with one another, undermining trust in the product. This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm (KernelSHAP) and Data Collaboration method of privacy-preserving distributed machine learning. In particular, we present three algorithms for different scenarios of explainability in Data Collaboration and verify their consistency with experiments on open-access datasets. Our results demonstrated a significant (by at least a factor of 1.75) decrease in feature attribution discrepancies among the users of distributed machine learning. The proposed method improves consistency among explanations obtained from different participants, which can enhance trust in the product and enable ethical application in various industries.

A. Bogdanova, A. Imakura, T. Sakurai 20
ML Systems
Unread · 2023

MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs. Our analysis, grounded in real-world large model training on datacenter-scale infrastructures, reveals that 14~32% of all GPU hours are spent on communication with no overlapping computation. To minimize this outstanding communication latency and other inherent at-scale inefficiencies, we introduce an agile performance modeling framework, MAD-Max. This framework is designed to optimize parallelization strategies and facilitate hardware-software co-design opportunities. Through the application of MAD-Max to a suite of real-world large-scale ML models on state-of-the-art GPU clusters, we showcase potential throughput enhancements of up to 2.24 × for pretraining and up to 5.27 × for inference scenarios, respectively.

Samuel Hsia, Alicia Golden, Bilge Acun 19
ML Systems
Unread · 2022

Preemptive Scheduling for Distributed Machine Learning Jobs in Edge-Cloud Networks

No abstract available yet.

Ne Wang, Ruiting Zhou, Lei Jiao 17
ML Systems
Unread · 2018

Communication Scheduling as a First-Class Citizen in Distributed Machine Learning Systems

No abstract available yet.

Sayed Hadi Hashemi, S. Jyothi, R. Campbell 17
ML Systems
Unread · 2024

Assemblage: Automatic Binary Dataset Construction for Machine Learning

Binary code is pervasive, and binary analysis is a key task in reverse engineering, malware classification, and vulnerability discovery. Unfortunately, while there exist large corpora of malicious binaries, obtaining high-quality corpora of benign binaries for modern systems has proven challenging (e.g., due to licensing issues). Consequently, machine learning based pipelines for binary analysis utilize either costly commercial corpora (e.g., VirusTotal) or open-source binaries (e.g., coreutils) available in limited quantities. To address these issues, we present Assemblage: an extensible cloud-based distributed system that crawls, configures, and builds Windows PE binaries to obtain high-quality binary corpuses suitable for training state-of-the-art models in binary analysis. We have run Assemblage on AWS over the past year, producing 890k Windows PE and 428k Linux ELF binaries across 29 configurations. Assemblage is designed to be both reproducible and extensible, enabling users to publish "recipes" for their datasets, and facilitating the extraction of a wide array of features. We evaluated Assemblage by using its data to train modern learning-based pipelines for compiler provenance and binary function similarity. Our results illustrate the practical need for robust corpora of high-quality Windows PE binaries in training modern learning-based binary analyses. Assemblage code is open sourced under the MIT license, and the dataset can be downloaded from https://assemblage-dataset.net

Chang Liu, Rebecca Saul, Yihao Sun 16
ML Systems
Unread · 2024

Improving Discharge Predictions in Ungauged Basins: Harnessing the Power of Disaggregated Data Modeling and Machine Learning

Current machine learning methods for discharge prediction often employ aggregated basin‐wide hydrometeorological data (lumped modeling) for parametric and non‐parametric training. This approach may overlook the spatial heterogeneity of river systems and their impact on discharge patterns. We hypothesize that integrating spatiotemporal hydrologic knowledge into the data modeling process (distributed/disaggregated modeling) can improve the performance of discharge prediction models. To test this hypothesis, we designed experiments comparing the performance of identical Long Short‐Term Memory Recurrent Neural Network (LSTM‐RNN) models forced with either lumped or distributed features. We gather meteorological forcing and static attributes for the Mackenzie basin in Canada‐ a large and unique basin. Importantly, discharge performance is assessed out‐of‐sample with k‐fold replication across gauges. Training LSTMs with disaggregated data significantly improved model accuracy. Specifically, there was a 9.6% increase in the mean Nash‐Sutcliffe Efficiency and a 4.6% increase in the mean Kling‐Gupta Efficiency, indicating a better agreement between predicted and actual observations in terms of mean, variability, and correlation. These experiments and results demonstrate the importance of integrating topologically guided geomorphologic and hydrologic information (distributed modeling) in data‐driven discharge predictions.

Aggrey Muhebwa, C. Gleason, D. Feng 16
ML Systems
Unread · 2021

X-NEST: A Scalable, Flexible, and High-Performance Network Architecture for Distributed Machine Learning

In a large-scale distributed machine learning system, the interconnection network between computing devices has an important impact on performance in the training of neural network models. The current expansion of training data and model size has led to a rapid increase in the number of computing devices used in distributed machine learning systems, which places higher demands on network scalability. In addition, the synchronization algorithms used for data exchange between computing devices have different communication topologies, and traditional electrical networks have difficulty matching them due to their fixed network topology. Neural network models and model partitioning methods can also affect the amount of communication between devices, but the overprovisioned bandwidth of traditional electric networks incurs unnecessary costs. To address these issues, we propose a scalable, flexible, and high-performance network architecture called X-NEST. The flexibility of optical switching devices allows X-NEST to dynamically change its topology and the number of links between devices according to traffic pattern variations, thereby improving network performance and resource utilization. Although changes in the connection relationships between devices depend on the controller, the simple and flexible control plane of X-NEST can quickly respond to network communication requirements. Extensive analytical simulations using different traffic patterns demonstrate that X-NEST copes well with the communication characteristics of various synchronization algorithms.

Yunfeng Lu, Huaxi Gu, Xiaoshan Yu 16
ML Systems
Unread · 2024

When In-Network Computing Meets Distributed Machine Learning

The emerging In-Network Computing (INC) technique provides a new opportunity to improve application performance by using the network programmability, computational capability, and storage capacity enabled by programmable switches. One typical application is Distributed Machine Learning (DML), which accelerates machine learning training by employing multiple workers to train a model in parallel. This paper introduces INC-based DML systems, analyzes the performance improvement from using INC, and overviews current studies of INC-based DML systems. We also propose potential research directions for applying INC to DML systems.

Haowen Zhu, Wenchao Jiang, Qi Hong 15
ML Systems