Megatron-LM 3 posts
| Megatron-LM Training Large Models Practical Guide | 2 - Model Construct |
15 minute read
A practical guide to constructing and modifying GPT-style models in Megatron-LM: code organization, the Spec-based layer system, parameter flow, and how to switch between local and Transformer Engi...
Megatron-LM Practical Guide
| Megatron-LM Training Large Models Practical Guide | 1 - Data Preprocess |
16 minute read
A practical overview of Megatron-LM data preprocessing: supported text formats, the two-step preprocessing pipeline, and how IndexedDataset/GPTDataset/BlendedDataset indexing works, with engineerin...
Megatron-LM Practical Guide
Paper Interpretation 4 posts
Paper Summary for Recursive Looped Transformers: Latent Reasoning
19 minute read
A paper-reading note on latent reasoning in Looped / Recursive Transformers: scaling test-time compute via recurrent depth, recursive latent thoughts, and large-scale looped language models.
Recursive Transformers Paper Interpretation
Paper Summary for Recursive Looped Transformers: Parameter Efficiency
25 minute read
Exploring how loops and recursion can improve parameter utilization efficiency in LLMs. A comprehensive summary of recursive mechanisms in Transformer architectures.
Recursive Transformers Paper Interpretation
A One-Stop Guide to Scaling Laws in LLM Quantization
27 minute read
A comprehensive overview of Quantization Scaling Laws. Dive deep into 5 papers to understand how performance loss from quantization varies with model parameters and token count.
Quantization Paper Interpretation
A 5,000-Word Analysis of FP4 Quantization for Training Large Language Models
29 minute read
A detailed interpretation of the paper ‘Optimizing Large Language Model Training Using FP4 Quantization’, walking through its motivation, key insights, and design rationale.
Quantization Paper Interpretation
Practical Guide 3 posts
| Megatron-LM Training Large Models Practical Guide | 2 - Model Construct |
15 minute read
A practical guide to constructing and modifying GPT-style models in Megatron-LM: code organization, the Spec-based layer system, parameter flow, and how to switch between local and Transformer Engi...
Megatron-LM Practical Guide
| Megatron-LM Training Large Models Practical Guide | 1 - Data Preprocess |
16 minute read
A practical overview of Megatron-LM data preprocessing: supported text formats, the two-step preprocessing pipeline, and how IndexedDataset/GPTDataset/BlendedDataset indexing works, with engineerin...
Megatron-LM Practical Guide
Quantization 2 posts
A One-Stop Guide to Scaling Laws in LLM Quantization
27 minute read
A comprehensive overview of Quantization Scaling Laws. Dive deep into 5 papers to understand how performance loss from quantization varies with model parameters and token count.
Quantization Paper Interpretation
A 5,000-Word Analysis of FP4 Quantization for Training Large Language Models
29 minute read
A detailed interpretation of the paper ‘Optimizing Large Language Model Training Using FP4 Quantization’, walking through its motivation, key insights, and design rationale.
Quantization Paper Interpretation
Recursive Transformers 2 posts
Paper Summary for Recursive Looped Transformers: Latent Reasoning
19 minute read
A paper-reading note on latent reasoning in Looped / Recursive Transformers: scaling test-time compute via recurrent depth, recursive latent thoughts, and large-scale looped language models.
Recursive Transformers Paper Interpretation
Paper Summary for Recursive Looped Transformers: Parameter Efficiency
25 minute read
Exploring how loops and recursion can improve parameter utilization efficiency in LLMs. A comprehensive summary of recursive mechanisms in Transformer architectures.
Recursive Transformers Paper Interpretation