LlamaCast
Shahriar Shariati
Categories: Technology
Listen to the latest episode:
⚖️ Scaling Laws for Precision
This research paper investigates the impact of precision in training and inference on the performance of large language models. The authors explore how precision affects the effective parameter count and propose scaling laws that predict performance degradation due to low-precision training and post-training quantization. They find that overtrained models are more sensitive to post-training quantization, and that training larger models in lower precision might be computationally optimal. Their unified scaling law accounts for both training and post-training effects and predicts loss in varied precision settings, ultimately suggesting that the standard practice of training models in 16-bit might be suboptimal.
📎 Link to paper
🌐 Read their Tweet
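The scaling-law idea the episode describes, that low precision shrinks a model's effective parameter count, can be sketched in a few lines. This is an illustrative sketch only: the exponential form for effective parameters, the sensitivity constant `gamma`, and the Chinchilla-style fit constants are assumptions for demonstration, not the paper's actual fitted values.

```python
import math

def effective_params(n_params: float, precision_bits: float, gamma: float = 2.0) -> float:
    """Treat low-precision weights as fewer effective parameters.

    Assumed form: N_eff = N * (1 - exp(-P / gamma)), where gamma is an
    illustrative sensitivity constant, not a value fitted in the paper.
    """
    return n_params * (1.0 - math.exp(-precision_bits / gamma))

def predicted_loss(n_params: float, n_tokens: float, precision_bits: float,
                   a: float = 406.4, b: float = 410.7, e: float = 1.69,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style loss L = E + A/N^alpha + B/D^beta, with the
    effective (precision-adjusted) parameter count substituted for N."""
    n_eff = effective_params(n_params, precision_bits)
    return e + a / n_eff**alpha + b / n_tokens**beta

# Lowering precision shrinks N_eff, so predicted loss rises even though
# the nominal parameter count is unchanged.
loss_16bit = predicted_loss(1e9, 2e10, precision_bits=16)
loss_4bit = predicted_loss(1e9, 2e10, precision_bits=4)
```

Under this form, the loss penalty from quantization grows with how overtrained the model is relative to its size, which matches the episode's point that heavily overtrained models degrade more under post-training quantization.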
Previous episodes
- 48 - Scaling Laws for Precision (Mon, 18 Nov 2024)
- 47 - Test-Time Training (Thu, 14 Nov 2024)
- 46 - Qwen2.5-Coder (Tue, 12 Nov 2024)
- 45 - Attacking Vision-Language Computer Agents via Pop-ups (Sat, 09 Nov 2024)
- 44 - Number Cookbook (Fri, 08 Nov 2024)
- 43 - Jigsaw Puzzles (Thu, 07 Nov 2024)
- 42 - Multi-expert Prompting with LLMs (Tue, 05 Nov 2024)
- 41 - Investigating the Role of Prompting and External Tools in Hallucination Rates of LLMs (Sun, 03 Nov 2024)
- 40 - Mind Your Step (by Step) (Sat, 02 Nov 2024)
- 39 - SimpleQA (Thu, 31 Oct 2024)
- 38 - GPT-4o System Card (Wed, 30 Oct 2024)
- 37 - Mixture of Parrots (Tue, 29 Oct 2024)
- 36 - Improve Vision Language Model Chain-of-thought Reasoning (Mon, 28 Oct 2024)
- 35 - Breaking the Memory Barrier (Sun, 27 Oct 2024)
- 34 - LLMs Reflect the Ideology of their Creators (Sat, 26 Oct 2024)
- 33 - LongRAG (Fri, 25 Oct 2024)
- 32 - A Theoretical Understanding of Chain-of-Thought (Thu, 24 Oct 2024)
- 31 - A Survey on Data Synthesis and Augmentation for Large Language Models (Wed, 23 Oct 2024)
- 30 - Revealing the Barriers of Language Agents in Planning (Tue, 22 Oct 2024)
- 29 - Intelligence at the Edge of Chaos (Mon, 21 Oct 2024)
- 28 - Inference Scaling for Long-Context RAG (Sun, 20 Oct 2024)
- 27 - Model Swarms (Sat, 19 Oct 2024)
- 26 - Agent-as-a-Judge (Fri, 18 Oct 2024)
- 25 - First-Person Fairness in Chatbots (Fri, 18 Oct 2024)
- 24 - Thinking LLMs (Fri, 18 Oct 2024)
- 23 - Addition is All You Need (Fri, 18 Oct 2024)
- 22 - MLE-bench (Fri, 18 Oct 2024)
- 21 - Long-Context LLMs Meet RAG (Fri, 18 Oct 2024)
- 20 - GSM-Symbolic (Fri, 18 Oct 2024)
- 19 - Anti-Social LLM (Fri, 18 Oct 2024)
- 18 - Differential Transformer (Fri, 18 Oct 2024)
- 17 - ToolGen (Fri, 18 Oct 2024)
- 16 - LangGPT (Fri, 18 Oct 2024)
- 15 - Movie Gen (Fri, 18 Oct 2024)
- 14 - LLMs Know More Than They Show (Fri, 18 Oct 2024)
- 13 - Were RNNs All We Needed? (Fri, 18 Oct 2024)
- 12 - SLMs, A Survey (Fri, 18 Oct 2024)
- 11 - o1 in Medicine (Fri, 18 Oct 2024)
- 10 - RAG and Beyond (Fri, 18 Oct 2024)
- 9 - Molmo and PixMo (Fri, 18 Oct 2024)
- 8 - Self-Taught Evaluators (Fri, 18 Oct 2024)
- 7 - Larger LLMs Become Less Reliable (Fri, 18 Oct 2024)
- 6 - Logic-of-Thought (Fri, 18 Oct 2024)
- 5 - Moshi (Fri, 18 Oct 2024)
- 4 - Jailbreaking Large Language Models with Symbolic Mathematics (Fri, 18 Oct 2024)
- 3 - LLMs Still Can't Plan; Can LRMs? (Fri, 18 Oct 2024)
- 2 - A Comprehensive Evaluation of Quantized Instruction-Tuned LLMs (Fri, 18 Oct 2024)
- 1 - On the Diagram of Thought (Thu, 17 Oct 2024)