Up-to-Date Technical Dive into the State of AI

Detailed Summary of Lex Fridman Podcast: AI State-of-the-Art 2026 with Nathan Lambert and Sebastian Raschka

This episode (YouTube: https://www.youtube.com/watch?v=EV7WhVT270Q, released around early 2026) features Lex Fridman interviewing Nathan Lambert (post-training lead at the Allen Institute for AI, author of The RLHF Book) and Sebastian Raschka (author of Build a Large Language Model (From Scratch) and Build a Reasoning Model (From Scratch)). The ~4-hour discussion covers the state of AI in 2026: LLMs, scaling laws, competition between models and nations, training phases, emerging research, and predictions for AGI, robotics, and societal impacts. The tone is technical yet accessible, emphasizing hands-on building, empirical progress, and balanced optimism about AI’s potential and risks.

Raschka’s books teach building LLMs and reasoning models from scratch using PyTorch, emphasizing practical coding over theory. Lambert’s work at AI2 focuses on post-training (RLHF), and his book covers reinforcement learning from human feedback.

Nathan’s Book: https://rlhfbook.com
Sebastian’s X: https://x.com/rasbt
Sebastian’s Blog: https://magazine.sebastianraschka.com
Sebastian’s Website: https://sebastianraschka.com
Sebastian’s YouTube: https://www.youtube.com/@sebastianraschka
Sebastian’s GitHub: https://github.com/rasbt
Sebastian’s Books:
Build a Large Language Model (From Scratch): https://www.manning.com/books/build-a-large-language-model-from-scratch
Build a Reasoning Model (From Scratch): https://www.manning.com/books/build-a-reasoning-model-from-scratch

Hands-on learning is praised as the best way to understand AI internals. The conversation aims to be technical without alienating beginners.

Lex: “The best way to learn about something is to build it yourself from scratch.”
Raschka: “My books are for people who want to get their hands dirty with code.”

China vs US: Who Wins the AI Race? (1:57 – 10:38)

Triggered by DeepSeek R1 (January 2025, open-weight Chinese model) achieving state-of-the-art (SOTA) with lower compute/cost.

China excels in open-weight releases (DeepSeek V3, Z.AI’s GLM, MiniMax, Moonshot’s Kimi) for global influence, driven by government incentives and a weaker culture of paying for software. The US focuses on proprietary models but faces hype cycles (Claude Opus 4.5 is strong in code).
Raschka argues that budgets and hardware differentiate labs more than technology, since ideas spread quickly. Lambert notes China’s secretive approaches (DeepSeek is backed by the hedge fund High-Flyer) and expects consolidation in China in 2026 due to high costs. US organizations are chaotic but produce better outputs. China aims for Western mindshare via IPOs.

Models like Kimi build on DeepSeek architectures to leapfrog; the most recent releases tend to perform best. Chinese models face serving issues (fewer GPUs, different error profiles).
Predictions: consolidation in China in 2026 and more open builders. US incumbents win via brand, but Chinese models gain traction when hosted in the US (e.g., via OpenRouter).

DeepSeek kicked off a movement with strong frontier open-weight models.

ChatGPT vs Claude vs Gemini vs Grok: Who is Winning? (10:38 – 21:38)

2025 saw Gemini 3 hyped, but Claude Opus 4.5 led in code and philosophy. ChatGPT stays dominant via incumbency, with GPT-5 acting as a router for efficiency. Personal usage: stick to one model until it breaks, then switch; multiple subscriptions for work and personal use.

Lambert favors Gemini for speed/explanations, Claude for code, Grok for real-time/AI Twitter integration. Raschka prefers non-thinking modes for quick tasks (Bash scripts). Chinese models biased toward US-favoring outputs but slower.

Trade-offs between intelligence (thinking modes like o1) and speed. Gemini leads in long context (needle-in-a-haystack tests). Extended thinking is marginally smarter but costlier.

Predictions: Gemini erodes ChatGPT’s lead in 2026 via Google’s scale and TPUs; Anthropic stays strong in enterprise; $2K subscriptions are possible; inference scaling unlocks new capabilities.

Lambert: “It’s hard to bet against ChatGPT despite the chaos.”
Raschka: “I go crazy with 30-min waits, but auto modes are good.”

Best AI for Coding (21:38 – 28:29)

Tools like Cursor (diff-based, micromanagement) vs. Claude Code (agentic, English-to-code). LLMs save days on tasks (e.g., web scraping).
Arguments: Lex notes macro-level thinking with agents; Raschka emphasizes building from scratch for verification (“code doesn’t lie”). Lambert highlights fun in using Claude for projects.
Technical Explanations: Claude Code + Claude Opus 4.5 handles project-wide contexts better than plugins like Codex in VS Code.
Predictions: None explicit; implies agents will evolve for skill-building.
Notable Quotes: Lex: “Agents change the programming process at a macro level.” Raschka: “Building an LLM from scratch is fun because if the code works, it’s correct.”

Open Source vs Closed Source LLMs (28:29 – 40:08)
Explosion of open models in 2025 (10+ major releases from China and the West, vs. fewer in 2024).

Standouts: Chinese (DeepSeek R1/V3, Qwen’s large high-performance models); Western (Mistral Large 3, NVIDIA Nemotron 400B, OLMo from AI2, GPT-OSS for tool use).

Lambert stresses open models for trust, transparency, and GPU offload. Raschka notes unrestricted licenses enable customization (law/medical domains). Chinese labs focus on global distribution; Western labs on data and code quality.

MoE (Mixture of Experts) architectures are most advanced in China, prized for efficiency. GPT-OSS reduces hallucinations via tools (web search, Python interpreter).
Predictions: US/Europe release big MoEs in 2026; the shift to tool use extracts more utility from open models.

Lambert: “Releasing open models is the #1 way to get people to use your AI.” Raschka: “GPT-OSS with tool use is a huge unlock for hallucinations.”

Transformers: Evolution of LLMs since 2019 (40:08 – 48:05)

Core architecture stable (decoder-only from GPT-2: embeddings → transformer blocks with attention/FFN/norm).

Raschka explains tweaks like MoE (sparse experts for efficiency) and grouped-query attention (cheaper inference). Lambert notes stability in the autoregressive core but turbulence in pre- and post-training.

MoE: a router selects experts per token. KV cache optimizations enable long context; alternatives include linear-attention variants (Qwen3-Next, state-space inspired).
Transformers dominate SOTA, while diffusion models and Mamba-style architectures serve the cheap end.
Raschka: “MoE packs knowledge without activating all experts every time.”
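
To make the routing idea concrete, here is a minimal top-k MoE sketch in plain Python. It is a toy illustration only, not any production architecture: the scalar "token", the lambda experts, and the router weights are all invented for demonstration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(token, experts, router_weights, top_k=2):
    """Route one (scalar) token through only the top_k highest-scoring experts."""
    scores = softmax([w * token for w in router_weights])  # router logits -> probs
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in top)
    # Weighted combination of the selected experts' outputs; the other
    # experts are never evaluated, which is the efficiency win of MoE.
    return sum((scores[i] / total) * experts[i](token) for i in top)

# Four toy "experts" (stand-ins for feed-forward networks).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x ** 2, lambda x: -x]
out = moe_layer(3.0, experts, router_weights=[0.1, 0.9, -0.3, 0.2], top_k=2)
```

With top_k=2, only two of the four experts run per token; knowledge is spread across all experts while per-token compute stays low, matching Raschka's point above.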

AI Scaling Laws: Are They Dead or Still Holding? (48:05 – 1:04:12)

Scaling laws (compute/data → accuracy) still hold but shift toward inference-time compute (o1-style) and RL (near-linear gains in post-training).

Lambert bullish on low-hanging fruit like RLVR/inference tokens.


arXiv – RL in the Wild: Characterizing RLVR Training in LLM Deployment

How RLVR Works: Step-by-Step Process

RLVR integrates reinforcement learning principles into the post-training phase of LLMs, focusing on tasks with clear ground-truth verification. Here’s a detailed breakdown:

1. Problem Formulation as RL:
State: the current sequence of tokens generated by the model (e.g., the prompt plus any intermediate reasoning steps in a CoT).
Action: the next token or set of tokens the model can generate.
Reward: a binary or scalar signal from a verifier, determined by whether the final output achieves the desired outcome (e.g., correct math result, passing code tests).

2. Generation and Sampling:
The model generates multiple candidate responses, often using techniques like CoT to produce intermediate reasoning steps. For example, in a math problem, the model might output: “To solve 2x + 3 = 7, subtract 3: 2x = 4, divide by 2: x = 2.”
Sampling is iterative: The model explores various paths (e.g., via beam search or Monte Carlo sampling), allowing it to “try” different reasoning trajectories.

3. Verification with Programmatic Rewards
A “verifier” (e.g., a calculator for math, compiler for code, or rule-based checker for SQL) evaluates the final answer.
Reward is assigned: Positive (1) if correct, zero or negative if incorrect. This is deterministic and external, not learned from data like in RLHF.
Example: For SQL reasoning, Databricks used RLVR where the verifier checks if the generated query produces the expected database output.

4. Optimization and Policy Update
Using policy-optimization algorithms like Proximal Policy Optimization (PPO) or Group Relative Policy Optimization (GRPO), the model’s policy is updated to maximize expected rewards.
The model learns to concentrate probability mass on high-reward paths, effectively compressing multi-try search into single-pass inference (e.g., if a base model solves a problem in 8 attempts, RLVR trains it to succeed in 1).
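
The four steps above can be condensed into a toy generate-verify-reward loop. This is a hedged illustration, not the episode's or any lab's actual code: the `verifier` and `sample_candidates` helpers are invented, and the equation 2x + 3 = 7 reuses the example from step 2. A real trainer (PPO/GRPO) would consume the (candidate, reward) pairs to upweight high-reward trajectories.

```python
import random

def verifier(candidate_x):
    """Deterministic ground-truth check for 2x + 3 = 7: reward 1 if correct, else 0."""
    return 1 if 2 * candidate_x + 3 == 7 else 0

def sample_candidates(rng, n=8):
    """Stand-in for the policy: propose n integer answers in a small range."""
    return [rng.randint(-5, 5) for _ in range(n)]

rng = random.Random(0)
candidates = sample_candidates(rng)
# Step 3: verification with a programmatic reward for each candidate.
rewarded = [(c, verifier(c)) for c in candidates]
# Step 4 (not shown): a policy-gradient update would concentrate probability
# mass on the high-reward candidates; here we only separate them out.
correct = [c for c, r in rewarded if r == 1]
```

This mirrors the compression argument in step 4: if the sampler finds the answer in 8 tries, training against these rewards teaches the policy to reach it in one.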

Reinforcement Learning with Verifiable Rewards: Advantages and Limitations


Raschka sees choosing the right compute ratios as key for bang-for-buck; pre-training is saturating financially.
Power laws extend to RL. With abundant compute, fixed pre-training cost is traded for per-query inference cost.
Predictions: bigger models behind $2K subscriptions; the next frontiers are unknown (continual learning?).
Notable Quotes: Lambert: “Scaling continues with RLVR and inference as low-hanging fruit.”

How AI is Trained: Pre-training, Mid-training, and Post-training (1:04:12 – 1:37:18)

Pre-training on vast corpora (trillions of tokens; quality > quantity via rephrasing/OCR); mid-training for specialization (long context); post-training (SFT, DPO, RLHF).
Arguments: Raschka stresses quality data (OLMo-3 does better with less); Lambert warns of “voice” loss in RLHF (it averages preferences and limits distinctive edges).
Technical Explanations: RLHF compresses preferences but removes nuances; LLMs act as intermediaries whose outputs need verification.
Predictions: Generic models face legal issues; over-reliance leads to burnout.
Notable Quotes: Lex: “RLHF averages the human condition.” Raschka: “Struggle develops expertise.”
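
Since the post-training recipe above names DPO, here is a minimal sketch of the DPO loss for a single preference pair. This is an illustration under simplified assumptions, not a training implementation: the log-probability values and beta are made up, and the inputs would normally come from the policy being trained and a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))) for one pair."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss drops below log(2).
loss_good = dpo_loss(-5.0, -9.0, -6.0, -6.0)  # margin = +4
loss_bad = dpo_loss(-9.0, -5.0, -6.0, -6.0)   # margin = -4
```

Minimizing this loss pushes the chosen response's log-probability up relative to the rejected one, which is exactly the preference-compression behavior Lambert cautions can average out a model's "voice."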

Post-training Explained: Exciting New Research Directions in LLMs (1:37:18 – 1:58:11)

RLVR (iterative generate-grade) for tools/science; links to inference scaling.
Lambert is excited about linear performance gains from logarithmic compute. Raschka questions “aha” moments (pre-training contamination?).
RLVR: self-correction via step-by-step reasoning. Process rewards score intermediate explanations; value functions score individual tokens.
Branches to science/software; value/process unproven but promising.
Lambert: “RLVR defines behaviors for tools and software.”

Advice for Beginners on How to Get into AI Development & Research (1:58:11 – 2:21:03)

Start with simple models on 1 GPU. Reverse-engineer outputs. Narrow focus after fundamentals.
Raschka advises avoiding overwhelm and burnout.

Lambert suggests individual contributions via LoRA fine-tuning.
LoRA enables efficient personalization; evaluating model failures is a high-impact contribution.
Predictions: education shifts to be partially LLM-driven; startups remain high-risk/high-reward.
Raschka: “See the LLM innards by building.”
Lambert: “Narrow after fundamentals in an info flood.”
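
Lambert's suggestion of LoRA fine-tuning can be illustrated with a toy low-rank update in plain Python. This is a sketch, not a real training loop: the shapes, the values, and the helper names are invented. The point is that the frozen weight W is adapted by a low-rank product B @ A, so only r*(d_in + d_out) parameters are trained instead of d_in * d_out.

```python
def matmul(A, B):
    """Plain-Python matrix multiply for small lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_adapt(W, B, A, alpha=1.0, rank=1):
    """Return W + (alpha / rank) * (B @ A) without modifying the frozen W."""
    delta = matmul(B, A)
    scale = alpha / rank
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight (toy values)
B = [[1.0], [2.0]]            # 2x1 trainable "down" factor (rank 1)
A = [[0.5, 0.5]]              # 1x2 trainable "up" factor (rank 1)
W_adapted = lora_adapt(W, B, A)
```

Here the full 2x2 weight (4 parameters) is adapted by training only 4 low-rank parameters; at realistic dimensions (e.g., 4096x4096 with rank 16) the saving is roughly 100x, which is what makes single-GPU personalization practical.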

Work Culture in AI (72+ Hour Weeks) (2:21:03 – 2:24:49)

The 996 culture (9 a.m. to 9 p.m., six days a week) leads to burnout. Passion drives the field, but the human costs are high (e.g., suicides at Apple’s supplier factories).
Lambert says academia has less pressure. Raschka says balance is needed for sustainability.
Culture should evolve as the field matures.

Silicon Valley Bubble (2:24:49 – 2:28:46)
Hype vs. reality: overvaluation risks a burst.
Guests caution against echo chambers and urge focus on real utility.
Potential correction in 2026 if progress slows.

Text Diffusion Models and Other New Research Directions (2:28:46 – 2:34:28)
Diffusion for text generation, among other emerging alternatives to transformers.
Diffusion handles sequences differently (denoising in parallel rather than generating token by token), which is potentially more efficient.
Niche adoption for specific tasks.

Tool Use (2:34:28 – 2:38:44)

Integrations reduce hallucinations. GPT-OSS leads.
Lambert: tools beat memorization.
Raschka: tools are essential for reliability.
Prediction: standard in all models by late 2026.

Continual Learning (2:38:44 – 2:44:06)

Avoiding catastrophic forgetting in updates.
Technical Explanations: Selective data in mid-training.
Prediction: continual learning is key to long-term model evolution.

Long Context (2:44:06 – 2:50:21)
KV cache optimizations and needle-in-haystack improvements.
Arguments: Raschka: the token economics are crucial; Lambert: long context enables complex tasks.
Predictions: Million-token contexts routine.
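
The KV cache idea behind long context can be sketched as a toy single-head attention loop. This is illustrative only: real models use learned key/value projections and batched tensors, while here the "projections" are the identity and vectors are plain Python lists. The point is that at each decode step only the new token's key/value are computed and appended; past tokens are never re-encoded.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Scaled dot-product attention of one query over cached keys/values."""
    scores = [dot(query, k) / math.sqrt(len(query)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += (w / z) * x
    return out

cache_k, cache_v = [], []
for token_vec in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    # Append only the NEW token's key/value; everything cached is reused.
    cache_k.append(token_vec)
    cache_v.append(token_vec)
    out = attend(token_vec, cache_k, cache_v)
```

The cost of this cache grows linearly with context length, which is why the KV-cache optimizations mentioned above (and linear-attention alternatives) matter so much at million-token scale.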

Robotics (2:50:21 – 2:59:31)

AI in embodied agents; data challenges dominate.
Guests see slow progress due to hardware, though LLMs accelerate simulation.
Predictions: breakthroughs in 2027-2028.

Timeline to AGI (2:59:31 – 3:06:47)

“Powerful AI” by 2028-2030, not an instant singularity.
Lambert expects gradual progress via scaling.
Raschka sees blockers like data persisting.
They predict a moderately fast takeoff.

Will AI Replace Programmers? (3:06:47 – 3:25:18)

Up to 90% automation, with humans focused on high-level design.
Raschka: joy in the struggle remains.
Lambert: the job shifts to verification.

Programmers evolve but will not be obsolete.

Is the Dream of AGI Dying? (3:25:18 – 3:32:07)

Hype fatigue, but progress is real.
Guests are optimistic and focus on utility over hype.
The AGI dream persists but will be redefined (moving goalposts).

How Will AI Make Money? (3:32:07 – 3:36:29)
Subscriptions, enterprise tools, ads.
Prediction: diversification beyond APIs.

Big Acquisitions in 2026 (3:36:29 – 3:41:01)

Consolidation. Big tech buying startups.
More mergers for compute/talent.

Future of OpenAI, Anthropic, Google DeepMind, xAI, Meta (3:41:01 – 3:53:35)
OpenAI innovative but chaotic.
Anthropic safety-focused.
Google scales.
xAI agile.
Meta open-weight (mainly a big user of AI for its ads and social media products).
Leaders consolidate power.

Manhattan Project for AI (3:53:35 – 4:00:10)
National efforts. Risks of centralization.
Balance collaboration with competition.

Future of NVIDIA, GPUs, and AI Compute Clusters (4:00:10 – 4:08:15)
Blackwell GPUs ramp in Q1 2026; clusters grow into the billions of dollars.
Technical Explanations: FP8/FP4 for throughput.
Predictions: NVIDIA dominates; alternatives emerge.
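
The throughput argument for low-precision formats can be illustrated with a simplified quantization round-trip. This is a symmetric 8-bit integer sketch standing in for FP8/FP4 (which are floating-point formats and more involved); the weight values are made up. Fewer bits per value means more values per memory transfer, at the cost of bounded rounding error.

```python
def quantize(values, bits=8):
    """Symmetric quantization: map floats to signed integers with one scale."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax  # largest magnitude maps to qmax
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.31, -1.7, 0.02, 0.95]
q8, s8 = quantize(weights, bits=8)
restored = dequantize(q8, s8)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The worst-case error is about half the scale step, so halving the bit width (toward 4) doubles throughput per byte while roughly squaring the relative coarseness, which is why FP4 is reserved for the most error-tolerant parts of inference.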

Future of Human Civilization (4:08:15 – End)
AI transforms biology and programming but carries risks like power concentration.
Optimism about the upsides, caution on ethics.
AI accelerates progress, humans adapt.