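The researchers' hypothesis is that RLVR (reinforcement learning with verifiable rewards) amplifies reasoning patterns that already exist in the model. Qwen2.5-Math can uniquely do "code reasoning": solving math problems by writing Python💻 without actually executing it. Code reasoning correlates with correctness (64% correct with it vs. 29% without). Training on spurious rewards amplifies code usage to over 90%.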
In general, just having reasoning models do more work improves their performance.
💡Our hypothesis: RLVR amplifies reasoning patterns that already exist
Qwen2.5-Math can uniquely do "code reasoning"-solving math by writing Python💻 (without execution)
Code reasoning correlates with correctness (64% w/ vs 29% w/o)
Spurious training amplifies code usage to 90%+
Stella Li (@StellaLisy), May 27, 2025 (pic.twitter.com/8dV2TdlBqM)
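To make the "64% with vs. 29% without" comparison concrete, here is a minimal sketch of how such a split could be measured. It is not the researchers' code: the regex heuristic for spotting code reasoning, the function names, and the toy samples are all made-up assumptions for illustration.

```python
import re

# Hypothetical heuristic (an assumption, not the researchers' detector):
# a response counts as "code reasoning" if it contains a fenced Python block
# or a line that starts with def, import, or print(.
CODE_PATTERN = re.compile(r"```python|^\s*(def |import |print\()", re.MULTILINE)


def uses_code_reasoning(response: str) -> bool:
    """Return True if the response text looks like it contains Python code."""
    return bool(CODE_PATTERN.search(response))


def accuracy_by_code_usage(samples):
    """Split (response_text, is_correct) pairs by code usage and score each group."""
    # uses_code -> [number correct, number total]
    counts = {True: [0, 0], False: [0, 0]}
    for response, is_correct in samples:
        bucket = counts[uses_code_reasoning(response)]
        bucket[0] += int(is_correct)
        bucket[1] += 1

    def rate(correct, total):
        # None signals an empty group instead of dividing by zero.
        return correct / total if total else None

    total = counts[True][1] + counts[False][1]
    return {
        "code_usage_rate": rate(counts[True][1], total),
        "acc_with_code": rate(*counts[True]),
        "acc_without_code": rate(*counts[False]),
    }


# Toy, made-up samples purely to exercise the functions (not real model output).
toy_samples = [
    ("```python\nprint(2 + 2)\n```\nThe answer is 4.", True),
    ("def solve():\n    return 3 * 7\nSo the answer is 21.", True),
    ("import math\nmath.sqrt(16) gives 5.", False),
    ("The answer is clearly 10.", False),
    ("By symmetry the sum is 6.", True),
]

stats = accuracy_by_code_usage(toy_samples)
print(stats)
```

Running the same split on real model responses before and after training would show both numbers from the tweet: how often the model uses code reasoning, and how accuracy differs between the two groups.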

Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technologies and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting-edge technologies, he is currently a Co-Founder of a startup and fundraiser for high-potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.