Anthropic's CEO believes the systems will get a lot better in 2024, but that they will not truly bend reality until 2025 or 2026.
He believes interpretability, steerability, and reliability will improve. Anthropic wants to tame these wild models.
The systems are statistical and opaque.
Mechanistic interpretability is a focus of Anthropic. It is like taking a mental X-ray of the LLM. They are able to separate multiple concepts inside the systems. He is growing more confident that this work will have commercial value and will be largely successful. It helps give more confidence in the models.
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Abstract
In Anthropic’s latest paper, Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, we outline evidence that there are better units of analysis than individual neurons, and we have built machinery that lets us find these units in small transformer models. These units, called features, correspond to patterns (linear combinations) of neuron activations. This provides a path to breaking down complex neural networks into parts we can understand, and builds on previous efforts to interpret high-dimensional systems in neuroscience, machine learning, and statistics. In a transformer language model, we decompose a layer with 512 neurons into more than 4000 features which separately represent things like DNA sequences, legal language, HTTP requests, Hebrew text, nutrition statements, and much, much more. Most of these model properties are invisible when looking at the activations of individual neurons in isolation.
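The decomposition the abstract describes is a form of dictionary learning done with a sparse autoencoder trained on the model's neuron activations. The sketch below shows the basic idea in PyTorch; the layer sizes mirror the paper's setup (a 512-neuron layer expanded into roughly 4,096 features), but the class, the L1 coefficient, and the training step are illustrative assumptions, not Anthropic's actual code.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary-learning autoencoder: decompose d_model=512 neuron
    activations into n_features=4096 sparse feature activations."""
    def __init__(self, d_model: int = 512, n_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(f)           # reconstruction from the dictionary
        return x_hat, f

def loss_fn(x, x_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that pushes the
    # feature activations toward sparsity (the coefficient is a placeholder).
    recon = ((x - x_hat) ** 2).sum(dim=-1).mean()
    sparsity = f.abs().sum(dim=-1).mean()
    return recon + l1_coeff * sparsity

# Usage sketch: train on activations collected from the transformer.
model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
acts = torch.randn(64, 512)               # stand-in for real MLP activations
opt.zero_grad()
x_hat, f = model(acts)
loss = loss_fn(acts, x_hat, f)
loss.backward()
opt.step()
```

Because there are far more features than neurons, each learned feature can fire for a narrow concept (DNA sequences, Hebrew text, HTTP requests) even though no single neuron does.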
Anthropic's Interpretability Team has been making amazing progress building a mechanistic understanding of how transformers work! https://t.co/rZl4T8eBLo
— Kamal Ndousse (@kandouss) March 8, 2022
I have a lot of respect for the work of Anthropic on interpretability, in evaluations/red-teaming and for not having released Claude for 9 months.
But for the sake of transparency, I want to publicize that when I see:
1) @sama burning his social & political capital on ideas… https://t.co/nIPSiKbxGm
— Siméon (@Simeon_Cps) June 26, 2023
Superposition doesn't occur in linear models. It only occurs if one introduces a non-linearity that can filter out noise from superposition "interference". It also requires sparsity: as features become sparser, there is more superposition. pic.twitter.com/eg2oEM59I6
— Anthropic (@AnthropicAI) September 14, 2022
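The claim in the tweet can be illustrated with a toy model: squeeze more features than there are dimensions, and a ReLU plus a negative bias can zero out the small "interference" terms, but only while inputs stay sparse. The numpy sketch below is a hand-rolled illustration under assumed values (random unit directions, a bias of -0.2), not Anthropic's actual experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 features packed into a 2-dimensional hidden space.
n_features, n_dims = 5, 2
W = rng.normal(size=(n_features, n_dims))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm feature directions

def reconstruct(x, bias=-0.2):
    h = x @ W                            # compress: features -> 2-d space
    linear = h @ W.T                     # decompress: back to feature space
    return np.maximum(0, linear + bias)  # ReLU filters small interference

# Sparse input: only feature 3 is active. Its reconstruction dominates
# and the ReLU zeroes out most of the cross-feature interference.
x_sparse = np.zeros(n_features); x_sparse[3] = 1.0
print(reconstruct(x_sparse))

# Dense input: every feature active at once. The interference terms
# add up, swamp the bias, and the reconstruction degrades.
x_dense = np.ones(n_features)
print(reconstruct(x_dense))
```

In a purely linear version (no ReLU, no bias), the interference is passed through unchanged, which is why superposition only pays off once a non-linearity can clean it up.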
Our new model Claude 2.1 offers an industry-leading 200K token context window, a 2x decrease in hallucination rates, system prompts, tool use, and updated pricing.
Claude 2.1 is available over API in our Console, and is powering our https://t.co/uLbS2JNczH chat experience. pic.twitter.com/T1XdQreluH
— Anthropic (@AnthropicAI) November 21, 2023
This week saw major developments in AI policy: the US issued an Executive Order, the G7 adopted an International Code of Conduct, and the UK hosted a landmark summit on AI safety, producing the Bletchley Declaration.
— Anthropic (@AnthropicAI) November 5, 2023
We summarized each of these events and what we believe they mean for the industry: https://t.co/pQLqDb7LvY
— Anthropic (@AnthropicAI) November 5, 2023
Used @AnthropicAI's own Claude to summarize their findings:
“The US, G7 countries, and UK recently made big moves on AI policy.
The US issued an executive order to manage AI risks and benefits. It creates new AI leadership roles and pushes agencies to use AI ethically. It also starts a pilot program to give researchers access to data and computing for AI safety research.
The G7 countries agreed on responsible AI principles for companies developing advanced AI. This sets a good baseline for best practices globally.
The UK hosted a summit on AI safety with many countries. They issued a declaration calling for cooperation on AI’s benefits and risks. They will also assess current research to guide new AI safety research priorities.
The UK and US announced the first AI safety institutes to evaluate frontier AI risks. Independent testing is important for sensible AI regulation.
These show governments taking AI safety seriously. Evaluation methods are key so policy isn’t shooting blindly. We’re encouraged and will contribute our part for safe, beneficial AI.”