More Understanding of the Inner Workings of LLMs and Gamechanging AI Will Come in 2025 or a Bit Later

The Anthropic CEO believes the systems will get a lot better in 2024, but they will not bend reality until 2025 or 2026.

He believes in improving the interpretability, steerability and reliability of these systems. Anthropic wants to tame these wild models.

The systems are statistical and opaque.

Mechanistic interpretability is a focus of Anthropic. This is like taking a mental X-ray of the LLM. They are able to separate multiple concepts within the systems. He is getting more confident that this work will have commercial value and will be largely successful. It helps give more confidence in the models.

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Abstract
In Anthropic’s latest paper, Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, we outline evidence that there are better units of analysis than individual neurons, and we have built machinery that lets us find these units in small transformer models. These units, called features, correspond to patterns (linear combinations) of neuron activations. This provides a path to breaking down complex neural networks into parts we can understand, and builds on previous efforts to interpret high-dimensional systems in neuroscience, machine learning, and statistics. In a transformer language model, we decompose a layer with 512 neurons into more than 4000 features which separately represent things like DNA sequences, legal language, HTTP requests, Hebrew text, nutrition statements, and much, much more. Most of these model properties are invisible when looking at the activations of individual neurons in isolation.
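
The decomposition described in the abstract can be illustrated with dictionary learning via a sparse autoencoder trained on a layer's activations: it learns an overcomplete set of feature directions so that each activation vector is reconstructed as a sparse linear combination of them. The sketch below is a minimal illustration in PyTorch, assuming illustrative sizes (512 neurons, 4096 features) and an L1 sparsity penalty; the names and hyperparameters are assumptions, not Anthropic's actual code.

# Minimal sketch of dictionary learning over neuron activations with a
# sparse autoencoder; sizes, the L1 coefficient, and names are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, n_neurons=512, n_features=4096, l1_coeff=1e-3):
        super().__init__()
        # Encoder maps raw MLP activations to an overcomplete feature basis.
        self.encoder = nn.Linear(n_neurons, n_features)
        # Decoder reconstructs activations as a linear combination of features.
        self.decoder = nn.Linear(n_features, n_neurons)
        self.l1_coeff = l1_coeff

    def forward(self, activations):
        features = torch.relu(self.encoder(activations))  # sparse feature activations
        reconstruction = self.decoder(features)
        # Loss: reconstruct the activations while keeping feature use sparse.
        mse = ((reconstruction - activations) ** 2).mean()
        sparsity = self.l1_coeff * features.abs().mean()
        return reconstruction, features, mse + sparsity

# Usage sketch: `acts` stands in for a batch of MLP activations collected
# from the transformer layer being studied (shape [batch, 512]).
acts = torch.randn(64, 512)  # placeholder activations
sae = SparseAutoencoder()
_, features, loss = sae(acts)
loss.backward()  # one step of a training loop

Each learned feature can then be inspected by looking at the text examples that activate it most strongly, which is how individual features come to be labeled with concepts like DNA sequences or legal language.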

Anthropic (@AnthropicAI) used their own Claude to summarize their findings:

“The US, G7 countries, and UK recently made big moves on AI policy.

The US issued an executive order to manage AI risks and benefits. It creates new AI leadership roles and pushes agencies to use AI ethically. It also starts a pilot program to give researchers access to data and computing for AI safety research.

The G7 countries agreed on responsible AI principles for companies developing advanced AI. This sets a good baseline for best practices globally.

The UK hosted a summit on AI safety with many countries. They issued a declaration calling for cooperation on AI’s benefits and risks. They will also assess current research to guide new AI safety research priorities.

The UK and US announced the first AI safety institutes to evaluate frontier AI risks. Independent testing is important for sensible AI regulation.

These show governments taking AI safety seriously. Evaluation methods are key so policy isn’t shooting blindly. We’re encouraged and will contribute our part for safe, beneficial AI.”