Emergent Capabilities of LLMs as They Get Bigger

In “Emergent Abilities of Large Language Models,” an emergent ability is defined as one that is “not present in small models but is present in large models.” Is emergence a rare phenomenon, or are many tasks actually emergent?

It turns out that more than 100 examples of emergent abilities have already been discovered empirically by scaling language models such as GPT-3, Chinchilla, and PaLM. To facilitate further research on emergence, Jason Wei has compiled a list of these emergent abilities.

Alan Thompson simplified Jason Wei’s list of emergent LLM capabilities.

Emergent few-shot prompted tasks

First, emergent few-shot prompted tasks are those on which small models perform at random chance and large models perform well above random. By far the largest sources of these emergent tasks were BIG-Bench and the Massive Multitask Language Understanding (MMLU) benchmark, with 67 and 51 emergent tasks respectively.
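The definition above can be turned into a simple check on a scaling curve. The sketch below is illustrative only: the function name `is_emergent` and both tolerances (within 2 points of chance counts as "at chance", 10+ points above chance counts as "well above") are hypothetical assumptions, not the criteria used to compile the list, and it compares only the smallest and largest models rather than inspecting the whole curve.

```python
from typing import Dict


def is_emergent(
    accuracy_by_params: Dict[float, float],
    random_chance: float,
    near_chance_tol: float = 0.02,   # hypothetical: "at chance" = within 2 points
    above_chance_gap: float = 0.10,  # hypothetical: "well above" = 10+ points higher
) -> bool:
    """Return True if the smallest model scores near chance and the largest well above it.

    accuracy_by_params maps model size (number of parameters) to task accuracy in [0, 1].
    """
    if len(accuracy_by_params) < 2:
        return False  # a single model size cannot show emergence with scale
    sizes = sorted(accuracy_by_params)
    smallest_acc = accuracy_by_params[sizes[0]]
    largest_acc = accuracy_by_params[sizes[-1]]
    at_chance_when_small = abs(smallest_acc - random_chance) <= near_chance_tol
    well_above_when_large = largest_acc - random_chance >= above_chance_gap
    return at_chance_when_small and well_above_when_large


# Hypothetical accuracies for a four-option multiple-choice task (chance = 0.25).
curve = {1e9: 0.25, 1e10: 0.26, 1e11: 0.60}
print(is_emergent(curve, random_chance=0.25))
```

For a four-way multiple-choice benchmark such as MMLU, random chance is 0.25, which is why that value is used in the usage example.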

MMLU (51 tasks; see the Chinchilla paper for results):

Chinchilla 7B (7 tasks): Professional Medicine, High School Statistics, High School Macroeconomics, High School Psychology, Anatomy, High School Government and Politics, High School Microeconomics
Chinchilla 70B (44 tasks): International Law, Human Aging, Sociology, US Foreign Policy, High School World History, Marketing, Logical Fallacies, Miscellaneous, College Biology, High School US History, Security Studies, High School European History, High School Geography, Computer Security, Human Sexuality, Astronomy, Prehistory, Philosophy, Jurisprudence, Management, Moral Disputes, High School Biology, Professional Psychology, World Religions, Nutrition, Clinical Knowledge, Business Ethics, Medical Genetics, High School Computer Science, Public Relations, College Medicine, Conceptual Physics, Electrical Engineering, High School Chemistry, Machine Learning, Professional Accounting, Professional Law, Virology, Econometrics, College Physics, Elementary Mathematics, Moral Scenarios, Formal Logic, High School Physics

Future research directions beyond simply scaling up

Can we improve model architectures? For example, sparsity, external memory, or better training objectives.

Can we improve data quality and quantity? Training on more data for longer increases pre-training compute but not inference compute.

Better prompting. How can we extract the most performance out of an existing language model?

Frontier tasks. Which tasks can language models not yet perform that we should use to evaluate future, higher-quality language models?

Why do emergent abilities occur, and can we predict them? For example, do language models learn compositional abilities that enable them to solve harder problems?
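The data-quality-and-quantity direction above notes that training for longer raises pre-training compute but not inference compute. A minimal sketch of why, assuming the widely used rules of thumb that training a dense transformer costs about 6 FLOPs per parameter per training token and generating costs about 2 FLOPs per parameter per token; these are approximations, not exact counts, and the function names are hypothetical.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate pre-training compute: ~6 FLOPs per parameter per training token
    (forward plus backward pass)."""
    return 6.0 * n_params * n_tokens


def inference_flops_per_token(n_params: float) -> float:
    """Approximate inference compute: ~2 FLOPs per parameter per generated token
    (forward pass only). Note that the training-set size does not appear here."""
    return 2.0 * n_params


# Chinchilla-sized model: 70B parameters trained on 1.4T tokens.
n_params = 70e9
for n_tokens in (1.4e12, 2.8e12):  # doubling the training data
    print(f"{n_tokens:.1e} tokens -> train {training_flops(n_params, n_tokens):.2e} FLOPs, "
          f"inference {inference_flops_per_token(n_params):.2e} FLOPs/token")
```

Doubling the number of training tokens doubles the estimated pre-training compute, while the per-token inference cost depends only on the parameter count and stays the same.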

  1. So tic-tac-toe is an emergent behaviour of ChatGPT? It is surprising at first that it can play tic-tac-toe: it will even draw the board, take turns, and ask for your next move. How does that emerge from an LLM?

    It was impressive for a moment, but after a couple of turns it starts making mistakes: overwriting moves, saying the game is over before it is, etc. So it turns out to be absolute garbage in the end.

    I’ve noticed something similar with other questions I ask it. It is very impressive at first, until you dig carefully into the details and see that it spews so many mistakes and fabrications as to be almost useless in any domain where a specific answer is needed. I see more use in fuzzier disciplines like creative writing, where a correct answer is not required.

  2. When it comes to artilects, it’s a fair bet that major governments will have the best ones, and they will spend much of their effort countering those of other major governments. The exception would be a “winner-takes-all” scenario, in which whoever first builds a really good one seizes control of the world economy and kneecaps any and all competition.

    As far as limiting its use against the general population of their own countries? You can’t. So don’t emigrate to some place like China. And some people will claim the US is no better, which is just silly. Also, the US can’t do something like this in the open, nor can it keep big secrets very long, and that’s probably your best protection.

  3. I am concerned about this technology being used by immoral people with power. What is being done to limit its use by criminals and government bureaucrats against the general populations of their countries?

