OpenAI reports that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. The researchers trained GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and tested its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 performs strongly on translation, question answering, and cloze tasks, as well as on several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. The researchers also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
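To make "specified purely via text interaction" concrete, here is a minimal sketch of how a few-shot prompt might be assembled as plain text. The function name `build_few_shot_prompt`, the `=>` separator, and the example word pairs are illustrative assumptions, not the exact format OpenAI used. No model is called; the sketch only builds the text that would be fed to one.

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a plain-text prompt: a task description, K solved
    demonstrations, and one unsolved query for the model to complete.

    Hypothetical helper for illustration only. The model's weights are
    never updated; everything it knows about the task is in this text.
    """
    lines = [task_description, ""]
    for source, target in demonstrations:
        # Each demonstration is one solved example of the task.
        lines.append(f"{source} => {target}")
    # The query is left open-ended so the model continues the pattern.
    lines.append(f"{query} =>")
    return "\n".join(lines)


# Illustrative demonstrations (assumed, not taken from a benchmark).
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
prompt = build_few_shot_prompt("Translate English to French:", demos, "peppermint")
print(prompt)
```

The number of demonstrations in the list is the "shot" count: an empty list gives a zero-shot prompt, one pair gives one-shot, and several pairs give few-shot.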
GPT-3 can generate samples of news articles, including ones around 500 words long, that human evaluators have difficulty distinguishing from articles written by humans.
GPT-3 used ten times as much training data as GPT-2.
GPT-3 was 65% accurate on SAT analogy questions in the few-shot setting.
SOURCES: OpenAI, arXiv paper
Written by Brian Wang, Nextbigfuture.com