The Best Open Large Language Models

Hugging Face maintains a leaderboard that ranks large language models by quality.

The 🤗 Open LLM Leaderboard aims to track, rank and evaluate LLMs and chatbots as they are released. They evaluate models on 4 key benchmarks from the Eleuther AI Language Model Evaluation Harness, a unified framework to test generative language models on a large number of different evaluation tasks. A key advantage of this leaderboard is that anyone from the community can submit a model for automated evaluation on the 🤗 GPU cluster, as long as it is a 🤗 Transformers model with weights on the Hub. They also support evaluation of models with delta-weights for non-commercially licensed models, such as LLaMA.

Evaluation is performed against 4 popular benchmarks:

AI2 Reasoning Challenge (25-shot) – a set of grade-school science questions.
HellaSwag (10-shot) – a test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models.
MMLU (5-shot) – a test to measure a text model’s multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
TruthfulQA (0-shot) – a benchmark to measure whether a language model is truthful in generating answers to questions.

They chose these benchmarks because together they test reasoning and general knowledge across a wide range of fields, in both 0-shot and few-shot settings.
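The n-shot figures above say how many worked examples are prepended to each prompt before the question being evaluated. A minimal sketch of that idea (illustrative function and variable names, not the evaluation harness's actual code):

```python
def build_prompt(demos, question, n_shot):
    """Build an n-shot prompt: prepend n_shot worked examples to a question.

    demos: list of (question, answer) pairs used as in-context examples.
    n_shot=0 yields a zero-shot prompt containing only the question.
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in demos[:n_shot]]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# Example: a 2-shot prompt built from two in-context demonstrations.
demos = [
    ("At what temperature does water freeze?", "0 degrees Celsius"),
    ("What gas do plants release during photosynthesis?", "Oxygen"),
]
zero_shot = build_prompt(demos, "What is the boiling point of water?", 0)
two_shot = build_prompt(demos, "What is the boiling point of water?", 2)
```

So TruthfulQA (0-shot) sends the model only the question, while MMLU (5-shot) prefixes five solved examples from the same task.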

The leading open-source models are llama-65b and MetaIX/GPT4-X-Alpasta-30b.

MosaicML has released a completely open and free model, MPT-7B.

MPT-7B, the latest entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. Starting today, you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch. For inspiration, we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens.
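The announcement's training figures imply a simple unit cost. A back-of-envelope check, using only the 1T-token, 9.5-day, and ~$200k numbers quoted above:

```python
# Figures from the MosaicML announcement (the $200k is approximate).
tokens = 1e12        # 1T training tokens
days = 9.5           # wall-clock training time
cost_usd = 200_000   # approximate total training cost

tokens_per_day = tokens / days                        # ~1.05e11 (about 105B/day)
usd_per_million_tokens = cost_usd / (tokens / 1e6)    # ~$0.20 per million tokens

print(f"{tokens_per_day:.3g} tokens/day, "
      f"${usd_per_million_tokens:.2f} per million training tokens")
```

In other words, the quoted budget works out to roughly 20 cents per million training tokens, which is the kind of figure that makes "train your own private MPT" a plausible pitch.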
