OpenAI open sourcing new GPT-4 Turbo evals

Published

April 11, 2024

OpenAI today announced that it is open-sourcing a GitHub repository for running popular evals on various models, including the new GPT-4 Turbo.

According to the company, the new GPT-4 Turbo improves on writing, math, logical reasoning, and coding. Its responses are more direct, less verbose, and more conversational than those of its predecessor.

OpenAI GPT-4 Turbo (Image Credit: OpenAI)

The GitHub repository contains a library for evaluating language models. It currently includes the following evals:

MMLU: Measuring Massive Multitask Language Understanding
MATH: Measuring Mathematical Problem Solving With the MATH Dataset
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
MGSM: Multilingual Grade School Math Benchmark, from the paper Language Models are Multilingual Chain-of-Thought Reasoners
HumanEval: Evaluating Large Language Models Trained on Code
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Evals are sensitive to prompting, and the formulations used in recent publications and libraries vary. Some of these formulations are carryovers from evaluating base models and from models that were worse at following instructions.
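To make that sensitivity concrete, here is a minimal Python sketch of how the same model responses can earn different scores depending on how an eval parses answers. It is only an illustration: the function names, the sample responses, and the answer formats are hypothetical and are not taken from OpenAI's repository, and the responses are hard-coded rather than produced by a model.

```python
import re

# Hypothetical responses to one multiple-choice question whose correct
# answer is "B". A real eval would collect these from a model.
RESPONSES = [
    "B",
    "Answer: B",
    "The capital is Paris, so Answer: b",
]
GOLD = ["B", "B", "B"]


def strict_extract(response):
    """Accept only a bare option letter, as a terse format might require."""
    text = response.strip()
    return text if text in {"A", "B", "C", "D"} else None


def lenient_extract(response):
    """Accept 'Answer: X' anywhere in the text, else fall back to a bare letter."""
    match = re.search(r"Answer\s*:\s*([A-D])", response, re.IGNORECASE)
    if match:
        return match.group(1).upper()
    return strict_extract(response)


def accuracy(responses, gold, extractor):
    """Fraction of responses whose extracted answer equals the gold label."""
    correct = sum(extractor(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)


if __name__ == "__main__":
    print("strict :", accuracy(RESPONSES, GOLD, strict_extract))
    print("lenient:", accuracy(RESPONSES, GOLD, lenient_extract))
```

In this toy setup every response names the right option, yet the strict parser credits only one of the three. The gap comes entirely from the answer format the scorer expects, which is why results from differently formulated evals are hard to compare.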

"For example, when writing with ChatGPT, responses will be more direct, less verbose, and use more conversational language." (OpenAI, @OpenAI, April 12, 2024, pic.twitter.com/PHxrmCtpyl)

Sophia Garner

Sophia says technology is raising the bar of human living and she is actively trying to promote awareness among people about the latest changes in social media platforms. Social media has the power to make many positive impacts and she is continuously sharing the latest updates with fellow readers. In some spare time, she likes to tag along with friends for a walk.

EONMSK News
