OpenAI intros two open-weight language models that can run on consumer GPUs — optimized to run on devices with just 16GB of memory

OpenAI has developed a pair of new open-weight language models optimized for consumer GPUs. In a blog post, OpenAI announced “gpt-oss-120b” and “gpt-oss-20b”, the former designed to run on a single 80GB GPU and the latter optimized to run on edge devices with just 16GB of memory.

Both models are Transformers built on a mixture-of-experts (MoE) architecture, an approach popularized by DeepSeek R1. Despite their design focus on consumer GPUs, both support context lengths of up to 131,072 tokens, the longest available for local inference. gpt-oss-120b activates 5.1 billion parameters per token, and gpt-oss-20b activates 3.6 billion. Both models use alternating dense and locally banded sparse attention patterns, along with grouped multi-query attention with a group size of 8.
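The mixture-of-experts idea explains why a 120-billion-parameter model only "activates" a few billion parameters per token: a router sends each token to a small subset of expert networks, so most weights sit idle on any given forward pass. The sketch below is a minimal, illustrative toy in NumPy; the expert count, top-k value, and dimensions are made up for demonstration and are not gpt-oss's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical number of experts per MoE layer
TOP_K = 2         # hypothetical number of experts activated per token
D_MODEL = 16      # toy hidden size, far smaller than a real model's

# Router: a linear layer that scores each expert for a given token.
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))
# Each "expert" here is just a single weight matrix standing in for
# a feed-forward sub-network.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]

def moe_forward(x):
    """Route one token vector x to its top-k experts and mix their outputs."""
    logits = x @ router_w                      # (NUM_EXPERTS,) router scores
    top = np.argsort(logits)[-TOP_K:]          # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only TOP_K of NUM_EXPERTS expert matrices are touched for this token,
    # which is why active parameters per token are far below the total count.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
```

Scaling this picture up, total parameter count grows with the number of experts while per-token compute is governed only by the few experts the router selects, which is how a large model can stay within a single GPU's inference budget.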
