Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. The gpt-oss-20b model is a 21-billion-parameter variant with 3.6B active parameters, optimized for lower latency and for local or specialized deployments; thanks to MXFP4 quantization of the MoE weights, it can run within 16GB of memory. The model was trained on OpenAI's harmony response format and supports configurable reasoning effort levels (low, medium, high) to trade off speed against depth of analysis. Key capabilities include full chain-of-thought output for debugging and transparency, agentic features such as function calling and web browsing, and support for structured outputs. The model is released under the permissive Apache 2.0 license, is fully fine-tunable for specialized applications, and can be served through multiple inference frameworks, including Transformers, vLLM, and Ollama.
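As a minimal sketch of what a request to gpt-oss-20b might look like, the snippet below builds a chat message list with a reasoning-effort hint in the system message. It assumes the Hugging Face model id `openai/gpt-oss-20b` and the convention that reasoning effort is selected via a `Reasoning: low|medium|high` line in the system prompt; both are assumptions to verify against the official model card.

```python
# Sketch of a chat request for gpt-oss-20b (assumptions noted in the lead-in).
# Reasoning effort is assumed to be set via the system message, per the
# harmony format's "Reasoning: low|medium|high" convention.
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Summarize MXFP4 quantization in one sentence."},
]

# With Transformers installed, these messages could be passed to a
# text-generation pipeline. The call is commented out here because it
# downloads the full model weights (~16 GB is assumed sufficient memory):
#
# from transformers import pipeline
# generator = pipeline(
#     "text-generation",
#     model="openai/gpt-oss-20b",  # assumed Hugging Face model id
#     torch_dtype="auto",
#     device_map="auto",
# )
# result = generator(messages, max_new_tokens=256)
```

The same message list can be reused across backends (vLLM's OpenAI-compatible server, Ollama) since all of them accept role/content chat messages.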