OpenAI’s GPT-4 launched on March 14, 2023. More than three years later, an eternity in this industry, the company has not shipped a successor bearing the GPT-5 name. That gap is itself the story. It suggests that whatever comes next is meant to be not an incremental improvement but a fundamental rearchitecture of what a large language model can be.
Here is what has been confirmed, what is credibly rumored, and what we can infer from the technical and economic trajectory of OpenAI’s model development.
Each GPT generation has represented a qualitative shift in capability, not merely a quantitative one. GPT-3 proved that language models could write coherent text. GPT-4 proved they could reason, handle multimodal inputs, and pass professional examinations. GPT-5 appears intended to prove they can think — plan, execute multi-step tasks, self-correct, and operate autonomously over extended interactions.
The o-series models (o1, o1-pro, o3, o4-mini) — launched between September 2024 and April 2025 — are a critical intermediate step. They introduced explicit chain-of-thought reasoning, where the model spends variable amounts of compute “thinking” before responding. GPT-5 is widely expected to unify this reasoning capability with GPT-4’s broad knowledge and multimodal fluency into a single architecture.
OpenAI has been characteristically guarded, but several facts have been established through official statements, SEC filings, and credible reporting.
Training is complete or near-complete. Sam Altman stated in a February 2026 interview with the Financial Times that GPT-5 was “in the final stages of training” and that the company expected to begin safety evaluations “in the coming months.” Multiple employees have corroborated on background that training compute for the run concluded in late 2025 or early 2026.
The model is natively multimodal. Unlike GPT-4, which was a text model with vision bolted on through a separate encoder (as described in the GPT-4 Technical Report), GPT-5 processes text, images, audio, and video through a unified architecture from the start. Altman has publicly described the next model as “natively multimodal” in multiple appearances.
It incorporates reasoning natively. OpenAI’s CTO Mira Murati (before her departure in September 2024) and her successor described the company’s roadmap as converging the GPT and o-series lines. The o-series models were explicitly characterized as “research previews” of reasoning capabilities that would be integrated into the main model line.
OpenAI has spent unprecedented compute. The company’s partnership with Microsoft involves access to a custom Azure supercomputer cluster reportedly comprising over 100,000 H100 GPUs. The training run for GPT-5 is estimated to have consumed 10-50x the compute of GPT-4’s training run, which itself cost an estimated $50-100 million in compute alone.
Safety testing is the bottleneck. Altman told the Aspen Ideas Festival in June 2025 that “the limiting factor is not training but alignment and safety evaluation.” OpenAI’s Preparedness Framework requires extensive red-teaming, evaluation against biological, cyber, and persuasion risk categories, and board sign-off before deployment. The o3 model’s evaluation process took approximately four months from training completion to public launch.
The following details come from reporting by The Information, Bloomberg, and Reuters, based on sources with direct knowledge of OpenAI’s operations. They have not been officially confirmed.
Parameter count in the trillions. GPT-4 is estimated at approximately 1.8 trillion parameters across its mixture-of-experts architecture (8 experts, ~220B parameters each, with approximately 280B active per forward pass). GPT-5 is rumored to be significantly larger, with estimates ranging from 3 to 10 trillion total parameters. The mixture-of-experts approach is expected to continue, meaning the active parameter count per query would be a fraction of the total.
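The rumored GPT-4 figures can be cross-checked with simple arithmetic. The sketch below uses only the unconfirmed estimates quoted in this article and assumes, as a simplification, that all parameters sit inside the experts (shared layers are ignored).

```python
# Back-of-the-envelope check of the rumored GPT-4 mixture-of-experts figures.
# All inputs are unconfirmed estimates quoted in the article, not official numbers.

NUM_EXPERTS = 8
PARAMS_PER_EXPERT = 220e9   # ~220B parameters per expert (rumored)
ACTIVE_PARAMS = 280e9       # ~280B parameters active per forward pass (rumored)


def total_params(num_experts: int, params_per_expert: float) -> float:
    """Total parameters if all capacity lived in the experts (ignores shared layers)."""
    return num_experts * params_per_expert


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


total = total_params(NUM_EXPERTS, PARAMS_PER_EXPERT)
print(f"total  ~ {total / 1e12:.2f}T parameters")
print(f"active ~ {active_fraction(ACTIVE_PARAMS, total):.0%} of total per token")
```

Eight experts of about 220B each come to about 1.76 trillion parameters, consistent with the rounded 1.8 trillion estimate, and the rumored active count is roughly one sixth of that total.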
Context window of 1 million tokens or more. GPT-4 Turbo’s 128K context window was a step function improvement from GPT-3.5’s 4K/16K. Industry sources suggest GPT-5 will launch with a minimum 500K-token context window, with a 1M+ token tier for enterprise users. This would match or exceed Google’s Gemini 1.5 Pro, which demonstrated stable performance at 1M tokens in February 2024.
Native tool use and agentic capabilities. The model is expected to have first-class support for function calling, web browsing, code execution, and multi-step task planning — not as bolted-on features but as core capabilities trained into the model weights.
Training data cutoff in late 2025. Based on the reported training timeline, GPT-5’s knowledge would extend through approximately Q3-Q4 2025, a significant improvement over GPT-4 Turbo’s December 2023 cutoff.
| Dimension | GPT-3.5 | GPT-4 | GPT-5 (projected) |
|---|---|---|---|
| Release date | Nov 2022 (ChatGPT) / Mar 2023 (API) | Mar 2023 (API) / Nov 2023 (Turbo) | Expected H2 2026 |
| Parameters (est.) | ~175B (dense) | ~1.8T (MoE, ~280B active) | 3-10T (MoE, est. 500B-1T active) |
| Context window | 4K / 16K tokens | 8K / 128K tokens | 500K-1M+ tokens (rumored) |
| Modalities | Text only | Text + image input (vision added post-launch) | Native text + image + audio + video I/O |
| Reasoning | Single-pass generation | Single-pass; improved but no explicit CoT | Native chain-of-thought with variable compute (o-series integration) |
| Training compute (est.) | ~3.6 x 10^23 FLOPs | ~2 x 10^25 FLOPs | ~10^26 - 10^27 FLOPs (rumored) |
| Training cost (est.) | $2-5M | $50-100M | $500M-2B (rumored) |
| MMLU benchmark | 70% | 86.4% (5-shot) | 90%+ (projected) |
| Bar exam (approx.) | ~10th percentile | ~90th percentile | Expected to exceed human expert consensus |
| Pricing (1M input tokens) | $0.50 (3.5 Turbo) | $2.50 (4o); $10.00 (4 Turbo) | $5-15 (projected standard tier) |
While OpenAI has published no architecture details for GPT-5, we can make informed inferences from the company’s published research, the o-series models, and trends in the broader field.
GPT-4 is widely reported, though not officially confirmed, to have been among the first production deployments of the mixture-of-experts (MoE) architecture at frontier scale. MoE allows a model to have a very large total parameter count while keeping the computational cost per token manageable: each input token activates only a subset of “expert” sub-networks. This is almost certainly the architecture for GPT-5 as well, likely with more experts (16-32, up from 8) and finer-grained expert specialization.
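To make the routing idea concrete, here is a minimal sketch of top-k expert gating, a common MoE scheme in published research. It is illustrative only: OpenAI has not disclosed how GPT-4 or GPT-5 routes tokens, and the function names, the toy scalar "experts", and the choice of `top_k=2` are all assumptions.

```python
import math
import random


def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k=2):
    """Pick the top_k experts for one token and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


def moe_layer(token, experts, router_scores, top_k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](token) for i, w in route_token(router_scores, top_k))


# Toy setup: 8 "experts" that are just scalar functions (hypothetical stand-ins
# for the feed-forward sub-networks a real model would use).
experts = [lambda x, scale=s: scale * x for s in range(1, 9)]
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in experts]

print(route_token(scores))              # two (expert_index, weight) pairs
print(moe_layer(1.0, experts, scores))  # output computed from those two experts only
```

With eight experts and `top_k=2`, only a quarter of the expert parameters are exercised per token, which is why total and active parameter counts diverge.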
Current multimodal models typically use separate encoders for each modality (a vision transformer for images, a speech encoder for audio) that feed into a shared language model backbone. The “natively multimodal” descriptor suggests GPT-5 may use a single tokenizer that converts all modalities into a shared token space from the start, allowing the model to reason about images, audio, and text with equal fluency. Google’s Gemini models pioneered this approach; OpenAI appears to be following suit.
The o-series models demonstrated that spending more compute at inference time (longer “thinking” chains) dramatically improves performance on hard tasks. GPT-5 likely integrates this into the base model, allowing it to adaptively decide how much reasoning to apply based on task difficulty. This has implications for pricing — harder questions cost more to answer.
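The pricing implication can be seen in a per-query cost sketch. It assumes hidden reasoning tokens are billed at the output rate, and the two prices are hypothetical placeholders, not announced GPT-5 pricing.

```python
# Hypothetical per-million-token prices; not announced GPT-5 pricing.
INPUT_PRICE_PER_M = 10.00
OUTPUT_PRICE_PER_M = 30.00


def query_cost(input_tokens: int, reasoning_tokens: int, answer_tokens: int) -> float:
    """Dollar cost of one query, assuming reasoning tokens are billed as output."""
    billed_output = reasoning_tokens + answer_tokens
    return (input_tokens * INPUT_PRICE_PER_M
            + billed_output * OUTPUT_PRICE_PER_M) / 1_000_000


# Same prompt and answer length; only the amount of "thinking" differs.
easy = query_cost(input_tokens=500, reasoning_tokens=0, answer_tokens=200)
hard = query_cost(input_tokens=500, reasoning_tokens=8_000, answer_tokens=200)

print(f"easy question: ${easy:.3f}")
print(f"hard question: ${hard:.3f}")
```

Under these assumed numbers the hard question costs about 23 times as much as the easy one, even though the visible prompt and answer are identical in length.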
OpenAI has already shown it can distill capabilities from larger models into smaller ones (o4-mini for reasoning, GPT-4o mini for general use). GPT-5 will likely launch in multiple size tiers: a full-capability model for complex tasks and a smaller, faster, cheaper variant for high-volume applications. The economic viability of the model depends on this, because a model that costs $0.50 per query is not commercially useful for most applications.
The economics of frontier AI models are governed by two forces: the escalating cost of training (which companies need to recoup) and the relentless pressure to reduce inference costs (which determines the addressable market).
Training amortization: If GPT-5’s training cost is $1 billion and OpenAI amortizes it over three years, that is roughly $333 million per year in training costs alone, before inference compute, staffing, and infrastructure. At GPT-4o’s current pricing of $2.50 per million input tokens, OpenAI would need to bill roughly 133 trillion input tokens per year just to cover that figure. For reference, the company reportedly processes roughly 100 billion tokens per day across all models as of early 2026, or about 36.5 trillion per year, well below that break-even volume.
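The break-even arithmetic can be checked directly. This sketch uses only the figures quoted in this article and makes two simplifying assumptions: every processed token is treated as a billed input token at GPT-4o’s price, and output tokens, subscriptions, and other revenue are ignored.

```python
def breakeven_tokens_per_year(training_cost_usd: float,
                              amortization_years: float,
                              price_per_million_usd: float) -> float:
    """Input tokens that must be billed each year to cover amortized training cost."""
    annual_cost = training_cost_usd / amortization_years
    return annual_cost / price_per_million_usd * 1_000_000


# Figures quoted in the article: $1B training cost, 3-year amortization,
# $2.50 per million input tokens, ~100 billion tokens processed per day.
needed = breakeven_tokens_per_year(1e9, 3, 2.50)
reported_annual_volume = 100e9 * 365

print(f"break-even volume: {needed / 1e12:.0f} trillion tokens/year")
print(f"reported volume:   {reported_annual_volume / 1e12:.1f} trillion tokens/year")
print(f"coverage:          {reported_annual_volume / needed:.0%}")
```

Dividing about $333 million by $2.50 gives about 133 million units of one million tokens, which is roughly 1.3 × 10^14 tokens per year.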
Expected pricing structure:
The trend across model generations has been that per-token costs at equivalent capability levels decrease over time. GPT-4o is roughly 10x cheaper per token than GPT-4 was at launch, while being comparably capable. GPT-5 may launch expensive but will almost certainly see rapid price decreases in its first year.
For developers: Longer context windows and native multimodal processing reduce the need for complex retrieval architectures. A model that can ingest an entire codebase (500K+ tokens) and reason about it coherently changes the developer tooling landscape. AI coding assistants would shift from autocomplete to genuine software engineering partners.
For AI startups: The “better model” treadmill accelerates. Any startup whose value proposition depends on prompting a foundation model better than its competitors is at risk with every model generation. The thin wrapper problem becomes the thin wrapper extinction event.
For enterprises: Agentic capabilities — models that can plan, use tools, and execute multi-step workflows — move from research demos to production-ready features. This is the unlock that many enterprise AI deployments have been waiting for: AI that can do things, not just answer questions about things.
For competitors: Google (Gemini), Anthropic (Claude), and Meta (Llama) are all developing next-generation models on similar timescales, and Google’s Gemini Ultra 2.0 is expected in the same window. The industry is approaching a simultaneous generational leap from multiple providers, which benefits buyers through competition and compatibility.
When exactly? The most likely launch window is Q3-Q4 2026. OpenAI has historically taken 3-6 months from training completion to public release for major models. If training concluded in early 2026, a summer or fall launch is plausible. However, the company has repeatedly pushed back expected timelines — GPT-5 has been “coming soon” in industry conversations for over a year.
Will it be one model or a family? The o-series demonstrated OpenAI’s willingness to release specialized models rather than a single monolithic product. GPT-5 may launch as a family of models optimized for different use cases and price points.
How will it be evaluated? Existing benchmarks (MMLU, HumanEval, MATH, ARC) are increasingly saturated — frontier models score above 90% on many of them. The evaluation framework for GPT-5 will likely emphasize agentic tasks, long-horizon planning, and real-world task completion rather than academic benchmarks.
What about safety? OpenAI’s Preparedness Framework categorizes risks into biological, cybersecurity, persuasion, and model autonomy. A model with genuine planning and tool-use capabilities raises the stakes on the autonomy dimension significantly. The safety evaluation will be the most extensive the company has ever conducted.
GPT-5 represents the convergence of several research threads that OpenAI has been developing in parallel: massive scale (GPT-4’s parameter count, scaled further), reasoning (the o-series chain-of-thought approach), and multimodal integration (GPT-4V’s vision, extended to all modalities). The result, if the rumors hold, will be the most capable AI model ever released to the public — and by a significant margin.
But capability alone is not the story. The real question is whether GPT-5 crosses the threshold from “impressively generates text” to “reliably completes tasks.” That is the transition from a tool you consult to an agent you delegate to. And that transition, if it happens, will be the most consequential product launch in the history of artificial intelligence.
We will update this article as new information emerges. The last revision was May 12, 2026.