OpenAI's First Custom Chip Cuts Inference Costs 50%

OpenAI just delivered on something it's been quietly building toward for eight months: its own silicon. On Wednesday, OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom AI accelerator, and according to Broadcom CEO Hock Tan, early testing shows roughly 50% cost savings compared with typical AI GPUs running the same inference workloads.

The chip was delivered in person: Broadcom's Hock Tan and Charlie Kawwas handed a physical sample to Sam Altman and Greg Brockman on Wednesday, eight months after the two companies first announced a deal to co-develop 10 gigawatts of custom AI accelerators. What makes the timeline notable is that Jalapeño went from initial design to manufacturing tape-out in roughly nine months, which both companies describe as the fastest ASIC development cycle ever achieved in high-performance semiconductors.

Jalapeño is built specifically for inference (running already-trained AI models in response to user requests) rather than for the training process that builds those models in the first place. That's a deliberate, narrow target. OpenAI emphasized the chip's low operating cost when running real-time coding models, and pre-training will likely keep relying on Nvidia hardware for now. The split matters because inference, not training, is where OpenAI spends money every single day serving its products to users, and unlike training cost, inference cost scales directly with usage.

"We have a deep understanding of the workload. We've really been looking for specific workloads that are underserved, and asking how can we build something that will be able to accelerate what's possible." - Greg Brockman, President, OpenAI

What this means: every additional ChatGPT or Codex query currently costs OpenAI money to serve on rented or purchased GPU capacity. If Jalapeño actually delivers anywhere close to the 50% cost reduction Broadcom is claiming, OpenAI's largest recurring expense gets meaningfully cheaper, and that's before accounting for the leverage of not depending entirely on Nvidia for every chip in its stack.

The business stakes extend well beyond OpenAI. Broadcom has reportedly asked Microsoft to guarantee it will purchase 40% of the chips to secure the first production phase, tying Microsoft's own AI infrastructure plans directly to Jalapeño's success. Hock Tan told CNBC that demand from Broadcom's six largest customers is "simply insatiable," with elevated demand expected to run past 2026 and through 2028. Initial deployment is targeted for late 2026 at gigawatt scale, expanding across multiple chip generations after that.

The numbers being shared right now come with a real caveat. The performance-per-watt figures are self-reported by OpenAI and haven't been independently verified. It's not yet public which competing chips Jalapeño was benchmarked against, on what specific tasks, or under what conditions. A detailed technical report is promised in the coming months, which will be the actual test of whether the 50% cost claim holds up outside a press release.

What happens next matters more than the announcement itself. Engineering samples are already running real workloads in the lab, including OpenAI's GPT-5.3-Codex-Spark model, but "engineering sample" and "shipped at gigawatt scale" are very different milestones. The real signal to watch for is whether Microsoft actually commits to the 40% purchase guarantee Broadcom is asking for. If the company that runs the largest share of OpenAI's existing infrastructure backs Jalapeño at volume, that's the strongest evidence yet that custom silicon, not just rented Nvidia capacity, is where the AI infrastructure race is heading.
