OpenAI Unveils Custom Chip as Google Shifts Gemini Strategy
The competitive landscape between OpenAI and Google shifted from algorithmic model metrics to structural infrastructure and software architecture this week. On June 24, OpenAI disclosed the development of Jalapeño, its first custom application-specific integrated circuit (ASIC) designed exclusively to handle model inference at a lower cost than standard commercial graphics processors. Google countered on the same day by integrating native “Computer Use” capabilities directly into its high-volume Gemini 3.5 Flash model, allowing the system to interact directly with web browsers, mobile interfaces, and traditional desktop applications.
However, Google also encountered a product delay, with internal reports confirming that the highly anticipated Gemini 3.5 Pro model has been pushed from a June release to July to allow for additional fine-tuning based on early tester feedback. Together, these moves show that the multi-billion-dollar race for artificial intelligence dominance is no longer just about public leaderboard victories. Instead, the focus has shifted to microchip diversification, reducing developer infrastructure overhead, and delivering autonomous software agents that can execute multi-step knowledge work.
Background: The Capital Expense War
For the past three years, the core battle between OpenAI and Google has focused on model scaling. Each organization spent billions of dollars training increasingly large language models (LLMs) on massive clusters of Nvidia GPUs. This approach established baseline capabilities in natural language understanding, reasoning, and multimodal data processing. However, it also created an expensive operational challenge: every prompt submitted by a user carries an infrastructure cost, driven by high specialized hardware prices and substantial power requirements.
As enterprise adoption expanded, the financial pressure shifted from training models to running them, a process known as inference. While training occurs once per model generation, inference occurs billions of times per day across applications, application programming interfaces (APIs), and enterprise integrations. Google holds a historic advantage in this area due to its Tensor Processing Unit (TPU) infrastructure, which has powered its internal search and cloud operations for a decade. OpenAI, conversely, remained entirely dependent on third-party commercial silicon, leaving its operational margins vulnerable to hardware supply constraints and fixed pricing structures.
Key Developments: OpenAI Shifts to Custom Silicon
OpenAI addressed this structural vulnerability directly on June 24 by announcing Jalapeño, its first proprietary inference processor. Developed over a dense nine-month cycle in collaboration with Broadcom, the chip bypasses the general-purpose flexibility of standard GPUs to focus entirely on the matrix multiplication operations required to serve LLM queries. The physical handoff occurred at OpenAI’s San Francisco headquarters, where Broadcom executives delivered the first 300mm silicon wafers manufactured on Taiwan Semiconductor Manufacturing Company’s (TSMC) advanced 3nm process node.

Architecturally, Jalapeño utilizes a systolic array design. This structural grid passes data rhythmically between individual processing cells, minimizing the power-heavy memory access cycles that slow down traditional chip architectures during high-volume inference tasks. By stripping out the components required for heavy model training or graphical rendering, OpenAI and Broadcom estimate that the chip can deliver model outputs at a significant reduction in power consumption and cost compared to current market alternatives. OpenAI’s internal design tools assisted in optimizing the silicon layout, representing a closed-loop development cycle where current models helped design the physical hardware intended to run their successors.
Google’s Software Counter: Native Computer Use
While OpenAI focused its weekly updates on the hardware layer, Google targeted immediate software deployment by upgrading its developer stack. On June 24, Google DeepMind pushed an update that makes “Computer Use” a native capability inside Gemini 3.5 Flash. Rather than relying on external wrapper applications or third-party integrations, the model can now directly interpret visual changes on a screen, move cursors, click buttons, type text, and navigate complex operating systems on its own.

In early benchmark evaluations on the OSWorld platform, Gemini 3.5 Flash achieved an efficiency rating of 78.4 percent, positioning its automated task execution capabilities close to larger models like GPT-5.5. The primary distinction is economic: Gemini 3.5 Flash is priced at approximately $1.50 per million input tokens and $9 per million output tokens, which is substantially lower than competitive high-tier models. This pricing structure makes the native deployment of parallel software agents financially viable for enterprise customers who automate repetitive digital tasks, such as cross-referencing databases, executing automated software testing, and managing multi-tiered customer service workflows.
To address the security risks of allowing an automated model to control desktop environments, Google deployed targeted adversarial training models alongside two secondary defense systems. These include automated detection systems designed to isolate prompt injection attacks and a mandatory human-in-the-loop confirmation requirement for critical actions, such as financial transactions or file deletions.
The Delay of Gemini 3.5 Pro
Google’s software rollout was balanced by a strategic delay in its mid-tier model lineup. Reports confirmed that Google has extended the development timeline for Gemini 3.5 Pro, pushing the public launch from late June into July. Google Chief Executive Sundar Pichai had previously indicated a June release during the company’s annual developer conference, but early feedback from enterprise previews on Google’s Antigravity platform and the public LMArena leaderboard prompted further refinement.
Sources familiar with the matter indicate that the extra development time is focused on optimizing performance for long-horizon tasks—complex operations requiring an agent to maintain logical consistency over hundreds of consecutive actions. Engineers are also adjusting how the model consumes tokens during long conversations, responding to developer complaints that previous versions depleted context allowances too quickly during sustained multi-turn sessions. The delay positions the Gemini 3.5 Pro release directly alongside upcoming competitive launches from Anthropic and OpenAI scheduled for mid-July.
Chronology of Weekly Advancements
The operational shifts observed over the current seven-day period reflect an acceleration in product delivery timelines across both corporate ecosystems.
Enterprise Integration Expansion
June 21, 2026
Samsung Electronics completes a corporate deployment, integrating customized instances of OpenAI’s ChatGPT and Codex models across its internal engineering and administrative divisions.
Parallel Strategic Launches
June 24, 2026
OpenAI publicly displays its proprietary Jalapeño inference chip developed with Broadcom. Simultaneously, Google integrates native desktop and browser control functions directly into Gemini 3.5 Flash.
Model Postponement
June 25, 2026
Enterprise tracking and internal platform reports confirm Google has shifted the broader release window for Gemini 3.5 Pro to July to improve agent logic and token consumption parameters.
Next-Generation System Card
June 26, 2026
OpenAI publishes the preliminary safety and system card documentation for its upcoming GPT-5.6 Sol model, signaling imminent developer preview availability.
Why It Matters: Inference Costs vs. Capability Bragging Rights
The tactical updates from both Google and OpenAI point to an underlying reality in the current market: raw intelligence benchmarks are yielding diminishing returns for enterprise buyers. While model providers frequently highlight fractional percentage gains on academic tests like GPQA (graduate-level science logic) or SWE-bench (real-world software engineering), corporate technology buyers are focusing on operational stability and total cost of ownership.
When an organization deploys hundreds of independent AI agents to monitor logs, process invoices, or interact with clients, the infrastructure cost compounds hourly. A provider that reduces inference costs by 50 percent through custom silicon or hyper-optimized small models presents a more compelling business case than a provider offering a slightly higher benchmark score at five times the operational cost. OpenAI’s investment in proprietary silicon is an explicit recognition that long-term market leadership requires control over hardware margins, matching Google’s structural vertical integration.
Industry Perspective: Multi-Vendor Enterprise Strategies
As competition intensifies, corporate purchasing behavior is evolving. Organizations are actively resisting long-term single-vendor commitments in their software stacks. Technology executives report that software contracts are being structured on shorter cycles to allow for rapid platform migration if a competitor delivers a more efficient model tier.
For example, enterprise buyers regularly run parallel API integrations, routing basic data processing tasks to low-cost options like Gemini 3.5 Flash while reserving complex logic puzzles for higher-tier models. This fluid allocation strategy forces both OpenAI and Google to defend their market share constantly across multiple fronts: cloud ecosystem compatibility, developer tool maturity, and pricing flexibility.
Market or Consumer Impact
For enterprise clients and everyday developers, this week’s updates translate into immediate practical advantages. The introduction of native computer control within lower-priced model tiers reduces the technical skill barrier required to build automated workflows. Small engineering teams can now deploy autonomous agents that interact with legacy enterprise systems without rebuilding API endpoints from scratch.
Concurrently, the diversification of the underlying silicon supply chain will likely stabilize API pricing. As OpenAI migrates its highest-volume workloads onto Jalapeño processors over the coming quarters, the reduced reliance on standard commercial accelerators should limit supply bottlenecks, ensuring more predictable latency and system availability during peak traffic periods.
Future Outlook
The third quarter of 2026 is positioned to be a highly competitive launch window for the artificial intelligence sector. With Google’s Gemini 3.5 Pro rescheduled for July, it will land alongside OpenAI’s upcoming GPT-5.6 Sol previews and Anthropic’s next-generation model updates.
This convergence will test whether specialized reasoning architectures or vertical hardware integration provides a greater market advantage. Furthermore, the persistent movement of specialized personnel between top-tier research firms underscores the fluid nature of core technical expertise, suggesting that structural execution and distribution infrastructure will be the primary differentiators of long-term commercial success.
Conclusion
The developments of this week demonstrate that the battle for artificial intelligence leadership has moved beyond corporate messaging. OpenAI’s hardware reveal and Google’s agent automation updates emphasize that efficiency, execution, and cost control are the new benchmarks of success. As both tech giants adjust their product portfolios and infrastructure foundations, the enterprise ecosystem stands to benefit from lower operational barriers, more capable automated tools, and a highly competitive corporate market.
10 FAQs
Q1: What is the primary purpose of OpenAI’s new Jalapeño chip?
A1: The Jalapeño chip is a custom ASIC designed specifically for model inference (running existing models to answer queries) rather than model training. Its architecture is optimized to reduce costs and power consumption for high-volume ChatGPT and API workloads.
Q2: Who manufactured OpenAI’s Jalapeño chip?
A2: The chip was designed in collaboration with Broadcom and is manufactured by Taiwan Semiconductor Manufacturing Company (TSMC) using their 3nm process node.
Q3: Why did Google delay the launch of Gemini 3.5 Pro?
A3: Google postponed the model launch from June to July 2026 to collect more feedback from early testers and improve its performance on long-horizon, multi-step tasks and agent-based workflows.
Q4: What is “Computer Use” in Gemini 3.5 Flash?
A4: It is a native capability built directly into the model that allows it to interact with digital environments by viewing screens, moving cursors, clicking elements, and typing across browsers and desktop applications.
Q5: How does Gemini 3.5 Flash perform on task execution benchmarks?
A5: The model scores 78.4 percent on the OSWorld Verified Benchmark, placing its performance close to more expensive models like GPT-5.5 while operating at a lower price point.
Q6: What are the input and output token costs for Gemini 3.5 Flash?
A6: Gemini 3.5 Flash costs approximately $1.50 per million input tokens and $9 per million output tokens, offering a significant financial advantage for high-volume agent deployments.
Q7: How is Google addressing the security risks of native Computer Use?
A7: Google utilizes targeted adversarial training to mitigate prompt injection vulnerabilities, alongside automated security checks and recommended human-in-the-loop validation for critical actions.
Q8: What architectural design does the Jalapeño chip use?
A8: It uses a systolic array design, which passes data through a rhythmic grid of processing units to optimize dense matrix multiplications while reducing heavy memory access cycles.
Q9: Why are enterprise buyers moving away from single-vendor AI contracts?
A9: Corporate buyers are choosing short-term, multi-vendor strategies to avoid ecosystem lock-in, allowing them to route tasks dynamically to whichever provider offers the best cost-to-performance ratio at any given time.
Q10: What competitive launches are expected in July 2026?
A10: July is expected to see the rescheduled launch of Google’s Gemini 3.5 Pro alongside developer previews of OpenAI’s GPT-5.6 Sol and Anthropic’s Claude Opus 4.7.




