What is the main bottleneck in AI infrastructure in 2026?

The immediate bottleneck is advanced silicon, especially leading-node logic wafers and HBM memory. Power remains the deeper structural constraint because even available chips cannot generate revenue without deliverable electricity, cooling, and interconnection.

Why does a gigawatt matter in AI data centers?

A gigawatt is a useful unit for measuring the full economics of AI infrastructure. It bundles land, substations, buildings, cooling, accelerators, memory, networking, power, operations, financing, utilization, and customer revenue into one comparable scale.

Are neocloud backlogs proof that AI infrastructure returns will be strong?

No. Backlog validates customer demand, but it does not by itself prove return on capital. Operators still need to convert contracts into delivered capacity, high utilization, gross profit after power, and earnings after depreciation and interest.

Can token prices fall while AI data centers remain valuable?

Yes. If newer systems produce more useful tokens per watt and software improves routing, caching, batching, and model efficiency, revenue per gigawatt can rise even as price per token falls.

What should operators watch in the AI compute buildout?

Operators should watch HBM4 qualification, TSMC allocation, power interconnection timelines, onsite generation lead times, financing spreads, fleet utilization, and whether paid inference revenue grows faster than subsidized usage.

The AI Compute Buildout: Players, Dependencies, and Where We Are

The AI infrastructure debate is usually framed as a simple fight between two stories.

The bull story says demand is real, compute is scarce, and every gigawatt of deployed AI capacity is a factory for intelligence. If the fleet is utilized, the economics can look like cloud infrastructure with an inference upside on top.

The bear story says the numbers are too large, the financing is too circular, GPU depreciation is too generous, and investors are mistaking backlog for profit.

Both stories contain truth. That is what makes the moment hard to read.

The better way to understand the buildout is as a dependency chain. AI infrastructure is not one market. It is a stack of coupled constraints: advanced logic wafers, HBM and DRAM, packaging, networking, power, cooling, construction, permitting, financing, utilization, and model demand. The binding constraint keeps moving.

SemiAnalysis has been especially useful on this point. In the post-ChatGPT era, the bottleneck moved from CoWoS packaging to data center power, and now to advanced silicon itself: logic wafers and memory. Power is still the structural long pole, but in 2026 the immediate question is often more basic: can the industry get enough leading-node wafers and memory bits to turn demand into deployed systems?

That framing changes the question. It is not "are there enough GPUs?" It is:

Which layer limits deployable compute right now?
Who controls that layer?
How quickly can capacity be added?
Who takes the financial risk while waiting?
And when the compute arrives, is it actually utilized at a price that covers depreciation, power, and capital cost?

The Scale Has Become Industrial

Public estimates for 2026 hyperscaler capital spending have moved above $700 billion across Amazon, Microsoft, Alphabet, and Meta, with some trackers putting the broader group higher once Oracle and other AI infrastructure buyers are included. The exact number changes with each earnings call, but the direction is clear: this is no longer a software budget. It is an industrial buildout.

That matters because industrial buildouts fail differently from software bets. The hard parts are not just product-market fit or model quality. The hard parts are:

Can TSMC allocate enough leading-edge wafers?
Can SK Hynix, Samsung, and Micron supply enough HBM and server DRAM?
Can power be interconnected before the servers age?
Can transformers, switchgear, UPS systems, chillers, liquid-cooling equipment, and turbines arrive on time?
Can private credit, banks, bond markets, and balance sheets keep funding the gap?
Can customers use enough of the capacity to turn contracted demand into cash?

The AI buildout has become a capital allocation problem with a software revenue model attached.

The Gigawatt Economics

A useful mental unit is the gigawatt AI data center.

A gigawatt is not just a power number. It is a way to compress the entire stack into one unit of economic analysis: land, substations, generators, buildings, cooling, racks, GPUs or ASICs, networking, memory, software, operations, financing, and customer contracts.

Analyst estimates vary, but the public debate now centers on tens of billions of dollars for a gigawatt-scale AI facility. The major buckets are:

Cost layer	What it includes	Why it matters
Site and facility	Land, shell, substations, electrical rooms, mechanical systems, security	Slow to permit and build; power access determines whether the site is useful
Compute systems	GPUs or ASICs, host CPUs, HBM, server DRAM, racks	The largest cost bucket and the fastest-depreciating asset
Networking	Scale-up links, Ethernet or InfiniBand, switches, optics, cables	Determines whether thousands of chips behave like one useful cluster
Power and cooling	Grid interconnection, onsite gas, backup generators, UPS, liquid cooling	Often determines deployment timing more than model demand does
Operations and financing	Electricity, maintenance, staff, debt service, leases	The difference between adjusted EBITDA and real economic profit

If a gigawatt-scale fleet is rented as infrastructure, the economic variables are simple in theory: number of accelerators, utilization, price per accelerator-hour, power cost, depreciation schedule, and financing cost.

In practice, every one of those variables is contested. Utilization is rarely disclosed cleanly. GPU life is debated. Power arrival can slip. Customers may have long-term contracts, but those contracts still have to convert into delivered capacity and paid revenue.

This is why backlog is useful but incomplete. Backlog says customers want the compute. It does not prove the operator can build the site on time, keep the fleet full, finance the capex efficiently, and earn an acceptable return after depreciation and interest.
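The variables listed above can be wired into a minimal sketch that walks from rental revenue down to profit after depreciation and interest. Every number below is a hypothetical placeholder chosen for easy arithmetic, not a figure from any operator, and the `FleetAssumptions` and `annual_economics` names are illustrative only.

```python
from dataclasses import dataclass, replace

HOURS_PER_YEAR = 8760


@dataclass
class FleetAssumptions:
    """Hypothetical inputs; none of these values come from the article."""
    accelerators: int              # number of rentable accelerators
    utilization: float             # fraction of hours actually billed (0..1)
    price_per_hour: float          # USD per accelerator-hour
    capex_per_accelerator: float   # USD, all-in system cost per accelerator
    useful_life_years: float       # straight-line depreciation period
    kw_per_accelerator: float      # all-in facility kW per accelerator
    power_cost_per_kwh: float      # USD per kWh
    debt_fraction: float           # share of capex funded with debt (0..1)
    interest_rate: float           # annual interest rate on that debt


def annual_economics(a: FleetAssumptions) -> dict:
    """Return a simple one-year view from revenue down to pre-tax profit."""
    revenue = a.accelerators * a.utilization * a.price_per_hour * HOURS_PER_YEAR
    # Simplification: power is drawn around the clock, billed or not.
    power = a.accelerators * a.kw_per_accelerator * a.power_cost_per_kwh * HOURS_PER_YEAR
    capex = a.accelerators * a.capex_per_accelerator
    depreciation = capex / a.useful_life_years
    interest = capex * a.debt_fraction * a.interest_rate
    return {
        "revenue": revenue,
        "gross_profit_after_power": revenue - power,
        "depreciation": depreciation,
        "interest": interest,
        "pretax_profit": revenue - power - depreciation - interest,
    }


base = FleetAssumptions(
    accelerators=1000, utilization=0.8, price_per_hour=2.5,
    capex_per_accelerator=40_000, useful_life_years=5,
    kw_per_accelerator=1.5, power_cost_per_kwh=0.08,
    debt_fraction=0.6, interest_rate=0.10,
)
half_empty = replace(base, utilization=0.5)

for label, case in (("80% utilized", base), ("50% utilized", half_empty)):
    print(label, round(annual_economics(case)["pretax_profit"]))
```

Under these made-up inputs the same fleet swings from a profit of roughly $6 million a year at 80% utilization to a small loss at 50%, which is the sense in which backlog alone does not settle the return question.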

IaaS Is Not The Same As Inference

There are two different monetization engines sitting on top of the same hardware.

The first is infrastructure-as-a-service: rent the GPU or accelerator hours. This looks like cloud infrastructure. The provider earns money when capacity is contracted and utilized. The key questions are price per hour, utilization, useful life, and power cost.

The second is inference: sell intelligence as tokens, tasks, API calls, agents, or application workflows. This can produce better economics if the model owner converts the same compute into high-value revenue. It can also be worse if too much capacity is consumed by free users, low-value traffic, retries, training experiments, or subsidized growth.

This distinction explains an apparent contradiction. Token prices can fall while infrastructure remains valuable. If each hardware generation produces more tokens per watt, and if inference systems become more efficient through better kernels, routing, caching, quantization, speculative decoding, and model design, revenue per gigawatt can rise even as price per token falls.

The strategic question is not simply "will tokens get cheaper?" They will. The question is whether tokens per watt and useful work per dollar of infrastructure improve faster than the price per token declines.
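A short worked example makes the arithmetic concrete. The sketch below assumes, for simplicity, that all of a gigawatt's power goes into generating billable tokens; the efficiency and price figures are hypothetical, and `revenue_per_gw_year` is an illustrative name.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
WATTS_PER_GW = 1e9  # 1 GW delivers 1e9 joules per second


def revenue_per_gw_year(tokens_per_joule: float,
                        price_per_million_tokens: float,
                        utilization: float) -> float:
    """Annual token revenue from one gigawatt under the stated simplification."""
    joules = WATTS_PER_GW * SECONDS_PER_YEAR * utilization
    tokens = joules * tokens_per_joule
    return tokens / 1e6 * price_per_million_tokens


# Hypothetical generations: the newer one is 4x more efficient,
# and the price per token is cut in half.
older = revenue_per_gw_year(tokens_per_joule=0.5, price_per_million_tokens=2.0, utilization=0.6)
newer = revenue_per_gw_year(tokens_per_joule=2.0, price_per_million_tokens=1.0, utilization=0.6)

print(f"revenue ratio, newer vs older: {newer / older:.2f}")
```

With a 4x efficiency gain and a 50% price cut, revenue per gigawatt doubles. If the price had instead fallen by the full 4x, revenue per gigawatt would be flat, which is the break-even case.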

The Dependency Chain

The most important thing to understand is that every layer depends on the layer below it.

If there is no grid power, the building is stranded. If there is power but no advanced chips, the racks are empty. If there are chips but no HBM, the package cannot ship. If there is compute but weak networking, the cluster underperforms. If there is capacity but low utilization, the financial model breaks.
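One way to read this chain is that deployable compute is set by the scarcest layer, not the average one. The sketch below models that as a simple minimum. The layer names follow the article, but the capacity numbers, expressed as gigawatt-equivalents of deployable systems, are invented for illustration.

```python
def binding_constraint(capacity_by_layer: dict[str, float]) -> tuple[str, float]:
    """Return the layer with the least capacity and the capacity it allows."""
    layer = min(capacity_by_layer, key=capacity_by_layer.get)
    return layer, capacity_by_layer[layer]


# Hypothetical capacity each layer could support, in GW-equivalents.
supply = {
    "power": 12.0,
    "logic_wafers": 8.0,
    "hbm": 8.5,
    "packaging": 11.0,
    "networking": 10.0,
}

print(binding_constraint(supply))   # logic wafers bind first

# Adding wafer capacity does not unlock 10 GW; the bottleneck moves to HBM.
supply["logic_wafers"] = 10.0
print(binding_constraint(supply))
```

The second call shows why the binding constraint keeps moving: relieving one layer only exposes the next-tightest one.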

1. Logic Wafers

The current silicon pinch starts with leading-edge logic.

SemiAnalysis argues that 2026 is the year the major AI accelerator families converge on TSMC's N3 family. Nvidia moves from Blackwell toward Vera Rubin. AMD's MI350 and MI400 programs lean heavily on N3-class capacity. Google's TPU roadmap, AWS Trainium, Meta MTIA, networking silicon, and host CPUs all add pressure.

That convergence makes TSMC the kingmaker. If advanced-node capacity is full, a customer cannot simply buy more GPUs by writing a larger check. It must get allocation. That means another customer, often in smartphones, PCs, or a lower-priority AI program, gets less.

This is why foundry diversification matters. Samsung Foundry and Intel Foundry are not just "nice to have" alternatives. They are strategic options for customers that cannot afford to have one company determine the pace of their compute deployment.

But diversification is not instant. Leading-edge chips require design work, process qualification, yield learning, packaging integration, software bring-up, and customer trust. A wafer bottleneck can move faster than a supply-chain redesign.

2. HBM And DRAM

Memory is the co-binding constraint.

AI accelerators need high-bandwidth memory. The problem is that HBM consumes more wafer capacity per bit than commodity DRAM, is more complex to package, and is dominated by a small supplier base: SK Hynix, Samsung, and Micron.

The pressure is not only from more accelerators. It is also from more memory per accelerator. New platforms add HBM capacity, higher stack counts, higher pin speeds, and more system memory around the accelerator. At the same time, general server replacement demand and CPU-to-GPU ratios add pressure to ordinary DRAM.

That creates a brutal allocation problem. If memory suppliers shift capacity toward HBM, commodity DRAM tightens. If commodity DRAM margins become attractive enough, the incentive to shift more wafers into HBM weakens. Customers may have to pay up for future HBM commitments, especially around HBM4 qualification and the Rubin ramp.

For investors, this is why memory is not a side topic. The AI infrastructure cycle is partly a memory cycle.

3. Advanced Packaging

CoWoS was one of the first obvious bottlenecks in the AI boom. It still matters, but the public picture has shifted. Packaging capacity has expanded, outsourcing options exist through OSATs such as Amkor and ASE/SPIL, and Intel's EMIB provides another path for some designs.

The key point is that packaging capacity only matters if there are wafers and memory to feed it. When front-end logic and HBM are tight, overbuilding packaging does not solve the system constraint.

4. Networking And Optics

Training clusters are not piles of independent chips. They are distributed computers.

That makes networking strategic. Nvidia controls important scale-up and scale-out technologies through NVLink, InfiniBand, Spectrum Ethernet, and its broader systems stack. Broadcom, Marvell, Arista, and the optics supply chain matter because Ethernet, switch ASICs, optical DSPs, transceivers, active electrical cables, and co-packaged optics all become more important as clusters scale.

The network determines how much of theoretical compute becomes useful compute. Weak networking turns expensive accelerators into idle silicon waiting on communication.
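A rough way to quantify this is the share of each training step spent computing rather than waiting on communication. The sketch below assumes a simple model where only communication that is not overlapped with compute counts as lost time; the timings are hypothetical.

```python
def useful_compute_fraction(compute_seconds: float, exposed_comm_seconds: float) -> float:
    """Fraction of a step spent on compute when exposed communication stalls the chips."""
    step_seconds = compute_seconds + exposed_comm_seconds
    if step_seconds <= 0:
        raise ValueError("step time must be positive")
    return compute_seconds / step_seconds


# Hypothetical step: 0.8 s of compute, 0.2 s waiting on the network.
fraction = useful_compute_fraction(compute_seconds=0.8, exposed_comm_seconds=0.2)
print(f"{fraction:.0%} of accelerator time is useful")
```

In this toy case a fifth of every accelerator-hour is paid for but idle, so better networking acts like adding chips without buying any.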

5. Power

Power is the structural constraint.

The IEA's 2025 Energy and AI report frames the issue cleanly: there is no AI without electricity for data centers, and data centers are becoming large enough to affect energy planning, security, and affordability. The constraint is not just total generation. It is deliverable power at the right place, on the right timeline, with enough reliability and power quality for dense AI workloads.

That is why "bring your own power" has become part of the AI infrastructure playbook. Companies are exploring onsite gas turbines, reciprocating engines, fuel cells, batteries, microgrids, nuclear offtake agreements, and direct arrangements with utilities.

The logic is economic. If a site can earn millions of dollars per megawatt per year once live, a six-month delay is enormously expensive. Paying more for temporary or onsite generation can be rational if it brings revenue online earlier.

But onsite power creates its own supply chain. Turbines, transformers, switchgear, grid interconnect equipment, backup systems, and permitting can become bottlenecks. Heavy electrical equipment does not scale like software. Lead times can be measured in years.
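The delay argument above reduces to a comparison between forgone earnings and the premium paid for faster power. A minimal sketch, with a hypothetical site size, earnings rate, and premium, and with illustrative function names:

```python
def delay_cost(site_mw: float, earnings_per_mw_year: float, months_delayed: float) -> float:
    """Earnings forgone while a site waits for power."""
    return site_mw * earnings_per_mw_year * months_delayed / 12


def onsite_generation_pays_off(site_mw: float, earnings_per_mw_year: float,
                               months_saved: float, onsite_premium: float) -> bool:
    """True if the earnings pulled forward exceed the extra cost of onsite power."""
    return delay_cost(site_mw, earnings_per_mw_year, months_saved) > onsite_premium


# Hypothetical 100 MW site earning $5M per MW-year, energized six months earlier.
print(delay_cost(100, 5_000_000, 6))
print(onsite_generation_pays_off(100, 5_000_000, 6, onsite_premium=150_000_000))
```

Under these assumptions a six-month delay forgoes $250 million, so a $150 million premium for onsite generation would be justified. The comparison ignores the lead times and permitting risk of the onsite equipment itself.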

6. Cooling And Physical Plant

Rack densities are rising. Air cooling is not enough for the densest AI systems. Direct liquid cooling, coolant distribution units, rear-door heat exchangers, and facility-level thermal design are now part of compute strategy.

Cooling is not just an engineering detail. It affects site selection, water use, power usage effectiveness, maintenance, insurance, uptime, and community acceptance.

7. Financing

Financing is a real dependency.

Hyperscalers can fund a large part of the buildout from operating cash flow, but the scale is now testing even their balance sheets. Neoclouds and data center developers rely more heavily on private credit, leases, secured debt, customer prepayments, vendor financing, and long-term contracts.

This is where the bear case gets sharper. If GPU collateral loses economic value faster than accounting depreciation, lenders care. If interest rates rise, returns compress. If customers delay, renegotiate, or concentrate risk in a few large buyers, the financing model becomes more fragile.

The physical stack needs capital before it produces revenue. That makes the cost and durability of capital one of the hidden bottlenecks.
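The sensitivity to the cost of capital can be sketched with a plain present-value calculation. The contract size, term, and discount rates below are hypothetical, and the model assumes level annual cash flows paid at year end.

```python
def present_value(annual_cash_flow: float, years: int, discount_rate: float) -> float:
    """Present value of level year-end cash flows at a given annual discount rate."""
    return sum(annual_cash_flow / (1 + discount_rate) ** t for t in range(1, years + 1))


# Hypothetical contract: $100M of cash flow per year for five years.
cheap_capital = present_value(100e6, years=5, discount_rate=0.08)
dear_capital = present_value(100e6, years=5, discount_rate=0.12)

print(f"value at 8%:  ${cheap_capital / 1e6:.0f}M")
print(f"value at 12%: ${dear_capital / 1e6:.0f}M")
```

The contract is identical in both cases, yet it supports roughly a tenth less value at the higher rate.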

The Player Map

The AI infrastructure ecosystem is easier to understand layer by layer.

Chip Designers

Player	Role	Strategic point
Nvidia	Blackwell, Blackwell Ultra, Vera Rubin, networking, systems software	Still the dominant full-stack supplier
AMD	Instinct MI300/MI350/MI400 family	Credible second source for large customers seeking bargaining power and supply diversity
Google + Broadcom	TPU	In-house silicon with external availability through Google Cloud and strategic customers
AWS + Annapurna/Marvell ecosystem	Trainium and Inferentia	Amazon's hedge against Nvidia dependence and a major Anthropic compute path
Microsoft	Maia	Internal silicon strategy tied to Azure and OpenAI ecosystem needs
Meta	MTIA	Internal inference and recommendation infrastructure; lower-profile than Nvidia but strategically important
Broadcom and Marvell	Custom ASIC and networking partners	Picks-and-shovels suppliers for hyperscaler silicon programs

The theme is not "Nvidia disappears." It is "large buyers want more control." They want supply diversity, better cost curves, workload-specific silicon, and bargaining power.

Manufacturing And Supply

Layer	Main players	Constraint
Foundry	TSMC, Samsung Foundry, Intel Foundry	Leading-node allocation, yield, cleanroom capacity
Memory	SK Hynix, Samsung, Micron	HBM qualification, DRAM wafer allocation, stack complexity
Packaging	TSMC CoWoS, Amkor, ASE/SPIL, Intel EMIB	2.5D capacity, interposers, assembly yield
Equipment and materials	ASML, Applied Materials, Lam Research, Tokyo Electron, specialty chemicals and substrates	Long lead times and geopolitical concentration

TSMC is the most obvious chokepoint, but memory suppliers may control more of the marginal economics than many software-first observers expect.

Networking And Systems

Nvidia, Broadcom, Marvell, Arista, Cisco, optical module suppliers, cable suppliers, and systems integrators all matter because AI clusters are network-bound at scale. The more a workload behaves like one giant synchronized machine, the more important networking becomes.

This is also why Ethernet has become strategically important. Customers want open, multi-vendor paths. Nvidia wants to preserve the performance advantages of its integrated stack. Broadcom and Arista benefit if scale-out Ethernet becomes the standard way to connect enormous AI clusters.

Power And Equipment

The power stack includes:

Turbines: GE Vernova, Siemens Energy, Mitsubishi Heavy Industries, and newer or specialized suppliers
Fuel cells: Bloom Energy and others
Electrical equipment: Eaton, Schneider Electric, Vertiv, ABB, Siemens, transformer suppliers, switchgear suppliers
Batteries and grid support: Tesla Megapack, Fluence, grid-scale storage providers, synchronous condensers, flywheels
Engineering, procurement, and construction firms
Utilities and independent power producers

This layer is less glamorous than GPUs and often more binding. The AI economy can be slowed by a transformer, a turbine blade, an interconnection study, or a local zoning fight.

Compute Owners

There are two broad categories.

Hyperscalers own the customer relationship, cloud platform, capital base, and often the software layer. Microsoft, Amazon, Google, Meta, Oracle, and increasingly xAI and Tesla are playing some version of this game.

Neoclouds own or lease specialized AI capacity and sell it to model labs, hyperscalers, enterprises, and developers. CoreWeave, Nebius, Crusoe, Lambda, Nscale, IREN, Applied Digital, and others sit here.

The neocloud model is the most debated because it sits at the intersection of real demand and heavy financial risk. CoreWeave's reported backlog near $100 billion and revenue growth show that demand is real. They do not, by themselves, settle the question of long-term return on capital after depreciation, interest, utilization, and residual value.

Nebius is interesting because it frames itself less as a GPU reseller and more as a full-stack AI cloud with large anchor contracts funding a broader platform. Crusoe is interesting because it ties compute strategy directly to energy strategy.

Demand Side

The demand side includes OpenAI, Anthropic, Google DeepMind, Meta, xAI, Mistral, DeepSeek, enterprise AI users, coding agents, video models, research labs, and internal hyperscaler workloads.

The near-term demand signal is strong. Agentic coding, enterprise copilots, multimodal generation, video, long-context inference, synthetic data, reinforcement learning, and frontier training all consume compute.

But demand quality differs. A paid enterprise workflow with high willingness to pay is not the same as a free consumer chatbot session. Training compute is not the same as monetized inference. A model lab may be compute-starved and still not yet profitable.

Capital Providers

SoftBank, MGX, Blue Owl, banks, bond investors, private credit funds, infrastructure funds, sovereign capital, and vendor financing all help turn contracts into construction.

This layer deserves attention because capital can masquerade as demand. A deal where a vendor invests in a customer that buys the vendor's hardware may still be economically valid, but it should be analyzed differently from arm's-length customer revenue.

The Deal Web

OpenAI is the clearest example of how the buildout has become a web of technology, capital, and power commitments.

OpenAI and SoftBank announced Stargate in January 2025 as a plan to invest up to $500 billion over four years in U.S. AI infrastructure, with SoftBank, OpenAI, Oracle, and MGX as initial equity funders and Arm, Microsoft, Nvidia, Oracle, and OpenAI as technology partners.

In September 2025, OpenAI and Nvidia announced a letter of intent to deploy at least 10 gigawatts of Nvidia systems, with Nvidia intending to invest up to $100 billion progressively as each gigawatt is deployed. The first gigawatt was targeted for the second half of 2026 on Vera Rubin.

OpenAI also announced or was reported to have major commitments with AMD and Broadcom: up to 6 gigawatts of AMD Instinct systems and 10 gigawatts of custom accelerators with Broadcom. These deals show the same strategic pattern: secure capacity, diversify suppliers, co-design hardware where possible, and use the promise of future demand to pull the supply chain forward.

The caution is equally important. Letters of intent, strategic partnerships, and headline gigawatt numbers are not the same as deployed, paid, profitable capacity. The question for each megadeal is whether it converts into operating sites, funded systems, reliable power, and revenue that survives depreciation.

The Bull Case

The bull case is not just "AI is important." It is more specific.

First, demand is real. Public cloud providers, model labs, and neoclouds continue to report capacity constraints. If customers are willing to sign multi-year contracts for scarce compute, deployed capacity has immediate revenue value.

Second, the technology gets more efficient. Each generation improves performance per watt. Software improves utilization. Inference systems route tasks to cheaper models, cache context, compress prompts, batch requests, and reduce wasted tokens. If useful output per watt compounds, infrastructure can remain valuable even as per-token prices decline.

Third, the revenue mix shifts toward inference. Training is strategic but lumpy. Inference is continuous and tied to product usage. If AI becomes embedded in software, support, coding, search, design, video, medicine, finance, and industrial operations, inference demand can absorb enormous capacity.

Fourth, scarcity creates pricing power. Older GPUs can retain value when new supply is constrained. The fleet does not have to be frontier to be useful. Many enterprise, fine-tuning, batch, and latency-tolerant workloads can run on older accelerators if the software stack is good.

In the optimistic version, the gigawatt data center becomes an intelligence refinery: expensive to build, but valuable because demand exceeds supply and efficiency keeps improving.

The Bear Case

The bear case is also specific.

The first risk is depreciation. If GPUs are depreciated over five or six years but their economic life is closer to two or three years, reported earnings can look better than true economic returns. This matters most for operators carrying large fleets financed with debt.
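The size of that gap is easy to illustrate with straight-line depreciation. The fleet cost and lifetimes below are hypothetical, and `earnings_gap` is an illustrative name.

```python
def annual_depreciation(capex: float, life_years: float) -> float:
    """Straight-line depreciation charge per year."""
    return capex / life_years


def earnings_gap(capex: float, accounting_life_years: float, economic_life_years: float) -> float:
    """Annual expense that reported earnings omit if the economic life is shorter."""
    return (annual_depreciation(capex, economic_life_years)
            - annual_depreciation(capex, accounting_life_years))


# Hypothetical $6B fleet: depreciated over six years, economically useful for three.
gap = earnings_gap(6e9, accounting_life_years=6, economic_life_years=3)
print(f"understated annual cost: ${gap / 1e9:.1f}B")
```

In this example the books show $1 billion of annual depreciation while the hardware loses $2 billion of economic value per year, so reported earnings are overstated by $1 billion annually until the fleet is written down or replaced.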

The second risk is circular financing. When chip vendors invest in compute buyers, and compute buyers use the money to buy chips, headline demand can look stronger than independent cash demand. The structure may still be rational, but it raises the burden of proof.

The third risk is customer concentration. A neocloud with a few enormous customers can show impressive backlog while still carrying renegotiation, utilization, and counterparty risk.

The fourth risk is power timing. If chips arrive before power, assets sit idle. If power arrives after hardware has aged, returns fall. If onsite generation becomes necessary, costs rise and permitting risk changes shape.

The fifth risk is utilization. High utilization can make infrastructure economics work. Low utilization destroys them. The problem is that public disclosures rarely make true utilization easy to inspect.

In the pessimistic version, the industry builds too much expensive hardware too quickly, financed by contracts and assumptions that do not convert into enough profitable usage before the hardware becomes obsolete.

Both Can Be True

This is the key point: the market can be capacity-constrained and at risk of overbuilding at the same time.

The market can be short of compute today and still overbuild the wrong kind of capacity tomorrow. It can be rational for OpenAI, Anthropic, Google, Microsoft, Amazon, Meta, Oracle, and xAI to secure as much compute as possible while also being true that some smaller infrastructure owners will earn poor returns.

The difference is not "AI works" versus "AI is a bubble." The difference is who controls scarce inputs, who owns demand, who finances the gap, and who keeps utilization high after the first wave of contracts.

Where Value May Accrue

Value can accrue in different places at different phases.

During scarcity, value accrues to bottleneck owners: Nvidia, TSMC, HBM suppliers, power sites, equipment suppliers, and operators with live capacity.

During normalization, value shifts toward whoever owns differentiated demand and software advantage: cloud platforms, model labs, enterprise workflow owners, data platforms, and operators with high utilization and low power costs.

If a16z's 2026 framing is right, undifferentiated GPU hosting becomes more utility-like over time: capital-heavy, competitive, and lower-margin unless the provider has a technical, power, capital, or customer moat. The more interesting surface moves up the stack into enterprise data, workflow-specific AI, industrial infrastructure, energy, manufacturing, logistics, and the systems that turn compute into measurable outcomes.

That does not mean the infrastructure layer is unimportant. It means the infrastructure layer is increasingly governed by industrial economics. The best assets can be very valuable. The mediocre assets can become expensive commodities.

What To Watch Next

The next 12 to 18 months should be judged by concrete indicators.

HBM4 qualification and pricing. Watch SK Hynix, Samsung, and Micron. HBM4 yield, pin speed, stack availability, and contract pricing will determine how much of the next accelerator generation can ship.

TSMC allocation. Watch whether smartphone and PC weakness frees enough N3 capacity for AI accelerators, and whether Samsung or Intel Foundry win meaningful second-source roles.

Vera Rubin and next-generation deployments. The first large Rubin-class deployments will test whether the gigawatt deal language converts into real systems on real power.

Backlog-to-cash conversion. For neoclouds, the question is not only revenue growth. It is whether backlog becomes GAAP profit after depreciation and interest.

Power interconnection and onsite generation. The grid queue, turbine lead times, transformer availability, local opposition, and gas availability are as important as GPU allocation.

Financing spreads. If debt becomes more expensive, the same contract backlog supports less infrastructure value.

Utilization disclosure. The cleanest positive signal would be operators disclosing fleet utilization, gross profit after power, depreciation schedules, and customer mix in ways investors can actually compare.

Inference revenue quality. Watch whether AI revenue shifts from demos and free usage into paid workflows with durable gross margin.

A Practical Checklist

For operators:

Separate training demand from monetized inference demand.
Track cost per useful task, not just cost per token.
Build model routing, caching, batching, and evaluation into the product architecture.
Treat compute commitments as capital allocation decisions, not just vendor negotiations.
Map which workflows truly require frontier capacity and which can run on cheaper models or older hardware.
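As an illustration of the "cost per useful task" item in the operator list above, the sketch below assumes each task is retried independently until it succeeds, so the expected number of attempts is one divided by the success rate. The token counts, prices, and success rates are hypothetical.

```python
def cost_per_useful_task(tokens_per_attempt: float,
                         price_per_million_tokens: float,
                         success_rate: float) -> float:
    """Expected cost of one successful task, counting failed attempts."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt / 1e6 * price_per_million_tokens
    return cost_per_attempt / success_rate


# Hypothetical models: one with cheap tokens but frequent failures, one pricier but reliable.
cheap_tokens = cost_per_useful_task(20_000, price_per_million_tokens=1.0, success_rate=0.25)
reliable = cost_per_useful_task(20_000, price_per_million_tokens=3.0, success_rate=0.95)

print(f"cheap-token model: ${cheap_tokens:.3f} per useful task")
print(f"reliable model:    ${reliable:.3f} per useful task")
```

Here the model with the lower token price ends up more expensive per completed task, which is why cost per token alone can mislead routing and procurement decisions.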

For founders:

Do not build a company whose only advantage is access to commodity GPUs.
Look for bottlenecks where proprietary data, workflow ownership, or vertical integration changes the economics.
If you depend on scarce compute, design the product so gross margin improves as inference efficiency improves.
Make sure the value proposition survives higher token prices, lower token prices, and constrained availability.

For investors:

Ask who owns the bottleneck.
Ask whether backlog is diversified, funded, and deliverable.
Adjust for economic depreciation, not only accounting depreciation.
Watch power timing and financing cost as carefully as GPU count.
Prefer businesses that convert compute into customer outcomes, not just megawatts into press releases.

The Bottom Line

The AI compute buildout is real. Demand is real. Scarcity is real.

So are the risks.

The mistake is treating AI infrastructure as one trade. It is a layered system where the bottleneck moves and the economics change as each layer catches up. CoWoS was tight, then power dominated, and now logic wafers and memory are binding, while power remains the structural constraint underneath.

The winners will be the companies that understand the whole dependency chain: silicon, memory, packaging, networking, power, cooling, capital, software, utilization, and demand.

The future of AI will not be determined only by who has the biggest model. It will be determined by who can turn scarce physical infrastructure into useful, paid intelligence at scale.

OpenNash helps teams understand AI infrastructure, model economics, and where AI can create measurable business value. To explore a project or workshop, contact the OpenNash team.

Sources and Further Reading

SemiAnalysis, The Great AI Silicon Shortage
SemiAnalysis, Memory Mania: How a Once-in-Four-Decades Shortage Is Fueling a Memory Boom
SemiAnalysis, How AI Labs Are Solving the Power Crisis: The Onsite Gas Deep Dive
IEA, Energy and AI
IEA, Energy and AI Executive Summary
a16z, Big Ideas 2026: Part 2
OpenAI, Announcing The Stargate Project
OpenAI, OpenAI and NVIDIA announce strategic partnership to deploy 10 gigawatts of NVIDIA systems
PC Gamer, AMD seals multi-year megadeal with OpenAI involving 6 gigawatts' worth of AI GPUs
Tom's Hardware, OpenAI and Broadcom to co-develop 10GW of custom AI chips
MarketWatch, CoreWeave shares plunge. Revenue doubles but AI costs are rising
Axios, Hyperscaler spending to hit over $600 billion
Tom's Hardware, Big Tech's AI spending plans reach $725 billion
Business Insider, AI spending from four tech giants will exceed the GDP of Japan through 2030, Goldman says
arXiv, Power Stabilization for AI Training Datacenters
arXiv, Concentrated siting of AI data centers drives regional power-system stress under rising global compute demand
arXiv, Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning