An Analysis of Tech Giants' Billion-Dollar Investments

While the German automotive industry invests around 50 billion euros annually in research and development, Microsoft is flexing its muscles in the race for AI dominance, pouring an incredible 80 billion dollars into expanding its AI data centers in fiscal year 2024/2025 alone. These figures make it clear: the tech giants are serious about artificial intelligence and are investing heavily in the necessary infrastructure.

This raises the question: Are German and European companies falling behind in the global race for AI dominance? Or is this a risky game that could end in a massive crash?

In this article, I will take a closer look at the investments in AI chips, scrutinize the underlying business cases of hyperscalers like Google, Microsoft, and Amazon, and try to better understand the actual risk involved.

Because one thing is clear: The decisions of the tech giants will significantly influence not only the future of technology but also our portfolios as retail investors.

The Starting Point: Billion-Dollar Investments in AI – But Why?

The news is clear: Tech giants are investing massively in AI. But what's behind it? Let's take a look at the key investments, divided into the areas of infrastructure, energy, and chips.

Infrastructure and Data Centers:

  • Microsoft and BlackRock: In September 2024, they announced a $30 billion fund to invest in AI infrastructure such as data centers and energy projects. (Source: Inside IT)
  • Alphabet (Google): Plans to invest $50 billion in expanding its AI infrastructure, including data centers and cloud computing. (Source: Kapitalerhöhungen)

Energy Supply for Data Centers:

  • Microsoft: Has signed a power purchase agreement under which the decommissioned Three Mile Island nuclear power plant in Pennsylvania is to be restarted by 2028 to supply its data centers with CO₂-free energy. (Source: Bild)
  • Meta (Facebook): Plans to build its own nuclear power plants in the USA starting in the 2030s to meet the growing energy demand for AI applications. (Source: Bild)

AI Chips:

  • Microsoft: Outpaced competitors in 2024 with an order of 485,000 NVIDIA Hopper AI chips. (Source: Heise)
  • Amazon, Google, Meta, ByteDance, Tencent: Each ordered between 169,000 and 230,000 NVIDIA chips. (Source: Heise)

Competitive Landscape: The Battle for AI Dominance

The investments mentioned above show that an intense competition for dominance in the AI market has erupted. Microsoft, Google, and Amazon currently dominate the cloud market with market shares of approximately 20%, 10%, and 33%, respectively (Source: Statista).

In the AI sector, they are in a neck-and-neck race with specialized companies like OpenAI (in which Microsoft has heavily invested), Anthropic, and others. While competitors' investment strategies differ in detail, they all aim to secure a decisive advantage in the future AI market. Intel and AMD are also trying to enter the market with their own AI chips.

Critical Questions About the Investments

This massive wave of investment raises critical questions:

  • Is the focus on nuclear power for the 2030s forward-thinking, given the continuous improvements in renewable energy?
  • Are these billion-dollar investments truly AI-specific expenditures, or are they largely expansions that normal cloud growth would have required anyway, merely marketed as AI spending?
  • Is it economically sensible to invest heavily in current chips now, when AI models are becoming more efficient and chip technology is developing rapidly (Moore's Law)?
  • And the decisive question for us as investors and companies: Are we witnessing the formation of an AI bubble that's bound to burst?

In the following, I will examine these questions with a simplified back-of-the-envelope calculation and analysis.

Hypotheses and Questions: What Do We Want to Find Out?

Fundamentally, I assume that the tech giants make their investment decisions carefully, and that these investments may appear larger and riskier from a mid-sized business perspective than they actually are.

Therefore, I want to examine a positively formulated hypothesis in this article and investigate the questions derived from it:

  • Primary Hypothesis: While the massive AI investments by tech giants appear risky at first glance, they are likely a calculated move that will pay off in the long term due to market dynamics, the hyperscalers' customer base, and potential efficiency gains through AI.

Secondary Questions:

  1. Investment Timing: Is it strategically wise to invest heavily in current chip generations (e.g., H100, H200) now, or would it be better to wait for the next, more powerful generation?
  2. Chip Usage: Are the bulk-ordered chips primarily used for training new AI models or for inference, meaning the application of existing models?
  3. Future Chip Demand: Will there even be such high demand for specialized AI chips for inference in the future, when models become more efficient and possibly run on standard hardware like laptops and smartphones?
  4. Profitability: How many users can be served with the current AI chip investments, and how quickly will these investments pay off considering realistic utilization scenarios? (Back-of-the-envelope calculation)
  5. Technological Development: Will the exponential improvement of Large Language Models (LLMs) continue, or are we approaching a technological plateau? (This question significantly influences the long-term profitability of the investments.)
  6. AI Bubble: Do the answers to the above questions indicate the formation of an AI bubble, or do the potential returns and strategic importance of AI justify the investments made?

I'm aware that important internal information is missing to conclusively answer these questions. Nevertheless, I believe that with publicly available data and a rough analysis, we can create a picture that's superior to gut feeling.

Back-of-the-Envelope Calculation: How Many Users Can Microsoft Serve with its AI Chips?

To make the scale of Microsoft's investments in AI chips more tangible, we'll perform a simplified estimate, known as a "back-of-the-envelope calculation." We'll focus on the 485,000 Hopper chips (H100, H200, H20, H800) that Microsoft reportedly ordered in 2024. Our goal is to estimate how many users can be served simultaneously per Hopper chip and how the investment pays off under various utilization scenarios.

Assumptions

The following table summarizes the assumptions made and explains the reasoning behind them:

| Assumption | Value | Reasoning |
| --- | --- | --- |
| Chip type | H100 | Simplification; the H100 is currently the most widely used chip of the Hopper series and provides good benchmark values. |
| Chip price | $32,500 | Estimate based on reports citing prices between $25,000 and $40,000 per H100 chip. |
| Chip purpose distribution | 50% Copilot, 50% cloud services | Assumption; Microsoft serves both its own applications (Copilot, user-based subscriptions) and customers via cloud services (API calls, VMs). A 50/50 split is a plausible, albeit conservative estimate. |
| Cloud services distribution | 50% API, 50% VM | Assumption; within cloud services, an even split between API-based services ($/million tokens) and virtual machine rentals ($/hour) is assumed. |
| Infrastructure costs | 30% of chip price ($9,750) | Beyond the chips themselves, costs include servers, network infrastructure, and the data center itself. 30% is a realistic, though possibly slightly conservative estimate. |
| Power and cooling costs per chip | $570 per year | Calculated from 700 watts maximum power consumption, 60% average utilization, 30% cooling overhead, and a US electricity price of $0.12/kWh. |
| Operating costs | 10% of chip and infrastructure costs per year ($4,225) | Costs for personnel, maintenance, software licenses, etc. Hyperscalers typically have excellent automation in place, hence only 10%. |
| Cluster size | 8× H100 | Based on benchmarks that use clusters of 8 H100 chips (Source: Benchmark). |
| Throughput per cluster (inference) | 10,000 tokens/s | Based on a benchmark with Mistral Large (123B parameters) at FP8 and 64 concurrent requests, achieving a peak throughput of over 10,000 output tokens per second with a time-to-first-token latency of 100 ms (Source: Benchmark). |
| Copilot price | $30 per month | Price of Microsoft 365 Copilot for enterprise customers (Source: Microsoft). |
| API price | $60 per 1M output tokens | Price for output tokens on Azure (Source: Azure). |
| VM price | $3.24 per hour | Rental price for a virtual machine with one NVIDIA H100 on Ori's Public Cloud (Source: Ori's Public Cloud). |
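The power-and-cooling figure is straightforward to verify. Here is a minimal Python sketch deriving it from the stated inputs (700 W peak draw, 60% utilization, 30% cooling overhead, $0.12/kWh):

```python
# Annual power and cooling cost per H100, derived from the assumptions above.
max_power_kw = 0.700       # 700 W maximum power draw
avg_utilization = 0.60     # 60% average utilization
cooling_overhead = 1.30    # +30% for cooling
price_per_kwh = 0.12       # US electricity price, $/kWh

kwh_per_year = max_power_kw * avg_utilization * cooling_overhead * 24 * 365
cost_per_year = kwh_per_year * price_per_kwh
print(f"{kwh_per_year:,.0f} kWh/year -> ${cost_per_year:,.0f} per chip")  # ~4,783 kWh -> ~$574
```

The result of roughly $574 is rounded to $570 in the table.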

Calculation

  • Number of Clusters: With 485,000 Hopper chips, Microsoft can operate approximately 60,625 clusters (485,000 / 8).
  • Cluster Distribution: We assume that 30,313 clusters are used for Copilot and 30,312 clusters for Cloud Services. Of the cloud clusters, 15,156 are used for API calls and 15,156 for virtual machines.
  • Revenue Side:
    • Virtual Machines: At full utilization, 15,156 clusters × 8 H100 × $3.24/hour × 24 hours × 365 days could generate $3.44 billion per year.
    • API Calls: (10,000 output tokens/s × 15,156 clusters × (60 × 60 × 24 × 365) seconds) / 1,000,000 tokens × $60 = $4.78 billion per year.
    • Copilot: With 30,313 clusters, 303,130,000 output tokens per second can be generated, enough for about 1.9 million concurrent requests (64 per cluster). With 320 million daily active Microsoft Teams users, this corresponds to roughly 0.6% of users running Copilot simultaneously. If a Copilot user actively uses Copilot for 10 out of every 60 minutes, this yields a theoretical user base of 6 × 1.9 million = 11.4 million users. At $30/month per user, this equals $4.1 billion per year.
    • Total Revenue: Under these assumptions, Microsoft could generate $3.44 bn + $4.78 bn + $4.1 bn = $12.32 bn per year at 100% utilization.
  • Expense Side:
    • Acquisition: $42,250 (chip + infrastructure) × 485,000 Hopper chips = $20.49 bn
    • Power & Cooling: $570 (electricity) × 485,000 Hopper chips = $0.28 bn
    • Operations: $4,225 (operations) × 485,000 Hopper chips = $2.05 bn
    • Total Expenses Year 1: $22.82 bn
    • Total Expenses Year 2 and subsequent years: $2.33 bn
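To tie these numbers together, here is a minimal Python sketch of the revenue and expense sides at 100% utilization. The VM, Copilot, and cost figures are recomputed from the assumptions table; the API revenue is carried over as the $4.78 bn stated above rather than recomputed:

```python
# Reproducing the revenue and expense sides above at 100% utilization.
CHIPS = 485_000
COPILOT_CLUSTERS, API_CLUSTERS, VM_CLUSTERS = 30_313, 15_156, 15_156
HOURS_PER_YEAR = 24 * 365

vm_revenue = VM_CLUSTERS * 8 * 3.24 * HOURS_PER_YEAR         # ~$3.44 bn
api_revenue = 4.78e9                                         # carried over from the figure above
copilot_users = 6 * 1.9e6                                    # 10-of-60-minutes usage pattern
copilot_revenue = copilot_users * 30 * 12                    # ~$4.10 bn
total_revenue = vm_revenue + api_revenue + copilot_revenue   # ~$12.3 bn

acquisition = (32_500 + 9_750) * CHIPS                       # chips + infrastructure ≈ $20.49 bn
annual_opex = (570 + 4_225) * CHIPS                          # power/cooling + operations ≈ $2.33 bn
print(f"Revenue: ${total_revenue / 1e9:.1f} bn/year")
print(f"Expenses: ${(acquisition + annual_opex) / 1e9:.2f} bn in year 1, "
      f"${annual_opex / 1e9:.2f} bn/year thereafter")
```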

Break-Even Scenarios

To evaluate profitability, let's examine various utilization scenarios:

| Utilization | Revenue (bn $) | Profit Year 1 (bn $) | Profit Year 2+ (bn $) | Years to Break-Even (after Year 1) |
| --- | --- | --- | --- | --- |
| 10% | 1.23 | -21.59 | -1.10 | Not achievable |
| 20% | 2.46 | -20.36 | 0.13 | ≈157 (effectively never) |
| 30% | 3.70 | -19.12 | 1.37 | 14 |
| 40% | 4.93 | -17.89 | 2.60 | 6.9 |
| 50% | 6.16 | -16.66 | 3.83 | 4.4 |
| 60% | 7.39 | -15.43 | 5.06 | 3.1 |
| 70% | 8.62 | -14.20 | 6.29 | 2.3 |
| 80% | 9.86 | -12.96 | 7.53 | 1.85 |
| 90% | 11.09 | -11.73 | 8.76 | 1.4 |
| 100% | 12.32 | -10.50 | 9.99 | 1.1 |
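Most rows of this table can be reproduced with a small helper (rounding differs slightly in places). A minimal sketch, assuming "years to break-even" counts the years of steady-state (Year 2+) profit needed to recover the Year-1 loss:

```python
def break_even(utilization, revenue_100=12.32, acquisition=20.49, opex=2.33):
    """Revenue, year-1 profit, year-2+ profit, and payback years (all in bn $)."""
    revenue = utilization * revenue_100
    profit_y1 = revenue - acquisition - opex    # year 1 carries the one-off acquisition
    profit_y2 = revenue - opex                  # steady state from year 2 on
    years = -profit_y1 / profit_y2 if profit_y2 > 0 else float("inf")
    return revenue, profit_y1, profit_y2, years

for u in (0.3, 0.5, 0.7, 1.0):
    revenue, y1, y2, years = break_even(u)
    print(f"{u:4.0%}: {revenue:5.2f} | {y1:6.2f} | {y2:5.2f} | {years:5.2f}")
```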

Sensitivity Analysis

The above calculation depends heavily on the assumptions made. Let's examine how changes in individual parameters affect the amortization period (at 80% utilization):

  • Chip Price: A reduction in chip price by $5,000 to $27,500 (total cost with infrastructure: $37,250) would shorten the amortization period to 1.8 years. A higher price of $37,500 (total cost: $47,250) would extend it to 2.1 years.
  • Utilization: A reduction in utilization to 60% extends the amortization period to 3.1 years. An increase to 100% shortens it to 1.1 years.
  • Copilot Price: A reduction in Copilot price to $20/month extends the amortization period to 2.3 years. An increase to $40/month shortens it to 1.3 years.
  • Number of Copilot Users: In the base scenario, we assume 11.4 million Copilot users simultaneously using the service. A reduction to 5 million users extends the amortization period to 2.9 years. An increase to 20 million users shortens the amortization period to 0.9 years.
  • API Price: A 50% reduction in API price leads to an amortization period of 3.0 years.
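Such a sweep can be automated with the same simple model. The following is a sketch under my own assumption about which revenue component each scenario touches; because the model is coarse and rounds differently, its outputs deviate somewhat from the figures above (e.g., ~1.7 instead of 1.85 years in the base case):

```python
# One-at-a-time sensitivity sketch at 80% utilization (all figures in bn $).
VM, API, COPILOT = 3.44, 4.78, 4.10   # 100%-utilization revenue components from above
ACQUISITION, OPEX = 20.49, 2.33       # one-off and recurring costs from above

def payback(revenue_100, utilization=0.8):
    """Years of steady-state profit needed to recover the year-1 loss."""
    revenue = utilization * revenue_100
    return (ACQUISITION + OPEX - revenue) / (revenue - OPEX)

scenarios = {
    "Base case": VM + API + COPILOT,
    "Copilot at $20/month": VM + API + COPILOT * 20 / 30,
    "Copilot at $40/month": VM + API + COPILOT * 40 / 30,
    "API price -50%": VM + API * 0.5 + COPILOT,
}
for name, revenue_100 in scenarios.items():
    print(f"{name}: ~{payback(revenue_100):.1f} years")
```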

Discussion and Conclusion of the Back-of-the-Envelope Calculation: Is the Risk Manageable for Microsoft?

The back-of-the-envelope calculation has shown that the amortization period of AI chip investments heavily depends on chip utilization. With an assumed utilization of 80% and a chip price of $32,500, the investment amortizes in just under 2 years. The sensitivity analysis has also demonstrated that the amortization period remains within a plausible range even when varying the assumptions. But how should we evaluate this result? Is the risk manageable for Microsoft?

Ambitious Utilization, but Feasible?

As a manager at Microsoft, I would admittedly get nervous if the planned amortization took longer than 3 years. To achieve this goal, according to this back-of-the-envelope calculation, the chips would need to be utilized at more than roughly 70-80%. This is ambitious, particularly considering the current qualitative challenges of LLMs. While many companies are experimenting with AI, broad productive deployment is still in its infancy. Many AI projects fail in practice at the last, crucial 20% of quality: often it's the nuances and unexpected problems in the details that keep applications from delivering the hoped-for performance.

Putting Things in Perspective: Microsoft's Core Business

However, these numbers need to be put in perspective relative to Microsoft's size and financial strength. According to estimates, Microsoft will invest around 80 billion dollars in building and expanding new AI data centers in fiscal year 2024/2025, almost three times as much as in the previous year (Source: Manager Magazin).

My back-of-the-envelope calculation assumes an investment of approximately 23 billion dollars for chips and infrastructure. It's therefore likely that the 80 billion dollars also includes other regular cloud data center investments.

Let's put the potential revenue from the back-of-the-envelope calculation in relation to Microsoft's total revenue: last quarter, Microsoft generated $65 billion in revenue from products and services, which extrapolates to $260 billion per year (Source: Doppelgänger Tracking Sheet). To reach the roughly $10 billion in AI-services revenue implied at 80% chip utilization, Microsoft would need to grow this revenue by about 3.85%, or roughly 0.95% per quarter. Seen in this light, the target no longer seems far-fetched.
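As a quick sanity check of those percentages:

```python
# Required AI-revenue uplift relative to Microsoft's extrapolated annual revenue.
ai_revenue = 10e9            # ~$10 bn from AI services at 80% utilization (model above)
annual_revenue = 4 * 65e9    # $65 bn last quarter, extrapolated to a full year
uplift = ai_revenue / annual_revenue
print(f"{uplift:.2%} per year, {uplift / 4:.2%} per quarter")  # ~3.85% / ~0.96%
```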

Opportunities in Core Business and Potential Supply Shortages

This is where Microsoft's enormous core business comes into play. With its dominant market position in operating systems, Office applications, and cloud services, Microsoft has tremendous leverage to integrate AI functions into existing products and sell them to millions of customers. In terms of scaling, it seems feasible to charge an additional $4 for AI services for every $100 a customer already spends, for example through Copilot subscriptions or the use of AI functions in Azure.

It could even be that Microsoft could achieve 90-100% chip utilization relatively easily and would actually like to order more, but they are simply not available. This would align with the expected market supply shortages for H100/H200 chips (see reports on supply shortages).

Strategic Advantages of the Investment

Beyond direct revenue from AI services, Microsoft benefits from additional strategic advantages:

  • Competitive advantage in the battle for market share: A decisive battle for "Early Adopters" and the "Early Majority" in the AI market is currently taking place. Whoever wins these customers secures long-term market share and customer loyalty.
  • Training of proprietary LLMs: The acquired chips can also be used for training their own advanced LLMs, which can provide a significant competitive advantage. Microsoft is investing heavily in its own AI research and development to differentiate itself from competitors and become more independent from partnerships like the one with OpenAI.

The Great Unknown: The Technological Development of LLMs

The crucial question remains: Will the exponential improvement of LLMs continue, or are we approaching a technological plateau?

Currently, some voices argue that the traditional scaling approach of ever-larger training datasets and model parameter counts (e.g., "300-billion-parameter model", use of "synthetic data") is reaching its limits.

A newer approach is to improve inference, meaning investing more computing power and time while the model is being used (so-called test-time compute). This approach is currently showing promising results (see the leap from OpenAI o1 to o3).

However, it's too early to say with certainty whether the exponential improvement of LLMs will continue. LLM research is still in its infancy. My gut feeling tells me that humanity's imagination in AI has been awakened and that the substantial capital flowing into this field will lead to further progress. Perhaps not as quickly as hoped, but steadily.

Detailed Questions and Their Implications

Is it smarter to order many chips of the current generation now, or should we wait for the next generation?

The market is developing rapidly, and there will always be a better chip generation (see Moore's Law). However, the back-of-the-envelope calculation shows that the investment can likely pay off with today's functioning LLM use cases when considering a timeframe of 2-3 years. If the chips are in use for 3-5 years, there's no compelling reason to wait for better chips. They can certainly be utilized fully with today's feasible LLM applications.

What are hyperscalers ordering so many chips for: Inference or training?

My back-of-the-envelope calculation focused on inference, meaning the use of AI models. But hyperscalers also need the chips to train their own LLMs. For example, Microsoft provides GPUs to OpenAI at favorable conditions as part of a partnership to train their models, which Microsoft then integrates into its services. At the same time, Microsoft is working on its own LLMs to become more independent and differentiate itself.

So the chips are used for both purposes. If the chips aren't fully utilized by customer business, free capacity can then be used for training their own models – a win-win situation that further supports the investment decision.

Will So Many Chips Still Be Needed for Inference in the Future?

This is one of my main concerns: Will we still need this much computing power in 3-5 years as AI technology evolves? Smaller models are already running on MacBooks and are being optimized for smartphones.

But there's reassurance here too. First, as we've seen, the investment can pay off after the second year. Second, there are currently no signs of fundamentally different approaches in AI core technology. Based on current knowledge, we can assume that LLMs will play an important role in the next 3-5 years and that inference will require more computing power.

Yes, some models will run on smartphones and locally on computers, and these models will be good for simple text processing. But for more complex tasks, we will continue to rely on the cloud for the next 1-3 years.

Detailed Risks: More Than Just a Bubble?

Beyond the question of technological development, there are additional risks:

  • Regulatory Risks: Increasing AI regulation, particularly in the EU (e.g., AI Act), could restrict or increase the cost of AI deployment in certain areas.
  • Ethical Risks: The use of AI raises ethical questions, e.g., regarding discrimination, surveillance, and job losses. These could lead to reputational damage and regulatory interventions.
  • Security Risks: AI systems are vulnerable to cyber-attacks and manipulation. A serious security incident could shake confidence in AI technologies.
  • Skilled Labor Shortage: The lack of qualified AI experts could slow down the development and implementation of AI solutions.
  • Dependence on NVIDIA: The heavy reliance on NVIDIA as a chip manufacturer poses risks regarding pricing and supply chains.

Do the Answers to the Above Questions Indicate an AI Bubble?

Whether this is an AI bubble cannot be determined with certainty yet. It's like Schrödinger's cat: we won't know until we open the box – in this case until we see whether AI fundamental research continues to produce exponential improvements or not.

If LLM development stagnates, in my opinion, there won't be enough use cases for all these tools and infrastructures, and the bubble will burst.

However, if the current improvement path continues, the problems that are still preventing widespread AI adoption today will be solved sooner or later, and the investments will pay off.

Conclusion: Calculated Risk with Potential Upside

This back-of-the-envelope calculation has satisfied my curiosity: in summary, Microsoft's massive AI investments represent an ambitious but calculated risk.

The back-of-the-envelope calculation shows that the investments could pay off in less than 3 years, assuming an ambitious but not unrealistic utilization rate of 70%. Microsoft's sheer size and financial strength, coupled with its massive existing business, provide good conditions for achieving this necessary utilization and successfully marketing AI services. The strategic advantages, such as the race for market share and training of proprietary LLMs, reinforce the arguments for these investments.

The great unknown remains the technological development of LLMs. If the current trend of exponential improvements continues, the chances are good that the investments will pay off and Microsoft can further strengthen its position as a leading tech company. In this case, the risk for Microsoft (and its retail investors) would be more of an opportunity than a real risk.

However, if development stagnates, the AI bubble could burst. Ultimately, each investor must decide for themselves whether they believe in the future of AI and are willing to take on the associated risk. Microsoft's massive bet on AI chips is, however, a strong indication that the company is convinced of this technology's long-term potential. And if this bet pays off, investors stand to gain significant opportunities.

What's your take?

Do you have any ideas on how to improve the back-of-the-envelope calculation? And do you believe that the hyperscalers' gigantic investments in AI chips will pay off? Or do you believe in an AI bubble?

Share your opinion and future predictions in the comments!