What AI Really Looks Like Inside the Enterprise

Melissa Palmer

November 29, 2025

I will admit faster than anyone that I live in the nuts and bolts of AI data centers and infrastructure. The xAI Colossus story? Mind-blowing. I read the NVIDIA GB200 NVL72 rack bring-up docs for fun. This stuff is genuinely fascinating.

But the reality is that 99.9% of organizations will never touch that scale, so let’s set honest expectations for what enterprise AI actually looks like today.

The Enterprise Will Use AI, Not Create It

Everything we do with large language models falls into exactly three phases, and they are wildly different from a compute perspective:

  1. Training: building a frontier model from scratch

    Only xAI, OpenAI, Anthropic, Meta, Google, and maybe two or three others do this. A 1–2 trillion-parameter model trained on 30–50 trillion tokens takes 60,000–150,000 of NVIDIA’s latest GPUs running flat-out for one to four months and costs $30 million to $150+ million in GPUs and power alone.

  2. Fine-tuning: teaching that model to act like your company

    This is the part enterprises actually own. In 2025, nobody does full-parameter fine-tuning at trillion-scale (too expensive). Everyone uses LoRA/QLoRA: freeze the original weights, train tiny adapters (<1% of the model) on your data. Takes 4–24 GPUs, a few hours to three days, and costs $300–$8,000.

  3. Inference: using the model day-to-day

    A single conversation on a trillion-parameter model needs 32–256 GPUs for a few seconds. Cost: pennies per million tokens.
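To see why fine-tuning is so cheap relative to training, it helps to count what LoRA actually trains. Here’s a minimal sketch of the arithmetic; the hidden size, layer count, rank, and number of adapted matrices are illustrative assumptions, not any specific model’s config:

```python
def lora_trainable_params(hidden_size, num_layers, num_target_matrices, rank):
    """Parameters in LoRA adapters: each adapted square weight matrix
    gets two low-rank factors, A (rank x d) and B (d x rank),
    so 2 * rank * d new parameters per matrix."""
    per_matrix = 2 * rank * hidden_size
    return per_matrix * num_target_matrices * num_layers

# Illustrative trillion-parameter-class shape (assumed): hidden size 16,384,
# 120 layers, LoRA rank 16 applied to the 4 attention projections per layer.
total_params = 1_000_000_000_000
adapters = lora_trainable_params(16_384, 120, 4, 16)

print(f"adapter params: {adapters:,}")                     # ~252 million
print(f"fraction trained: {adapters / total_params:.4%}")  # well under 1%
```

Roughly 0.025% of the model’s weights get gradients, which is why the job fits on a handful of GPUs instead of a supercluster.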

Let’s take a practical look at the differences in computing power required for each of these phases.

The AI Compute Gap

Here’s a breakdown of the phases of LLMs:

| Phase | What’s really happening | Rough GPU count (2025) | Wall-clock time | Who actually does this in 2025 | Ballpark cloud cost |
|---|---|---|---|---|---|
| Training | Building a 1–2T model on 30–50T tokens | 60,000–150,000+ | 1–4 months | <10 labs worldwide | $30M–$150M+ |
| Fine-tuning (LoRA) | Tiny adapters on your data | 4–24 | 4 hours–3 days | Any serious company | $300–$8,000 |
| Inference | One conversation / API call | 32–256 (1–2 racks) | Milliseconds–seconds | Cloud providers + large enterprises | Pennies per 1M tokens |

To put these numbers in practical terms:

  • Training one frontier model = a medium-sized city’s electricity for months
  • LoRA fine-tuning the same model on your data = running your home A/C non-stop for a couple of the hottest summer weeks
  • A million plain inference queries = one high-end gaming PC on a weekend binge
  • A million RAG-augmented queries = leaving the TV and a few lights on for that same weekend
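Those analogies fall straight out of the kilowatt-hours. A back-of-the-envelope sketch, where the ~1 kW per GPU (including cooling and power-delivery overhead), the 90-day run, and the 48-hour LoRA job are all illustrative assumptions:

```python
def gpu_energy_kwh(num_gpus, hours, kw_per_gpu=1.0):
    """Total energy for a GPU job, assuming a flat per-GPU draw
    that folds in cooling and power-delivery overhead."""
    return num_gpus * hours * kw_per_gpu

training = gpu_energy_kwh(100_000, 90 * 24)  # frontier run: 100k GPUs, ~3 months
lora     = gpu_energy_kwh(8, 48)             # weekend LoRA job: 8 GPUs, 2 days

print(f"training: {training:,.0f} kWh")  # 216,000,000 kWh
print(f"LoRA:     {lora:,.0f} kWh")      # 384 kWh
# At ~900 kWh/month for a typical U.S. home, the training run is roughly a
# quarter-million home-months of electricity; the LoRA job is about two weeks' worth.
```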

Where RAG Fits In (Even Cheaper and Faster)

RAG (Retrieval-Augmented Generation) has quietly become the enterprise default because it’s almost embarrassingly light. It never touches the model’s weights. Instead, every query pulls the latest relevant snippets from your own documents (Confluence, SharePoint, CRM, contracts, pricing sheets) and injects them into the prompt. Latency adds ~100 ms, power is negligible, and the result is an assistant that never hallucinates yesterday’s policy.
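A minimal sketch of that retrieve-and-inject loop, using crude word overlap in place of a real embedding model and vector database (the documents and query are made up; production systems swap in embeddings, but the shape of the pipeline is the same):

```python
from collections import Counter

# Toy document store; in practice these snippets come from
# Confluence, SharePoint, CRM exports, contracts, and so on.
DOCS = [
    "PTO policy: employees accrue 1.5 days per month, capped at 30 days.",
    "Expense policy: meals over $75 require VP approval.",
    "Laptop refresh: hardware is replaced every 36 months.",
]

def score(query: str, doc: str) -> int:
    """Crude relevance: count of shared lowercase words."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())

def build_prompt(query: str, top_k: int = 1) -> str:
    """Retrieve the most relevant snippet(s) and inject them into the prompt."""
    ranked = sorted(DOCS, key=lambda doc: score(query, doc), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("how many PTO days do employees accrue per month")
print(prompt)  # the PTO snippet is selected, not the expense or laptop ones
```

The model’s weights never change; freshness comes entirely from what gets stuffed into the context at query time.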

In 2025, almost nobody chooses between fine-tuning and RAG; they stack both. LoRA teaches tone and reasoning style; RAG keeps facts current. The combo is magic and runs on a couple of racks, or even just API credits plus a vector database.

Hardware Reality Check: What Enterprises Actually Need

You’re not building Colossus. For inference, fine-tuning, and RAG, your realistic choices are:

On-prem Setups

  • VMware Private AI Foundation with NVIDIA – the true easy button. Three GPU-enabled nodes (Dell, HPE, Lenovo) in your existing VMware environment and you’re live. Expand as needed.
  • NVIDIA DGX BasePOD – start with four DGX B200/B300 nodes. Often fits in existing data centers.

The Power Problem
A single DGX B300 draws ~14.5 kW. Eight-GPU servers are similar. When your data center can’t handle it, DGX-Ready colocation partners solve it.
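The math that forces the colocation conversation is simple. A sketch, where the 12 kW legacy-rack and 40 kW high-density-rack budgets are illustrative assumptions (only the ~14.5 kW node draw comes from above):

```python
def nodes_per_rack(rack_budget_kw: float, node_draw_kw: float = 14.5) -> int:
    """How many ~14.5 kW DGX-class nodes fit within a rack's power budget."""
    return int(rack_budget_kw // node_draw_kw)

print(nodes_per_rack(12))  # 0 -- a typical legacy enterprise rack can't host even one
print(nodes_per_rack(40))  # 2 -- a modern high-density colo rack fits two
```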

Traditional Hyperscalers (AWS, Azure, GCP)

Price and existing contracts usually win here. Reserved instances bring costs down, but you’ll still pay a premium and sometimes wait weeks for capacity.

Neoclouds: Where Most of the 2025 Magic Happens

CoreWeave, Lambda Labs, Crusoe, Together AI, Nebius, TeraWulf, and others. Many are ex-crypto farms turned AI hyperscalers, funded with billions, stocked with H100s and early Blackwell.

With the pricing I found on the internet? $2.20–$3.50 per H100-hour (vs. $6–$8 on AWS), instant provisioning, zero egress fees. A weekend LoRA job on a 405B model costs $800 instead of $4,000. Many enterprises now develop on neoclouds and only move to on-prem once the workload is proven and steady.
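The gap between those hourly rates compounds fast. A sketch of the weekend-LoRA-job comparison; the GPU count, duration, and exact rates are illustrative assumptions (the hyperscaler figure uses an on-demand rate above the reserved range quoted earlier):

```python
def job_cost(num_gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Total cost of a GPU job billed per GPU-hour."""
    return num_gpus * hours * rate_per_gpu_hour

gpus, hours = 16, 20                        # assumed weekend LoRA job shape
neocloud    = job_cost(gpus, hours, 2.50)   # mid-range neocloud H100 rate
hyperscaler = job_cost(gpus, hours, 12.50)  # illustrative on-demand hyperscaler rate

print(f"neocloud:    ${neocloud:,.0f}")     # $800
print(f"hyperscaler: ${hyperscaler:,.0f}")  # $4,000
```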

If their data centers can’t support AI workloads? A neocloud is not a bad place to stay.

| Deployment Option | Rough monthly cost (500-person org, internal assistant + RAG + occasional LoRA) | Source / Notes (Nov 2025) |
|---|---|---|
| On-prem (VMware Private AI or small DGX BasePOD) | $80K–$150K upfront + $5K–$10K/month power & ops after year 1 | Dell/HPE list prices, DGX partner quotes, typical U.S. colo power |
| Traditional hyperscalers | $8,000–$40,000 | AWS p5 reserved, Azure ND_H100_v5, GCP A3 Mega |
| Neoclouds | $4,000–$20,000 | CoreWeave $2.50–$3.20/hr reserved, Lambda clusters, Crusoe rates |

Public list prices or widely quoted enterprise reserved rates, Nov 2025.

So where does that leave real companies in late 2025?

Most companies today don’t pick just one path; they pick a few. They experiment and fine-tune on a neocloud (cheap, instant GPUs), then, if their data centers can handle AI workloads, move the proven, steady-state workload to on-prem VMware Private AI or a small DGX BasePOD once the ROI is obvious and compliance demands it. Hyperscalers still win when you’re already locked into one ecosystem and don’t want to rock the boat.

The trillion-parameter superclusters are breathtaking, but they’re someone else’s electricity bill. Your company gets the exact same intelligence, customized to your tone and your latest data, for the price of a loaded SUV and a few thousand kilowatt-hours a month.

That’s not science fiction anymore; that’s modern enterprise AI. The hardware is available, the software is mature, and the gap between “playing with ChatGPT” and “running a private, compliant, data-specific assistant” has never been smaller.
