DigitalOcean launched its Inference Engine on April 28, 2026, a platform that puts four distinct AI inference deployment options under a single interface. The launch is backed by a data center in Richmond, Virginia built exclusively for AI workloads and running NVIDIA’s latest Blackwell Ultra hardware. It also comes with a financial track record: $120 million in AI annualized revenue, up 150% year-over-year, and, for the first time in the company’s history, $1 billion in total annualized monthly revenue. The platform already serves production workloads at Character.ai (over one billion queries per day) and Hippocratic AI (more than 20 million patient interactions processed). For hosting providers and cloud resellers, the launch opens a direct path to building AI-powered products without operating GPU infrastructure of their own.

Four Pricing Tiers Built Around How AI Workloads Actually Run

The Inference Engine separates AI inference into four components, each priced according to workload characteristics.

The Inference Router is the first component. It routes requests dynamically across providers based on four parameters: cost, latency, quality, and data residency. The Router is available at no charge during the preview period. DigitalOcean cites a 67% reduction in inference costs in production deployments where routing is applied, and its launch materials report 3x faster time-to-first-token and 3x higher output speed versus Amazon Bedrock on DeepSeek V3.2.
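DigitalOcean has not published the Router’s request format in the launch materials, so the sketch below is a guess at its shape: an OpenAI-style chat-completions payload with a routing hint attached. The endpoint URL, the route_preference field, and the model alias are all assumptions made for illustration.

```python
import requests

# Hypothetical sketch only: the endpoint URL and the "route_preference"
# field are assumptions, not documented DigitalOcean API.
API_KEY = "do_inference_key"  # placeholder credential
URL = "https://inference.example.do/v1/chat/completions"  # assumed endpoint

payload = {
    "model": "deepseek-v3.2",  # model alias is illustrative
    "messages": [{"role": "user", "content": "Summarize this contract."}],
    # Assumed routing hint, matching the four parameters DigitalOcean
    # names: "cost", "latency", "quality", or "data_residency".
    "route_preference": "cost",
}

resp = requests.post(
    URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```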

Serverless Inference covers more than 70 models across text, image, audio, and video categories. Billing is per token or per second with no minimum commitments. Workloads scale to zero when idle. Off-peak pricing applies during lower-demand windows, reducing costs for teams that can schedule predictable, non-urgent inference into those windows.
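Many serverless inference platforms expose an OpenAI-compatible endpoint, and assuming DigitalOcean’s does the same, a call would look like the sketch below. The base URL, key, and model name are placeholders, not confirmed values.

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible serverless endpoint. The base_url,
# api_key, and model name are placeholders, not confirmed values.
client = OpenAI(
    base_url="https://inference.example.do/v1",
    api_key="do_inference_key",
)

# Billed per token; the deployment scales to zero when idle, so the
# first request after a quiet period may see a cold start.
response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # name assumed, one of the 70+ hosted models
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
)
print(response.choices[0].message.content)
print(response.usage.total_tokens)  # the quantity that drives the per-token bill
```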

Batch Inference handles workloads where a live response is not required: document processing, large-scale evaluation runs, data transformation pipelines. DigitalOcean prices it at a 50% reduction versus real-time inference, with a guaranteed completion window of 24 hours.
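The launch materials describe the tier’s pricing and completion window but not its interface; the common industry pattern is to submit a file of requests and poll until the job finishes, which is what this sketch assumes. Every endpoint and field name here is hypothetical.

```python
import json
import time
import requests

# Hypothetical batch flow: submit one JSON request per line, then poll.
# Endpoints and field names are assumptions, not documented API.
API = "https://inference.example.do/v1"
HEADERS = {"Authorization": "Bearer do_inference_key"}

requests_jsonl = "\n".join(
    json.dumps({
        "custom_id": f"doc-{i}",
        "body": {
            "model": "llama-3.3-70b-instruct",  # name assumed
            "messages": [{"role": "user",
                          "content": f"Extract the parties from document {i}."}],
        },
    })
    for i in range(1000)
)

job = requests.post(f"{API}/batches", data=requests_jsonl,
                    headers=HEADERS, timeout=30).json()

# Completion is guaranteed within 24 hours, so a slow poll is enough.
while True:
    status = requests.get(f"{API}/batches/{job['id']}",
                          headers=HEADERS, timeout=30).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(600)  # check every ten minutes
```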

Dedicated Inference provides reserved GPU capacity for production workloads where shared-resource variability is not acceptable. Billing is per GPU-hour. For teams running their own models, the platform supports bring-your-own-model deployments with hosted weight storage.
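A bring-your-own-model deployment presumably pairs a weights location with a GPU reservation. The descriptor below is a sketch of that shape; every field name, identifier, and endpoint in it is an assumption, not documented API.

```python
import requests

# Hypothetical BYOM deployment descriptor; all names are assumptions.
deployment = {
    "name": "acme-classifier-v2",
    "weights_uri": "storage://acme-models/classifier-v2/",  # hosted weight storage, scheme assumed
    "gpu_type": "hgx-b300",  # Blackwell Ultra class, identifier assumed
    "gpu_count": 2,          # reserved capacity, billed per GPU-hour
}

resp = requests.post(
    "https://inference.example.do/v1/deployments",  # assumed endpoint
    json=deployment,
    headers={"Authorization": "Bearer do_inference_key"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```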

The Richmond Facility Built Exclusively for AI Workloads

Underpinning the Inference Engine is a data center in Richmond, Virginia built exclusively for AI workloads and not shared with general-purpose compute. The facility runs NVIDIA HGX B300 accelerators (from the Blackwell Ultra generation) on a 400 Gbps RoCE RDMA network fabric. The hardware investment signals intent: the performance characteristics of AI inference, particularly GPU density, memory bandwidth, and low-latency interconnects, differ enough from general-purpose cloud that purpose-built infrastructure produces measurably different results.

DigitalOcean describes its full platform as spanning five layers: infrastructure, core cloud, inference, data, and managed agents. The Richmond facility serves as the GPU-dense foundation of that platform, and is the hardware layer behind the Inference Engine’s benchmark performance figures.

$1 Billion in ARR and the AI Numbers Behind the Launch

DigitalOcean reported Q4 2025 revenue of $242 million, up 18% year-over-year. The company crossed $1 billion in annualized monthly revenue in December 2025. AI customer revenue grew 150% year-over-year in the quarter. The company added a record $51 million in organic incremental ARR during Q4. Adjusted free cash flow for the full year reached $168 million at a 19% margin. DigitalOcean raised its 2026 revenue growth guidance to 21% following those results.

For a company that has historically served developers and mid-size businesses with straightforward cloud pricing, these numbers reflect a meaningful shift in the scale and nature of its customer base. CEO Paddy Srinivasan described what is driving that shift: “AI has moved from thinking to doing. AI-native companies are no longer building simple applications that make a single model call; they are building distributed, stateful, multi-agent systems.”

Character.ai, Hippocratic AI, and LawVo Already in Production

The Inference Engine is not a preview-stage product. Production deployments are already running at scale across a range of industries.

Character.ai processes more than 1 billion queries per day on DigitalOcean infrastructure. Hippocratic AI has processed more than 20 million patient interactions on the platform and reports 40% lower P99 latency after migrating workloads. LawVo runs more than 130 AI agents and reports a 42% reduction in costs. Workato reports a 67% reduction in inference costs and 77% faster time-to-first-token after moving to the Inference Router.

The spread of industries (conversational AI, healthcare, legal, and workflow automation) reflects the breadth of use cases the Inference Engine is designed to serve. These are not pilot deployments. They are production systems processing meaningful volumes, and they give enterprise buyers and hosting providers evaluating the platform the baseline performance evidence they need.

For Hosting Providers, Four Tiers That Map to Real Customer Workloads

For managed hosting providers and cloud resellers, the Inference Engine addresses a practical problem: how to add AI inference to a product offering without building or operating GPU infrastructure of their own.

The four-tier structure maps directly to the workload types that managed service customers generate. Teams with sporadic or unpredictable inference needs fit Serverless. Teams processing large data jobs on a schedule fit Batch at half the real-time cost. Teams with consistent high-volume production workloads fit Dedicated. The Router layer reduces total spend across all of these by directing traffic to the most cost-effective option for each request.
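That mapping reduces to a few questions about the workload. The heuristic below encodes it; the traits and tier names come from the tiers described above, but the decision logic is illustrative, not DigitalOcean guidance.

```python
def pick_tier(latency_sensitive: bool, steady_high_volume: bool,
              scheduled_bulk: bool) -> str:
    """Illustrative heuristic only; not DigitalOcean guidance."""
    if scheduled_bulk and not latency_sensitive:
        return "batch"      # half the real-time cost, done within 24 hours
    if steady_high_volume:
        return "dedicated"  # reserved GPUs, per-GPU-hour billing
    return "serverless"     # per-token billing, scales to zero when idle
```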

DigitalOcean’s position in the market has been cost-competitive cloud for developers and mid-size businesses who find hyperscaler pricing and complexity hard to navigate. The Inference Engine applies that same approach to AI compute. Access to more than 70 models across text, image, audio, and video, transparent per-token or per-GPU-hour pricing, and a single interface covering the full range of workload types make it simpler for a hosting provider to offer AI capabilities than assembling equivalent access across multiple providers separately. For hosting providers whose customers are developer teams or mid-market businesses, that simplicity is the product.