Copilot Q&A: How Cloudways Built an AI That Diagnoses Infrastructure Issues in Minutes

by Łukasz Nowak · April 21, 2026 · Interviews · 11 min read

Cloudways has been managing cloud hosting infrastructure for over a decade, with WordPress as its primary focus. By the time AI became a practical tool, the company had years of support data, server logs, and repeatable failure patterns across thousands of customer environments to work with. The question was not whether to add AI but where it would actually change outcomes rather than just appear on a feature list.

The answer was diagnostics. Most infrastructure issues are not unique: unoptimized database queries, bot traffic spikes, PHP memory exhaustion, resource constraints. They recur across environments regardless of what is running on them. WordPress amplifies the frequency because of its plugin ecosystem and the profile of its typical operator, but the underlying signals are infrastructure-level. The bottleneck was never detection; it was the time required to reach the root cause. Cloudways Copilot, built with Traversal and running on DigitalOcean’s Gradient AI Platform, was designed to close that gap. After a public preview in January 2025, it reached general availability in August 2025. In production, it reduces average troubleshooting time from 30 to 40 minutes down to 5 to 6 minutes, with diagnostic accuracy validated at over 90 percent against real support cases.

We spoke with Ayaz Ahmed Khan, Senior Director of Engineering at DigitalOcean, about how Copilot was built, what the preview period taught the team, where automated diagnosis still falls short, and what the roadmap looks like over the next 12 months. This is part one of a two-part interview. Part two will cover pricing, competitive positioning, and Copilot’s role within the broader DigitalOcean product strategy.

Part one: Why and how Cloudways Copilot was built

webhosting.today: Cloudways had been managing WordPress infrastructure for years before adding AI. What specifically were you seeing in support data and server logs that told you automated diagnostics could work, that this was a problem AI could actually solve rather than just a feature to add to the marketing page?

Ayaz Ahmed Khan: When AI started gaining momentum, we were very deliberate about how we wanted to use it. The goal wasn’t to add AI as a surface-level feature, but to apply it to the most consistent, time-consuming challenges our customers face.

One of the clearest patterns we’ve seen over time is that most issues aren’t unique. They’re repeatable problems, things like unoptimised database queries, traffic spikes or resource constraints that show up again and again across different environments.

Because support has always been a core part of Cloudways, we’ve had deep visibility into how these issues develop and how long they take to resolve. The real bottleneck was never detecting that something was wrong; it was the time and expertise required to diagnose the root cause.

That’s where AI became a natural fit. With the volume of infrastructure and support data available, we can detect patterns, identify root causes and guide resolution much faster. It’s about solving real operational problems for SMBs, not introducing AI for its own sake, and ultimately reducing the time they spend troubleshooting so they can focus on running their business.

webhosting.today: Building an AI product inside an established hosting platform is not trivial. How is the Copilot team organized within DigitalOcean? Is it a dedicated team, a cross-functional group, or does it sit within the broader Cloudways engineering organization? How many people work on it?

Ayaz Ahmed Khan: Copilot isn’t built as a standalone AI initiative; it’s a cross-functional effort that brings together a team of 10+ engineers and product specialists across DigitalOcean and Cloudways.

At its core, it’s powered by DigitalOcean’s Gradient AI Platform and an AI-driven reliability layer that continuously monitors systems, identifies what’s going wrong and helps determine how to fix it. Rather than just surfacing alerts, it connects signals across the stack to diagnose root causes in real time.

What makes it effective is how closely it’s integrated with Cloudways engineering and support. The product is shaped by real customer environments and real troubleshooting workflows, not theoretical use cases.

That combination allows us to build AI capabilities that are technically robust but also grounded in how customers actually use the platform day to day.

webhosting.today: You went from public preview in January 2025 to GA in August 2025, about seven months. What changed between those two versions? What did you learn from the preview period that shaped the product that launched in August?

Ayaz Ahmed Khan: The biggest focus during the preview period was improving diagnostic accuracy. It wasn’t enough for Copilot to flag that something was wrong; it needed to consistently identify the right root cause. To do that, we spent a significant amount of time reviewing outputs against real support cases, essentially asking whether an experienced engineer would reach the same conclusion. That feedback loop was critical in refining the system.

One of the key learnings was that identifying an issue is only part of the problem. The real challenge is guiding resolution in a way that works within a managed hosting environment. Customers don’t always have full server access, so we had to ensure the remediation steps were not just technically correct, but practical and executable within Cloudways. That meant tailoring recommendations very specifically to how our platform works.

We also learned a lot about how customers interact with AI in a production environment. There’s a natural hesitation around automated fixes, so we focused on making outputs clearer, more transparent and more actionable. Before expanding automation, it was important that customers could understand and trust what Copilot was telling them.

Overall, the preview phase was less about adding new features and more about aligning the system with real-world usage, making sure it reflects how issues actually occur, how engineers troubleshoot them, and how customers resolve them day to day.

webhosting.today: Every product has moments where things do not go as planned. What was the biggest surprise or setback during Copilot’s development or rollout – something that forced the team to change direction?

Ayaz Ahmed Khan: One of the biggest challenges was dealing with the complexity of modern hosting environments. Websites today rely on multiple interconnected components, from databases and caching layers to third-party plugins, which means when something goes wrong, the root cause is often buried across multiple signals.

Early on, this created a risk that Copilot would surface too much information without enough clarity. The challenge wasn’t just analysing signals, but prioritising the right ones and presenting a clear, actionable root cause, rather than overwhelming users with data.

A key part of that was replicating how a human support engineer actually troubleshoots. It’s not a linear process; engineers follow investigative paths, hit dead ends, and then course-correct based on new signals.

Translating that kind of iterative, experience-based reasoning into an automated system required us to rethink how Copilot correlates signals and evaluates potential causes, so it behaves less like a monitoring tool and more like an experienced engineer working through a problem.

Even then, it’s not something you solve once and move on from. A big part of the process has been continuously reviewing outputs against real support cases, effectively keeping a human in the loop to validate whether the system is identifying the right issue and suggesting the right next steps. That ongoing feedback loop has been essential in improving accuracy over time.

webhosting.today: You report up to 4x faster issue resolution and over 90% diagnostic accuracy. The preview announcement cited “over 45 minutes to under 10 minutes,” while the GA launch referenced “30-40 minutes to 5-6 minutes.” Which baseline reflects real-world performance, and how are these metrics measured?

Ayaz Ahmed Khan: The GA figures reflect the most accurate picture of real-world performance, based on a broader set of customer environments and usage.

In practice, customers were previously spending 30 minutes or more troubleshooting issues, often involving manual investigation or support. With Copilot, that’s reduced to just a few minutes, delivering up to 4x faster resolution and saving over 30 minutes per incident on average.

The biggest impact comes from accelerating diagnosis. Copilot analyses signals across the stack in real time and surfaces a clear root cause, removing much of the time traditionally spent figuring out what’s gone wrong.

Its diagnostics are validated against real support cases and benchmarked against what an experienced engineer would conclude, which underpins the current accuracy level of over 90%.

webhosting.today: Copilot is built in partnership with Traversal and runs on DigitalOcean’s Gradient AI Platform. How does this architecture differ from competitors who use general-purpose LLMs for their support AI? What does the Traversal partnership give you that an off-the-shelf model would not?

Ayaz Ahmed Khan: The key difference is that Copilot isn’t just an AI layer sitting on top of support. A lot of tools today use general-purpose models to interpret tickets and suggest answers, but they’re still working at a surface level.

What we’ve built with Traversal and DigitalOcean is something much closer to how an engineer would approach a problem. Copilot looks at what’s actually happening across the system, pulling signals from different parts of the stack and working out the root cause in real time.

That means instead of getting an alert and having to figure it out yourself, you’re given a clear explanation of what’s gone wrong and what to do next. In many cases, you can go a step further and apply the fix directly through SmartFix.

At the same time, we’ve been very deliberate about how it operates. It runs within a controlled environment, so it can diagnose and guide fixes without having unrestricted access to the system.

Ultimately, it’s about taking away the back-and-forth of traditional support and giving customers a faster, more direct way to understand and resolve issues.

webhosting.today: WordPress sites face specific operational challenges – bot traffic spikes, brute force login attempts, plugin conflicts after updates, PHP memory exhaustion. Which of these WordPress-specific issues does Copilot handle well today, and which remain difficult for automated diagnosis?

Ayaz Ahmed Khan: Copilot is strongest today on high-volume, pattern-based issues that impact performance and availability. That includes things like aggressive bot traffic, DDoS or DoS activity, resource constraints and unoptimised database queries, where there are clear signals across the infrastructure and a well-defined root cause.

These are the types of issues we’ve seen repeatedly across WordPress environments, and where faster diagnosis has an immediate impact on site performance and uptime.

Where it becomes more challenging is in highly customised or application-level scenarios: for example, plugin conflicts after updates or complex interactions between multiple plugins and configurations. In those cases, the signals are less consistent, and the root cause can vary significantly between environments, which makes automated diagnosis less deterministic.

The focus today is on solving the most common and high-impact issues reliably, while continuing to improve coverage over more complex scenarios. For those edge cases, human expertise still plays an important role alongside Copilot.

webhosting.today: What does the Copilot roadmap look like for the next 12 months? Are there plans to move beyond server-level monitoring into WordPress-level operations – plugin management, security scanning, performance optimization, or pre-update impact analysis?

Ayaz Ahmed Khan: We’re already moving beyond diagnostics into a more proactive and action-oriented model. With capabilities like SmartFix, Copilot can not only identify issues but also guide or execute resolution, and we’re continuing to expand that into earlier detection of performance-impacting problems.

Over the next 12 months, the focus is on making Copilot more deeply integrated into the customer workflow, so it’s not just surfacing insights, but actively helping customers prevent issues before they occur.

That does naturally extend beyond server-level signals over time. As we evolve, the goal is to provide more context-aware intelligence across the full stack, including application-level behaviour, while ensuring recommendations remain reliable and actionable within a managed hosting environment.

The long-term vision is infrastructure that doesn’t just react to problems, but anticipates them, enabling customers to stay ahead of issues rather than troubleshoot them after the fact.

The tension running through this interview is the one every AI infrastructure product eventually has to resolve: the gap between what the system handles reliably today and what the roadmap promises. Copilot is demonstrably strong on the problems it was designed around: infrastructure-level signals, repeatable failure patterns, high-volume traffic events. These are exactly the issues that show up most often across a managed hosting platform serving thousands of SMB customers.

Plugin conflicts remain the hard problem, and Khan’s answer on that point is more candid than most vendor interviews manage. The signals are inconsistent, the root cause varies between environments, and automated diagnosis is less deterministic. That is the category where most WordPress operators actually spend their troubleshooting time. Closing that gap is where the next 12 months of the roadmap point, and the credibility of the larger vision depends on it.

Łukasz Nowak

Nearly two decades in IT. A decade in web hosting, and still in the trenches. Writing about the infrastructure that runs the internet from the inside.