Google Cloud Incorrectly Suspended Railway’s Account. Eight Hours Down.

by Natalia Nowak · May 20, 2026 · Other · 4 min read

The Short Version

At 22:19 UTC on May 19, Railway identified that Google Cloud had incorrectly suspended its production account via an automated action, with no advance notification
Railway’s workloads on AWS and Railway Metal were physically running throughout the incident but became unreachable anyway
The reason: Railway’s edge proxies depend on a GCP-hosted control plane API to populate routing tables; when cached routes expired, all workloads returned 404 errors regardless of where they ran
Secondary cascade: GitHub rate-limited Railway’s OAuth due to retry volume, blocking logins and builds
Full resolution came at 07:58 UTC on May 20, approximately eight hours after the outage began
Railway has committed to removing GCP from the data plane’s hot path and extending control plane redundancy across AWS and Railway Metal

Railway, a cloud deployment platform, was offline for approximately eight hours on May 19–20, 2026, after Google Cloud incorrectly suspended Railway’s production account as part of an automated action. GCP restored account access within seven minutes of Railway filing an emergency support ticket, but restoring services took the rest of the night. The incident exposed an architectural dependency: Railway’s workloads on AWS and Railway Metal were running the entire time, but customers could not reach them.

Why Running Infrastructure Still Goes Down

Railway operates across multiple cloud providers. Customer workloads run on Google Cloud, AWS, and Railway’s own hardware (Railway Metal). On paper, this looks like meaningful redundancy. In practice, the May 19 incident revealed a single point of failure that made that redundancy irrelevant.

Every request to a Railway-hosted application is routed through edge proxies, servers that sit in front of customer workloads and direct traffic to the right destination. Those proxies need to know where each workload lives. That information comes from a routing control plane hosted on Google Cloud. When GCP suspended Railway’s account, the control plane became unavailable. Cached routing data kept the proxies working briefly, but roughly 35 minutes later the cache expired. From that point, every workload, including those on AWS and Railway Metal that were physically healthy, began returning 404 errors. There was no route to reach them.
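The failure mode can be sketched as a route cache with a fixed time-to-live. This is a minimal illustration under stated assumptions, not Railway’s proxy code: the names RouteCache, fetch_route, and ControlPlaneUnavailable are hypothetical, the 35-minute TTL is an assumption chosen to mirror the window described above, and expired entries are discarded rather than served stale. The backend stays healthy throughout; only the control plane disappears.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


class ControlPlaneUnavailable(Exception):
    """Hypothetical error: the routing control plane cannot be reached."""


@dataclass
class RouteCache:
    # Hypothetical sketch: asks the control plane where a host lives, caches the answer.
    fetch_route: Callable[[str], str]
    ttl_seconds: float = 35 * 60  # assumed TTL, mirroring the ~35-minute window
    clock: Callable[[], float] = time.monotonic
    _entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def lookup(self, host: str) -> Optional[str]:
        now = self.clock()
        cached = self._entries.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]  # fresh hit: the control plane is not consulted
        try:
            backend = self.fetch_route(host)
        except ControlPlaneUnavailable:
            self._entries.pop(host, None)  # expired entry dropped, not served stale
            return None
        self._entries[host] = (backend, now + self.ttl_seconds)
        return backend


def handle_request(cache: RouteCache, host: str) -> int:
    """Return 200 if the proxy knows a route for host, else 404."""
    return 200 if cache.lookup(host) is not None else 404


# Simulated clock and control plane; the backend itself never fails.
fake_now = [0.0]
control_plane_up = [True]


def fetch(host: str) -> str:
    if not control_plane_up[0]:
        raise ControlPlaneUnavailable(host)
    return "aws-us-east:10.0.0.7"  # hypothetical healthy backend


cache = RouteCache(fetch_route=fetch, clock=lambda: fake_now[0])
print(handle_request(cache, "app.example.com"))  # 200: route fetched and cached
control_plane_up[0] = False                      # control plane suspended
fake_now[0] = 10 * 60
print(handle_request(cache, "app.example.com"))  # 200: still served from cache
fake_now[0] = 36 * 60
print(handle_request(cache, "app.example.com"))  # 404: cache expired, no route
```

Serving stale entries when the control plane is unreachable would have changed the outcome in this toy model; that is a design trade-off (stale routes can point at moved workloads), not something the incident report is described as recommending.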

The lesson is counterintuitive: distributing where your workloads run is not the same as distributing how traffic reaches them. Railway had multi-cloud compute. It did not have a multi-cloud control plane. One incorrectly triggered automated action at GCP was enough to cut off traffic to workloads on all three platforms.

A secondary problem compounded the outage: the volume of failed login attempts and retries caused GitHub to rate-limit Railway’s authentication and build integrations, blocking users from logging in or triggering deployments even as other services came back online.
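Retry storms like this are commonly damped with capped exponential backoff plus random jitter, so that clients which failed together do not retry in lockstep. The sketch below shows that general technique only; the source does not say Railway used or adopted it, and RateLimited and call_with_backoff are hypothetical names standing in for whatever error a real HTTP 429 response would map to.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error for an upstream HTTP 429 (e.g. from an OAuth or Git API)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry fn on RateLimited, sleeping a 'full jitter' delay between attempts.

    The delay before retry n is uniform in [0, min(max_delay, base_delay * 2**n)],
    so many clients that failed at the same moment spread their retries out.
    """
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # retry budget spent: stop adding load upstream
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng.uniform(0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")


# Usage: an upstream that rejects the first two calls.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"


delays = []
print(call_with_backoff(flaky, sleep=delays.append, rng=random.Random(7)))  # ok
print(len(delays))  # 2 sleeps, the second drawn from a range twice as wide
```

The hard retry cap matters as much as the backoff: without it, a long outage turns every client into a steady source of load on the rate-limited service.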

How the Eight Hours Unfolded

The sequence was fast at the start and slow at the end. Automated monitoring flagged failures at 22:10 UTC. The root cause, the GCP account suspension, was identified nine minutes later, at 22:19. An emergency support ticket was filed at 22:22, and GCP restored account access seven minutes later, at 22:29.

What followed was a staged, hours-long process of bringing systems back online without overwhelming infrastructure that had been abruptly shut down:

23:54 UTC: all storage volumes restored
01:30 UTC (May 20): compute and networking recovered
02:55 UTC: dashboard accessible
03:59 UTC: deployments processing again
07:58 UTC: incident fully resolved
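Ordering a staged restart is, at its core, a topological sort over service dependencies: nothing comes back until what it relies on is healthy. The sketch below uses Python’s standard graphlib module on a hypothetical four-node graph whose edges are assumptions loosely inferred from the order of the timeline; the source does not describe Railway’s actual dependencies. In this toy graph the dashboard and deployments share a wave, whereas the real recovery brought them back about an hour apart, for reasons the timeline does not give.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical graph: component -> components that must be healthy first.
DEPENDS_ON: Dict[str, Set[str]] = {
    "storage_volumes": set(),
    "compute_networking": {"storage_volumes"},
    "dashboard": {"compute_networking"},
    "deployments": {"compute_networking"},
}


def recovery_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group components into waves; members of one wave can be restarted together."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real rollout would wait for health checks (and limit load) here
        # before marking the wave done and unlocking the next one.
        sorter.done(*ready)
    return waves


print(recovery_waves(DEPENDS_ON))
# [['storage_volumes'], ['compute_networking'], ['dashboard', 'deployments']]
```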

Terms-of-service acceptance records were reset during recovery, requiring users to re-accept on next login.

What Railway Is Changing

Railway published a post-incident report acknowledging the architectural gap directly. The committed changes are:

Remove the hard dependency on the GCP-hosted routing control plane, so traffic can be directed independently of any single cloud provider
Extend database redundancy across AWS and Railway Metal
Remove Google Cloud from the critical path for live traffic entirely, keeping it only as a secondary or failover resource
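One common way to take a single provider out of the routing path is to run replicas of the control plane on several providers and let the proxy fall back from one to the next. The sketch below assumes each replica already holds a consistent copy of the routing table (keeping them in sync is the hard part, and is out of scope here); resolve_route, ReplicaUnavailable, and the replica names are hypothetical, not Railway’s announced design.

```python
from typing import Callable, Dict, Optional, Sequence

RouteLookup = Callable[[str], Optional[str]]


class ReplicaUnavailable(Exception):
    """Hypothetical error: one control-plane replica could not be reached."""


def resolve_route(host: str, replicas: Sequence[RouteLookup]) -> Optional[str]:
    """Ask each replica in order and return the first answer received."""
    for ask in replicas:
        try:
            return ask(host)  # None here means "reachable, but no such route"
        except ReplicaUnavailable:
            continue  # fall back to the next provider instead of failing
    return None  # every replica is down: only now is there truly no route


# Hypothetical replicated routing table, shared by all replicas for the example.
ROUTES: Dict[str, str] = {"app.example.com": "metal-ams:10.1.2.3"}


def make_replica(name: str, up: bool) -> RouteLookup:
    def ask(host: str) -> Optional[str]:
        if not up:
            raise ReplicaUnavailable(name)
        return ROUTES.get(host)
    return ask


replicas = [make_replica("gcp", up=False), make_replica("aws", up=True),
            make_replica("metal", up=True)]
print(resolve_route("app.example.com", replicas))  # metal-ams:10.1.2.3, via the aws replica
```

List order doubles as a priority: moving the GCP replica to the end would make it a failover-only source, in the spirit of the plan described above.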

Railway’s report concluded: “Your customers don’t care whether the failure was Google or Railway; they see your product.”

The broader business risk surfaced by this incident applies beyond Railway. Any company running multi-cloud infrastructure, or relying on a platform that does, should ask a direct question: if your account with your primary cloud provider were suspended tomorrow, how long would it take for traffic to route around it? The Railway incident shows that the answer depends less on how many clouds you run on and more on which cloud hosts the layer that tells traffic where to go. That layer is often the last one to be made redundant.

Natalia Nowak

Exploring the web hosting industry through writing: panels, providers, and everything that runs behind the scenes.

Sources

Incident Report: May 19, 2026 GCP Account Outage - Railway (official)
