
Every time AWS has a major outage, my inbox lights up. Clients, prospects, even people I haven't spoken to in years, all asking some version of the same question: "Should we be running on multiple clouds?"
The honest answer is that it depends, and the panic is almost never a good starting point for that conversation.
Multi-cloud is a business decision. The organizations that treat it as a reaction to an outage rather than a deliberate architectural choice are the ones that end up spending 18 months building something more expensive, harder to operate, and no more resilient than what they started with. I've watched that pattern play out with enough clients to know exactly where it breaks.
Most multi-cloud migrations don't fail because workloads won't run on a second provider. They fail because the networking, security policies, deployment pipelines, and observability stack around those workloads don't translate. The VMs move, but the operational foundation underneath them fractures.
This article covers three things: when multi-cloud actually makes sense as a strategy (and when it doesn't), why most consulting engagements produce fragmented architectures that are painful to maintain, and what changes when you build on Kubernetes as the portable infrastructure layer from day one.
The first question I ask any client considering multi-cloud is the same one I ask at the start of every infrastructure engagement: "What problem are we actually solving?"
Multi-cloud isn't a strategy on its own. It's an implementation choice that should follow from a specific business requirement. When someone says "We need multi-cloud," the real conversation starts with why.
There are legitimate reasons to distribute workloads across providers. There are also situations where multi-cloud adds complexity without solving the actual problem. Knowing the difference before you commit resources is worth more than any migration plan.
Regulatory or data sovereignty requirements: Some industries and geographies mandate that specific workloads run in specific environments. If your compliance team says customer data for EU users must live on a European provider's infrastructure while your core platform runs on AWS, that's a multi-cloud requirement driven by external constraints rather than preference.
Best-of-breed service needs: GCP's ML and data analytics tooling is ahead of what AWS and Azure offer in certain areas. If your data science team needs BigQuery or Vertex AI but the rest of your infrastructure runs on AWS, there's a rational case for purpose-driven multi-cloud rather than forcing everything onto one provider.
Meaningful vendor lock-in risk at scale: If your annual cloud spend is large enough that a single provider has significant pricing leverage, distributing workloads gives you negotiating power. This applies to organizations spending millions annually, not teams running a few hundred instances.
M&A situations: When you acquire a company running production on Azure and your stack is on AWS, you're multi-cloud, whether you planned to be or not. Consolidation isn't always practical or worth the risk.
As a reaction to a single outage: Every provider has outages, and they always will. If one bad weekend is the entire basis for a multi-cloud initiative, the actual problem is availability design within your existing provider, not provider diversity. Improving redundancy, failover, and disaster recovery on a single cloud is almost always cheaper and less risky than standing up a second cloud environment.
When the team can barely manage one cloud well: This is the one I see most often, and it's the most dangerous. When I start with a new client, I ask them to walk me through their infrastructure. If the explanation is 60-70% confidence and 30-40% guesswork, the system has been architected beyond the team's ability to reason about it. Adding a second provider to that environment doesn't improve resilience. It doubles the surface area of what's already poorly understood.
When the real problem is deployment speed, not provider diversity: Sometimes "We need multi-cloud" actually means "deployments take too long" or "We can't recover from incidents fast enough." Those are real problems, but they're infrastructure automation problems with better answers than spinning up a second cloud.
The pattern is remarkably consistent. An organization hires a cloud consultancy to help it go multi-cloud. The consultancy assesses the existing AWS environment, proposes migrating a subset of workloads to Azure or GCP, and executes a replatform or lift-and-shift. The workloads run on the second provider, and the engagement gets marked as successful.
Then the problems start, and they compound: two deployment pipelines to maintain, security policies that drift apart between providers, a split observability stack, and operational knowledge fractured along provider lines.
Each of these problems is manageable in isolation. The danger is that they arrive together, and they always do, because they share the same root cause: the consultancy framed multi-cloud as a workload migration problem instead of an infrastructure architecture problem. Move the application, make it run over there, and call it done. Nobody built a portable foundation underneath.
The biggest infrastructure mistakes don't happen because a team makes the wrong trade-off between cost and performance. They happen because teams assume the trade-offs they make today will still be the right ones in two years. A tightly-coupled AWS architecture works fine until the business needs a second provider, and then every shortcut becomes a roadblock.
If you're going to run workloads across multiple cloud providers, you need an abstraction layer that sits above provider-specific constructs. Without one, every provider becomes its own operational silo with its own tooling, policies, and team knowledge. Kubernetes provides that layer.
To be clear about the claim I'm making here, Kubernetes is not a universal solution to everything. I'm pointing at a specific architectural property: a Kubernetes deployment manifest, a service definition, a network policy, or an ingress rule works the same way whether the underlying nodes run on AWS EKS, Azure AKS, or GCP GKE. That consistency is what makes multi-cloud operationally viable instead of operationally exhausting.
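To make that concrete, here's a minimal sketch of that property (the app name, image, and registry are hypothetical): nothing in this Deployment and Service references a provider, so the same file applies unchanged to an EKS, AKS, or GKE cluster.

```yaml
# Hypothetical workload; no field below is provider-specific.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: payments-api
          image: registry.example.com/payments-api:1.4.2
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: payments-api
spec:
  selector:
    app: payments-api
  ports:
    - port: 80
      targetPort: 8080
```

Deploying it to a different provider is a matter of switching kubeconfig contexts, not rewriting the manifest.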
Here's what that changes in practice.
A GitOps workflow using ArgoCD or Flux reconciles desired state from a Git repository and deploys to any Kubernetes cluster regardless of which cloud it runs on. The team maintains one deployment pipeline, not two or three.
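As a sketch of what that looks like with ArgoCD (the repo URL, path, and cluster name here are hypothetical), a single Application resource ties a Git path to a target cluster. The same definition works regardless of which provider the destination cluster runs on:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/deployments.git
    targetRevision: main
    path: apps/payments-api
  destination:
    name: eks-us-east-1   # any cluster registered with ArgoCD, on any provider
    namespace: payments
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift back to the Git state
```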
I come back to the same question I ask teams adopting GitOps for the first time: "If I had to destroy everything and rebuild from Git, what would that look like?" In a multi-cloud context, that question isn't hypothetical. It's the entire strategy. If your infrastructure can't be rebuilt declaratively from a single source of truth, it's not actually portable. It just happens to be running in two places.
Kubernetes NetworkPolicies and policy engines like OPA/Gatekeeper apply uniformly across clusters. Instead of manually mapping AWS security groups to Azure NSGs and watching them drift, the security layer lives in Kubernetes and travels with the workload. One policy definition, enforced everywhere.
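A minimal sketch of what "one policy definition" means in practice (the namespace and label names are hypothetical): this NetworkPolicy restricts ingress to a service, and it enforces identically on any cluster whose CNI supports NetworkPolicy, with no security groups or NSGs involved.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-api-ingress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Only pods in the frontend namespace may reach the API.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: frontend
      ports:
        - protocol: TCP
          port: 8080
```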
A Prometheus and Grafana stack, or an OpenTelemetry pipeline scraping Kubernetes metrics, works identically across providers. One monitoring stack, one alerting framework, one set of dashboards.
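If the clusters run the Prometheus Operator, scrape targets are themselves declared as Kubernetes objects, so the monitoring configuration travels with the workload too. A hedged sketch, assuming a ServiceMonitor-based setup with hypothetical names and labels:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payments-api
  namespace: payments
  labels:
    release: prometheus   # matched by the Prometheus instance's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: payments-api
  endpoints:
    - port: metrics       # named port on the Service exposing /metrics
      interval: 30s
```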
We saw this play out dramatically with GameStop, where consolidating observability brought monitoring costs down from roughly $15,000 per month to around $800. Duplication is expensive. Unification through a common orchestration layer eliminates that cost.
If a workload is packaged as a container, defined in Kubernetes manifests, and deployed via GitOps, moving it between providers means updating a cluster target, not rewriting infrastructure code. That's the difference between multi-cloud as a strategic capability and multi-cloud as an ongoing headache.
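Under a GitOps tool like ArgoCD, "updating a cluster target" can be as small as this fragment (cluster names hypothetical), committed to Git like any other change:

```yaml
# Fragment of an ArgoCD Application spec. The manifests, policies, and
# monitoring config are untouched; only the destination moves.
destination:
  name: eks-us-east-1   # change to e.g. aks-westeurope to move providers
  namespace: payments
```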
Every multi-cloud service page on the internet describes the same four-step process: assess, plan, migrate, optimize. It tells you nothing about what actually happens during the engagement or how the consulting team makes decisions. Here's what the work looks like when the foundation is built right.
Before any migration planning, we determine whether multi-cloud is the right call at all.
This starts with the diagnostic I mentioned earlier: walk me through your infrastructure. We map the current architecture, dependencies, and operational patterns. Then we identify which workloads have a legitimate business case for running on a second provider and which are better served by improving availability and resilience within the existing environment.
Some engagements end here with a recommendation against multi-cloud, and that's a valid outcome. A client who spends three weeks on an honest assessment and decides single-cloud with better disaster recovery is the right answer has saved themselves 12 months of unnecessary complexity. We'd rather give that recommendation than sell a migration the business doesn't need.
If multi-cloud is the right path, the first step isn't migrating workloads. It's building the portable infrastructure layer.
This means standardized Kubernetes cluster configurations across target providers, GitOps-driven deployment pipelines that work against any cluster, unified networking and security policies, and a consolidated observability stack. The foundation is the product of this phase, not a migration.
In my experience, a small, senior team moves through this faster than a large, mixed-experience group. Three engineers who've done this before and can make architectural decisions without committee approval will outpace a team of fifteen every time.
Workloads move one at a time, starting with the lowest-risk, highest-learning-value candidates.
Each migration validates the foundation layer, and problems surface on low-stakes workloads before critical systems move. A workload that handles internal batch processing is a better first candidate than one serving production API traffic to paying customers.
This phase also surfaces cost realities early. Egress charges, cross-cloud latency, and provider-specific pricing quirks become visible with real workloads running in real environments rather than in spreadsheet projections.
The engagement shouldn't create a permanent dependency on the consulting partner.
Your internal team needs to operate, troubleshoot, and extend the multi-cloud environment independently. That means pair programming during the migration phases so knowledge transfers continuously rather than in a big-bang handoff at the end. It means runbooks written by the people who built the system, covering the real failure modes they encountered, not a 200-page PDF generated from templates. And it means structured training on the specific tools and patterns in your environment, not generic Kubernetes certification prep.
The goal is that when the engagement ends, your team runs the environment with confidence. If they need to call us six months later for something, we want it to be because the business has a new requirement, not because the old migration is still causing problems.
Whether you work with us or someone else, these five questions will tell you quickly whether a consulting partner is thinking about your problem correctly.
1. "What's your recommendation framework for when multi-cloud isn't the right call?"
A partner who only recommends multi-cloud isn't consulting. They're selling scope. Look for a clear framework for evaluating when single-cloud with better availability design is the better answer. If they can't articulate when they'd talk you out of a multi-cloud engagement, they haven't thought about your problem deeply enough.
2. "How do you handle networking and security across providers?"
This is where most engagements quietly fall apart. If the answer is "We configure each provider's native tools separately," expect drift and growing maintenance overhead within months. You want a provider-agnostic approach, whether that's Kubernetes-native policies or another abstraction layer, that gives you one security model across all environments.
3. "What does your team look like?"
Ask about seniority, not headcount. Ask whether the people you're meeting in the sales process are the people who'll actually do the work. A team of three senior engineers who've built multi-cloud architectures before will deliver faster, with fewer issues, than a team of fifteen where only two people have relevant experience.
4. "What happens after the migration?"
Two answers should make you cautious. "We hand over documentation and leave" means your team will struggle with knowledge gaps the moment something unexpected happens. "We'll manage it indefinitely" means you've traded one dependency (a cloud provider) for another (a consulting firm). The right answer involves structured enablement with a defined transition period where your team progressively takes ownership.
5. "Can you show me an engagement where you recommended against multi-cloud?"
This is the trust test. A partner confident enough in their expertise to turn down work, or redirect a client toward a simpler solution, is a partner making decisions in your interest. If every engagement they describe resulted in a multi-cloud build, they're either remarkably lucky in their client mix or they're not being honest about when the approach doesn't fit.
Multi-cloud is a significant architectural decision, and there's no shortcut to evaluating whether it's the right one for your environment. If this article has given you a clearer framework for thinking about it, that's the point.
If you're at the stage where you want an engineer to look at your specific infrastructure and give you an honest assessment of whether multi-cloud makes sense, and how Kubernetes fits into the picture, we do no-obligation architecture assessments. We'll tell you what we think, including if we think the answer is to stay on a single provider and invest in better availability design instead.