
Cloudflare Outage Readiness moved from theory to reality on November 18, 2025, when large parts of the web blinked at once. Sites and apps from X to ChatGPT saw hours of disruption before traffic stabilized. Cloudflare later explained that an oversized configuration file triggered a software failure, not an attack. The outage is over; the lesson is not.
Why leaders should act now
One provider can be a single point of failure for traffic, security, and DNS. A resilient architecture spreads risk across providers so a fault in one path does not stop you from serving users. Outages at core internet platforms are rare, but they move markets and momentum when they happen.
Cloudflare Outage Readiness Goal
Keep pages up and transactions flowing during a provider incident using two levers: multi-CDN delivery and DNS failover with health checks.
Multi-CDN in plain English
Use more than one CDN so traffic can shift if one path slows or fails. A good setup steers users by health and performance and can reach five-nines availability when built well.
How to design it
- Pick two CDNs with different networks and POP footprints.
- Keep the same cache keys, headers, and rules on both to avoid surprises.
- Put both in front of the same origin group or use two origins behind a load balancer.
- Enable health-based steering so user traffic flows to the healthy CDN in each region.
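To make the last point concrete, here is a minimal per-region steering sketch in Python. The health map, region names, and CDN labels are hypothetical; in practice, your DNS or GSLB product runs this logic for you.

```python
# Per-region steering sketch. Health results would come from your monitoring;
# the region names and CDN labels here are hypothetical.
HEALTH = {
    # (region, cdn) -> latest probe result
    ("us-east", "cdn-a"): True,  ("us-east", "cdn-b"): True,
    ("eu-west", "cdn-a"): False, ("eu-west", "cdn-b"): True,
}

def steer(region: str, preference: tuple = ("cdn-a", "cdn-b")) -> str:
    """Return the first healthy CDN for a region, in preference order."""
    for cdn in preference:
        if HEALTH.get((region, cdn)):
            return cdn
    raise RuntimeError(f"no healthy CDN in {region}; serve from origin or page on-call")

print(steer("eu-west"))  # -> "cdn-b" because cdn-a failed its probe there
```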
Config parity checklist
Caching rules, compression, image transforms, TLS versions, WAF rules, bot rules, edge redirects, WebSockets, HTTP/2 or HTTP/3, and any signed URL logic.
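A quick way to catch parity drift is to fetch the same path through both CDNs and diff the cache-relevant headers. A minimal sketch, assuming hypothetical per-CDN hostnames:

```python
# Parity spot-check sketch: request the same path through both CDNs and
# diff the headers that drive caching behavior. Hostnames are hypothetical.
import urllib.request

HOSTS = ["www-a.example-cdn-a.net", "www-b.example-cdn-b.net"]
CHECK = ["cache-control", "vary", "content-encoding", "strict-transport-security"]

def headers_for(host: str, path: str = "/") -> dict:
    req = urllib.request.Request(f"https://{host}{path}", method="HEAD")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return {h: resp.headers.get(h) for h in CHECK}

a, b = (headers_for(h) for h in HOSTS)
for h in CHECK:
    if a[h] != b[h]:
        print(f"MISMATCH {h}: {a[h]!r} vs {b[h]!r}")
```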
DNS failover that works
Authoritative DNS decides where users go. Failover updates DNS answers when health checks see an outage. Short TTLs make changes take effect faster.
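You can verify TTLs from the outside rather than trusting the console. A short sketch using the third-party dnspython package (pip install dnspython); the domain is a placeholder:

```python
# TTL sanity check sketch using dnspython 2.x. Records that may move
# during failover should resolve with a short TTL.
import dns.resolver

def record_ttl(name: str, rtype: str = "A") -> int:
    answer = dns.resolver.resolve(name, rtype)
    return answer.rrset.ttl

ttl = record_ttl("www.example.com")
if ttl > 300:
    print(f"TTL {ttl}s is too long for a record that needs to fail over fast")
```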
Build the layer
- Use a DNS provider with native health checks and failover.
- Set TTLs in the 60–300 second range for records that may move.
- Point www to a traffic steering record that can pick CDN A or CDN B.
- Avoid circular dependencies. If CDN A also hosts your DNS, keep a second DNS provider that can serve answers while CDN A is impaired.
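Managed DNS providers implement failover natively, but the logic is simple enough to sketch. Here is a minimal loop, assuming hypothetical health endpoints; update_steering_record is a stand-in for your provider's API call:

```python
# Failover loop sketch. update_steering_record is a hypothetical stand-in
# for your DNS provider's API; managed DNS products do this natively.
import time
import urllib.request

PRIMARY = "https://cdn-a.example.net/health"   # hypothetical health endpoints
SECONDARY = "https://cdn-b.example.net/health"
FAIL_THRESHOLD = 3  # consecutive failed probes before flipping

def healthy(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False

def update_steering_record(target_url: str) -> None:
    # Hypothetical: replace with your DNS provider's API call.
    print(f"steering www to {target_url}")

failures, active = 0, PRIMARY
while True:
    failures = 0 if healthy(active) else failures + 1
    if failures >= FAIL_THRESHOLD and active == PRIMARY:
        active = SECONDARY           # fail over; failback stays manual here
        update_steering_record(active)
        failures = 0
    time.sleep(30)  # probe interval
```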
Monitoring
Run external probes from more than one network to confirm both CDNs and your origins are healthy. Keep alerts simple and fast.
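A minimal probe you could run from two or more networks, with hypothetical hostnames; the failure branch is where your alerting hooks in:

```python
# External probe sketch: check both CDN paths and the origin in one pass.
# Run copies of this from at least two different networks or regions.
import time
import urllib.request

TARGETS = {
    "cdn-a":  "https://www-a.example.net/health",
    "cdn-b":  "https://www-b.example.net/health",
    "origin": "https://origin.example.net/health",
}

for name, url in TARGETS.items():
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            ms = (time.monotonic() - start) * 1000
            print(f"{name}: HTTP {resp.status} in {ms:.0f} ms")
    except OSError as exc:
        print(f"{name}: FAILED ({exc})")  # wire this line to your alerting
```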
30-60-90 day rollout
1. Days 1–30
- Map critical paths: home, login, checkout, APIs.
- Choose two CDNs and a DNS provider with health checks.
- Align cache rules and headers on both CDNs.
- Set www to a steering record. TTL 60–300 seconds.
2. Days 31–60
- Add active monitors from multiple regions.
- Load test both CDNs against your origin.
- Confirm logs show cache hits and no broken headers.
- Pilot failover on a low-risk subdomain.
3. Days 61–90
- Run a live failover drill during a quiet window.
- Document rollback and comms.
- Extend to APIs, images, and downloads.
- Add a quarterly test to your calendar.
Common pitfalls that break failover
- One CDN plus one DNS vendor from the same company.
- Long TTLs on moving records.
- Different cache rules across CDNs.
- No health checks from outside your providers.
- Not testing with production-like traffic.
What to test before you call it done
- Cache hit rate and origin load on both CDNs.
- Session continuity during failover.
- API timeouts and retry logic.
- Status page reachability on a separate provider.
- Rollback in under five minutes.
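For the last item, you can time a rollback from the outside. A sketch with dnspython, polling a public resolver until the restored answer appears (the record name and expected target are placeholders):

```python
# Rollback timer sketch (dnspython): after reverting the steering record,
# measure how long until a public resolver returns the restored answer.
import time
import dns.resolver

EXPECTED = "cdn-a.example.net."  # hypothetical restored CNAME target

resolver = dns.resolver.Resolver()
resolver.nameservers = ["8.8.8.8"]  # probe a public resolver, not your own cache

start = time.monotonic()
while True:
    answer = resolver.resolve("www.example.com", "CNAME")
    if str(answer[0].target) == EXPECTED:
        print(f"rollback visible after {time.monotonic() - start:.0f}s")
        break
    time.sleep(10)
```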
Lessons from recent incidents
Cloudflare’s November 18 outage, tied to a configuration file bug, shows how a single platform issue can ripple across the web. Cloudflare has had other incidents this year and publishes technical details and fixes after they occur, which helps teams plan guardrails. The best time to build a second path is before you need it.
Quick template you can copy
Traffic: DNS steering to CDN A or CDN B with health checks
CDNs: same cache rules, same redirects, same TLS
Origins: two regions, read replicas where needed
Monitoring: independent probes and log alerts
Drill: once a quarter, record results and fixes
Want help mapping this for your stack?
Centrend can pair with your team to design a simple multi-CDN and DNS failover plan, test it, and hand you a runbook you can keep.