DDoS Mitigation
DDoS mitigation is the set of controls that keep your site available during volumetric (L3/4) and application-layer (L7) floods. It uses anycast scrubbing networks, CDN caching, WAF/rate limits, and origin hardening to absorb or deflect malicious traffic without blocking legitimate users.
Attack types & symptoms
- Volumetric (L3/4): UDP floods, reflection/amplification (DNS/NTP/CLDAP/Memcached), SYN floods → link saturation, high PPS, dropped packets.
- Protocol exhaustion: TCP connection/state exhaustion, TLS handshake floods → high CPU on load balancers.
- Application (L7): HTTP GET/POST floods, cache-busting query storms, search-endpoint spam, slowloris/slow POST → high origin RPS/CPU, timeouts, 5xx.
- Multi-vector: Combines the above to evade single controls.
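Telling these vectors apart usually starts with a quick breakdown of a request sample. The sketch below is one possible way to do that, assuming each log record is already parsed into a dict; the field names (path, status, asn, user_agent) are hypothetical and will differ per CDN or log format.

```python
from collections import Counter


def summarize(requests, top_n=3):
    """Summarize a sample of parsed request records for quick triage.

    Each record is assumed to be a dict with the hypothetical keys
    'path', 'status' (int), 'asn' and 'user_agent'.
    """
    total = len(requests)
    # Group status codes into classes such as "2xx" or "5xx".
    status_classes = Counter(f"{r['status'] // 100}xx" for r in requests)
    return {
        "total": total,
        "top_paths": Counter(r["path"] for r in requests).most_common(top_n),
        "top_asns": Counter(r["asn"] for r in requests).most_common(top_n),
        "top_user_agents": Counter(
            r["user_agent"] for r in requests
        ).most_common(top_n),
        # Share of each status class in the sample (empty if no requests).
        "status_mix": (
            {k: v / total for k, v in status_classes.items()} if total else {}
        ),
    }
```

A sample dominated by one path, one ASN, and a rising 5xx share points toward an L7 flood on that endpoint; a flat request mix with saturated links points toward a volumetric attack that this kind of log view will not show.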
Mitigation stack (layered)
- Upstream/scrubbing: Anycast edge scrubs volumetric floods before traffic reaches your network (BGP diversion/GRE tunnels or CDN proxy).
- CDN edge: Cache static + anonymous HTML aggressively; enable tiered caching and origin shield to minimize hits under load.
- WAF/bot management: Managed rules + anomaly scoring; custom rules per path; bot fingerprinting and behavior analysis; geo/ASN granularity.
- Rate limiting/connection control: Token buckets per IP/ASN/path; connection & request burst caps; challenge after threshold (JS/CAPTCHA/Turnstile).
- Origin hardening: Keep-alives, sane timeouts, worker limits, autoscaling where possible; isolate expensive endpoints; enable HTTP/2 and HTTP/3 where supported.
- Access control: mTLS or IP allow-lists for admin, XML-RPC, and internal APIs; HMAC for webhooks.
- Observability: Real-time dashboards (RPS/PPS, cache hit ratio, 4xx/5xx), structured logs with sampled request bodies (minus PII).
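The token-bucket idea in the rate-limiting layer can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular CDN or WAF implements it; the class names and the (IP, path) key are assumptions made for the example.

```python
import time


class TokenBucket:
    """A bucket that refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a normal burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None, cost=1.0):
        now = time.monotonic() if now is None else now
        # Refill according to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class KeyedLimiter:
    """One bucket per key, e.g. (client IP, path) or (ASN, path)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, key, now=None):
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

With `KeyedLimiter(rate=1.0, capacity=2)`, a client can burst two requests and then sustain one per second; a request that returns False is the point where a challenge or block would apply. The dictionary grows without bound here, so a real deployment would need expiry of idle keys and shared state across edge nodes.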
Playbook during an attack
- Identify vector(s) quickly: look at RPS, cache hit, status code mix, top paths, user-agents, ASNs.
- Enable “under attack” profile: tighten rate limits; move WAF to block mode for high-confidence rules; turn on JS or CAPTCHA challenges for abusive paths.
- Raise cacheability: increase HTML TTL for anonymous pages; add stale-while-revalidate/stale-if-error; normalize query strings in cache key.
- Protect hotspots: throttle /wp-login.php, XML-RPC, search, and heavy filters; temporarily disable non-essential endpoints/features.
- Shield origin: route via origin shield/tiered cache; scale out application workers; clamp long-running requests.
- Communicate: Update your status page with impact and next update time; notify on-call/escalations.
- Post-incident: Keep the emergency rules under monitoring for 24–48h; write a brief RCA; codify permanent protections.
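Query-string normalization in the cache key, mentioned in the playbook, can be illustrated as follows. This sketch uses an allow-list so that random cache-busting parameters collapse onto one key; the parameter names in ALLOWED_PARAMS are placeholders, and the key omits the host on the assumption that keys are already scoped per site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical allow-list: only these parameters change the response.
ALLOWED_PARAMS = {"s", "paged", "orderby"}


def cache_key(url):
    """Build a normalized cache key from the path and allowed parameters."""
    parts = urlsplit(url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in ALLOWED_PARAMS
    ]
    params.sort()  # parameter order must not create distinct cache entries
    query = urlencode(params)
    path = parts.path or "/"
    return path + ("?" + query if query else "")
```

An ignore-list (dropping only known tracking parameters) is less disruptive but does not help against floods that append random parameter names, which is why the sketch keeps only known parameters instead.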
WordPress / WooCommerce specifics
- Login & XML-RPC: Rate-limit and/or challenge wp-login.php; disable or strictly limit XML-RPC (especially system.multicall).
- Checkout & payment callbacks: Never challenge PSP return URLs or webhooks; allow-list provider IPs or verify HMACs.
- Cache-busting: Strip irrelevant query params from cache keys; bypass cache only when WooCommerce session/cart cookies are present.
- Search & filters: Cap requests/sec per IP; apply pagination/size limits; consider short TTL caching for popular queries.
- Static fallback: Serve a lightweight maintenance/queue page at edge if origin CPU is saturated.
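Verifying an HMAC on payment webhooks lets those requests skip challenges safely. The sketch below assumes the provider sends a hex-encoded HMAC-SHA256 of the raw request body; real providers each define their own header name, encoding, and often a timestamp, so treat the function names and scheme here as illustrative only.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a received signature against the one computed locally."""
    expected = sign(secret, body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

The check must run on the exact bytes received, before any JSON parsing or re-serialization, otherwise a valid signature will fail to match.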
Configuration snippets (illustrative)
Rate-limit wp-login with Nginx
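# http {} context: key on client IP, 10 MB of shared state, 5 requests/minute.
limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;

# server {} context:
location = /wp-login.php {
    # Allow a burst of up to 10 extra requests without delay; reject the rest.
    limit_req zone=login burst=10 nodelay;
    include fastcgi_params;
    # ...fastcgi_pass etc.
}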
Block XML-RPC except allow-list
location = /xmlrpc.php {
    allow 203.0.113.0/24;  # your integrator
    deny all;
}
Best practices
- Prepare preapproved rule sets and a one-click “under attack” mode.
- Keep short TTLs for DNS/CDN config so changes propagate fast.
- Separate admin on a distinct subdomain with stronger access controls.
- Use per-path rate limits (login, search, API) rather than one global bucket.
- Continuously test with synthetic load to validate no false positives at peak.
Common pitfalls
- Challenging or blocking checkout/payment flows.
- Over-broad country/IP blocks harming real users.
- CDN caching disabled for HTML, pushing all load to origin.
- No origin autoscaling or worker limits → CPU thrash and collapse.
- Forgetting to revert emergency rules, causing silent conversion loss.
KPIs
- Time to mitigate (detection → stable).
- Origin RPS / CPU delta vs baseline during attack.
- Cache hit ratio under attack (should increase).
- Blocked vs allowed request ratio per rule (false positives <0.1% on critical flows).
- Egress bandwidth from origin (should remain within capacity).
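Several of these KPIs reduce to simple ratios. The helpers below are a minimal sketch of how they might be computed from counters you already collect; the function names and inputs are assumptions for illustration, not a standard reporting API.

```python
from datetime import datetime


def time_to_mitigate(detected_at: datetime, stable_at: datetime) -> float:
    """Seconds from detection to stable service."""
    return (stable_at - detected_at).total_seconds()


def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of requests served from cache."""
    total = hits + misses
    return hits / total if total else 0.0


def origin_rps_delta(attack_rps: float, baseline_rps: float) -> float:
    """Relative change in origin RPS versus baseline (0.5 means +50%)."""
    return (attack_rps - baseline_rps) / baseline_rps


def false_positive_rate(blocked_legitimate: int, total_legitimate: int) -> float:
    """Share of known-legitimate requests that a rule blocked."""
    if not total_legitimate:
        return 0.0
    return blocked_legitimate / total_legitimate
```

For example, one blocked request out of 2,000 known-good checkout requests gives a false positive rate of 0.05%, inside the <0.1% target. Measuring it requires a source of known-legitimate traffic, such as synthetic checks or verified customer sessions.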