DDoS Mitigation

DDoS mitigation is the set of controls that keep your site available during volumetric (L3/4) and application-layer (L7) floods. It uses anycast scrubbing networks, CDN caching, WAF/rate limits, and origin hardening to absorb or deflect malicious traffic without blocking legitimate users.

Attack types & symptoms

  • Volumetric (L3/4): UDP floods, reflection/amplification (DNS/NTP/CLDAP/Memcached) → link saturation, high PPS, dropped packets.
  • Protocol exhaustion: SYN floods, TCP connection/state exhaustion, TLS handshake floods → full state tables and high CPU on load balancers and firewalls.
  • Application (L7): HTTP GET/POST floods, cache-busting query storms, search-endpoint spam → high origin RPS/CPU, timeouts, 5xx. Slowloris/slow POST hold connections open instead → exhausted worker/connection slots even at low RPS.
  • Multi-vector: Combines several of the above to evade any single control.
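Reflection/amplification floods saturate links because a small spoofed request triggers a much larger response aimed at the victim. The sketch below is a back-of-the-envelope calculation of that effect; the 60-byte request and 3,000-byte response are hypothetical figures, not measurements of any particular protocol, and the model ignores packet-header overhead and partial responses.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bytes reflected at the victim for every byte the attacker sends."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


def reflected_gbps(attacker_gbps: float, request_bytes: int, response_bytes: int) -> float:
    """Traffic arriving at the victim, assuming every spoofed request is answered in full."""
    return attacker_gbps * amplification_factor(request_bytes, response_bytes)


if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    print(amplification_factor(60, 3000))  # 50.0
    print(reflected_gbps(1.0, 60, 3000))   # 50.0
```

At these illustrative sizes, 1 Gbps of spoofed queries becomes 50 Gbps at the target, which is why volumetric floods have to be absorbed upstream rather than at the origin.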

Mitigation stack (layered)

  • Upstream/scrubbing: Anycast edge scrubs volumetric floods before traffic reaches your network (BGP diversion/GRE tunnels or CDN proxy).
  • CDN edge: Cache static + anonymous HTML aggressively; enable tiered caching and origin shield to minimize hits under load.
  • WAF/bot management: Managed rules + anomaly scoring; custom rules per path; bot fingerprinting and behavior analysis; geo/ASN granularity.
  • Rate limiting/connection control: Token buckets per IP/ASN/path; connection & request burst caps; challenge after threshold (JS/CAPTCHA/Turnstile).
  • Origin hardening: Keep-alives, short client header/body timeouts, worker limits, autoscaling where possible; isolate expensive endpoints; enable HTTP/2 and HTTP/3 where supported.
  • Access control: mTLS or IP allow-lists for admin, XML-RPC, and internal APIs; HMAC for webhooks.
  • Observability: Real-time dashboards (RPS/PPS, cache hit ratio, 4xx/5xx), structured logs with sampled request bodies (PII removed).
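The access-control bullet recommends HMAC for webhooks. Checking the signature before doing any real work lets the origin discard forged callbacks cheaply. A minimal sketch, assuming the sender signs the raw request body with HMAC-SHA256 and sends the hex digest in a header; header names, encodings, and whether a timestamp is included differ between providers, so check your provider's documentation.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (what the sender is assumed to compute)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True only if the supplied signature matches, using a constant-time comparison."""
    expected = sign(secret, body).encode("ascii")
    provided = signature_header.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, provided)


if __name__ == "__main__":
    secret = b"example-shared-secret"          # hypothetical; load from a secret store in practice
    body = b'{"event": "payment.succeeded"}'  # hypothetical payload
    header = sign(secret, body)
    print(verify_webhook(secret, body, header))         # True
    print(verify_webhook(secret, body + b" ", header))  # False: body was altered
```

Verify against the exact bytes received, before any JSON parsing or re-serialization, because reformatting the body changes the digest.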

Playbook during an attack

  1. Identify vector(s) quickly: look at RPS, cache hit ratio, status code mix, top paths, user agents, and ASNs.
  2. Enable the “under attack” profile: tighten rate limits; move WAF to block mode for high-confidence rules; turn on JS/CAPTCHA challenges for abusive paths.
  3. Raise cacheability: increase HTML TTL for anonymous pages; add stale-while-revalidate/stale-if-error; normalize query strings in cache key.
  4. Protect hotspots: throttle /wp-login.php, XML-RPC, search, and heavy filters; temporarily disable non-essential endpoints/features.
  5. Shield origin: route via origin shield/tiered cache; scale out application workers; clamp long-running requests.
  6. Communicate: Update your status page with impact and next update time; notify on-call/escalations.
  7. Post-incident: Keep emergency rules under close watch for 24–48h; write a brief RCA; codify permanent protections.
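Step 1 mostly comes down to counting which paths, user agents, ASNs, and status codes dominate the attack window. A minimal sketch, assuming log records have already been parsed into dictionaries; the field names are hypothetical and should be mapped to your CDN or WAF log schema.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple

# Hypothetical field names; map them to your CDN/WAF log schema.
TRIAGE_FIELDS = ("path", "user_agent", "asn", "status")


def top_talkers(
    records: Iterable[Mapping[str, str]],
    fields: Sequence[str] = TRIAGE_FIELDS,
    n: int = 5,
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most frequent values of each field over a window of log records."""
    counters: Dict[str, Counter] = {f: Counter() for f in fields}
    for record in records:
        for f in fields:
            value = record.get(f)
            if value is not None:  # tolerate records with missing fields
                counters[f][value] += 1
    return {f: c.most_common(n) for f, c in counters.items()}


if __name__ == "__main__":
    # Hypothetical sample window: three flood requests and one normal request.
    flood = {"path": "/search", "user_agent": "python-requests/2.31", "asn": "AS64500", "status": "503"}
    normal = {"path": "/shop/", "user_agent": "Mozilla/5.0", "asn": "AS64501", "status": "200"}
    for field, top in top_talkers([flood, flood, flood, normal], n=2).items():
        print(field, top)
```

Running the same counts over a pre-attack window helps separate the attack's fingerprint from paths that are always busy, such as the homepage.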

WordPress / WooCommerce specifics

  • Login & XML-RPC: Rate-limit and/or challenge wp-login.php; disable or strictly limit XML-RPC (especially system.multicall).
  • Checkout & payment callbacks: Never challenge payment service provider (PSP) return URLs or webhooks; allow-list provider IPs or verify HMAC signatures.
  • Cache-busting: Strip irrelevant query params from cache keys; bypass cache only when WooCommerce session/cart cookies are present.
  • Search & filters: Cap requests/sec per IP; apply pagination/size limits; consider short-TTL caching for popular queries.
  • Static fallback: Serve a lightweight maintenance/queue page at edge if origin CPU is saturated.
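Stripping irrelevant query parameters from the cache key and bypassing the cache only when session or cart cookies are present can be expressed as two small functions. A sketch under stated assumptions: CACHE_KEY_PARAMS is a hypothetical allow-list you would tailor to your theme and plugins, and the cookie prefixes are names WooCommerce and WordPress commonly set, which you should verify against your installation. In practice this logic usually lives in the CDN's cache-key and bypass rules rather than in application code.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical allow-list: the only query parameters that change page content on this site.
CACHE_KEY_PARAMS = {"page", "orderby", "product_cat"}

# Cookie-name prefixes for WooCommerce cart/session and logged-in WordPress users
# (verify against your installation; plugins may add others).
BYPASS_COOKIE_PREFIXES = (
    "woocommerce_items_in_cart",
    "woocommerce_cart_hash",
    "wp_woocommerce_session_",
    "wordpress_logged_in_",
)


def cache_key(url: str) -> str:
    """Keep only allow-listed query params, sorted, so equivalent URLs share one cache entry."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k in CACHE_KEY_PARAMS
    )
    query = urlencode(kept)
    base = parts.netloc + parts.path
    return f"{base}?{query}" if query else base


def should_bypass_cache(cookie_names: Iterable[str]) -> bool:
    """Bypass the cache only when a session, cart, or login cookie is present."""
    return any(name.startswith(BYPASS_COOKIE_PREFIXES) for name in cookie_names)


if __name__ == "__main__":
    print(cache_key("https://shop.example/shop/?utm_source=x&page=2&rnd=8731&orderby=price"))
    # shop.example/shop/?orderby=price&page=2
    print(cache_key("https://shop.example/?rnd=123"))                  # shop.example/
    print(should_bypass_cache(["_ga", "woocommerce_items_in_cart"]))   # True
    print(should_bypass_cache(["_ga"]))                                # False
```

An allow-list is used rather than a deny-list because cache-busting floods append random, ever-changing parameter names that a deny-list cannot anticipate.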

Configuration snippets (illustrative)

Rate-limit wp-login with Nginx

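# http context; behind a CDN, configure set_real_ip_from/real_ip_header first so the key is the client, not the edge
limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;

location = /wp-login.php {
  limit_req zone=login burst=10 nodelay;  # up to 10 excess requests served at once, further ones rejected
  limit_req_status 429;                    # default rejection status is 503
  include fastcgi_params;
  # ...fastcgi_pass etc.
}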
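The per-IP/ASN/path token buckets from the mitigation stack can be sketched in a few lines. This is an in-memory illustration, not nginx's implementation (limit_req uses a leaky-bucket algorithm), but with capacity burst + 1 and a refill rate equal to rate it approximates the admit/reject behavior of the wp-login snippet with nodelay: 11 back-to-back requests pass, then roughly one more every 12 seconds. The class name and key format are hypothetical.

```python
import time
from typing import Dict, Hashable, Optional, Tuple


class KeyedTokenBucket:
    """One token bucket per key (client IP, ASN, path, or a tuple of them); in-memory sketch."""

    def __init__(self, rate_per_sec: float, capacity: float) -> None:
        self.rate = rate_per_sec   # tokens added per second
        self.capacity = capacity   # largest burst a fresh or idle key can send
        self._state: Dict[Hashable, Tuple[float, float]] = {}  # key -> (tokens, last update time)

    def allow(self, key: Hashable, now: Optional[float] = None) -> bool:
        """Spend one token for this key if available; return False (reject) otherwise."""
        now = time.monotonic() if now is None else now
        tokens, last = self._state.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[key] = (tokens - 1.0, now)
            return True
        self._state[key] = (tokens, now)
        return False


if __name__ == "__main__":
    # Approximate limit_req rate=5r/m burst=10 nodelay: refill 5 tokens/minute, capacity burst + 1.
    login = KeyedTokenBucket(rate_per_sec=5 / 60, capacity=10 + 1)
    key = ("198.51.100.7", "/wp-login.php")  # hypothetical client IP and path
    burst = [login.allow(key, now=0.0) for _ in range(15)]
    print(burst.count(True))           # 11
    print(login.allow(key, now=13.0))  # True: about one token has refilled
```

A production limiter also needs eviction of idle keys and state shared across workers or nodes; nginx's fixed-size shared zone (10m here) covers both for a single server.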

Block XML-RPC except allow-list

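location = /xmlrpc.php {
  allow 203.0.113.0/24;  # your integrator
  deny all;
  include fastcgi_params;
  # ...fastcgi_pass etc. (without a PHP handler here, allowed clients would receive the raw .php file)
}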
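nginx's access module checks allow and deny rules in order and stops at the first match, which is why deny all must come last. The sketch below reproduces that evaluation order with Python's ipaddress module, for example to unit-test an allow-list before deploying it or to apply the same provider allow-list inside an application. The rule format is hypothetical, chosen for illustration.

```python
import ipaddress
from typing import Sequence, Tuple

# Each rule is ("allow" | "deny", CIDR string or "all"); hypothetical format mirroring nginx.
Rule = Tuple[str, str]


def is_allowed(client_ip: str, rules: Sequence[Rule]) -> bool:
    """First matching rule wins; if nothing matches, access is allowed (nginx's default)."""
    addr = ipaddress.ip_address(client_ip)
    for action, target in rules:
        # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
        if target == "all" or addr in ipaddress.ip_network(target, strict=False):
            return action == "allow"
    return True


if __name__ == "__main__":
    xmlrpc_rules = [("allow", "203.0.113.0/24"), ("deny", "all")]
    print(is_allowed("203.0.113.42", xmlrpc_rules))  # True: integrator range
    print(is_allowed("198.51.100.9", xmlrpc_rules))  # False: falls through to deny all
    print(is_allowed("2001:db8::1", xmlrpc_rules))   # False
```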

Best practices

  • Prepare preapproved rule sets and a one-click “under attack” mode.
  • Keep DNS TTLs short (including records pointing at your CDN) so routing changes propagate fast.
  • Separate admin on a distinct subdomain with stronger access controls.
  • Use per-path rate limits (login, search, API) rather than one global bucket.
  • Continuously test with synthetic load to confirm rules cause no false positives at peak traffic.

Common pitfalls

  • Challenging or blocking checkout/payment flows.
  • Over-broad country/IP blocks harming real users.
  • CDN caching disabled for HTML, pushing all load to origin.
  • No origin autoscaling or worker limits → CPU thrash and collapse.
  • Forgetting to revert emergency rules, causing silent conversion loss.

KPIs

  • Time to mitigate (detection → stable).
  • Origin RPS / CPU delta vs baseline during attack.
  • Cache hit ratio under attack (should increase).
  • Blocked vs allowed request ratio per rule (false positives <0.1% on critical flows).
  • Egress bandwidth from origin (should remain within capacity).
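Most of these KPIs are simple ratios over the incident window. A sketch assuming you can export cache hit/miss counts, a count of legitimate requests on critical flows (however you label them, e.g. by sampled review), and detection/stabilization timestamps; the dataclass and field names are hypothetical, and the 0.1% budget comes from the list above.

```python
from dataclasses import dataclass

FP_BUDGET = 0.001  # <0.1% false positives on critical flows


@dataclass
class IncidentWindow:
    cache_hits: int
    cache_misses: int
    critical_legit_requests: int  # legitimate checkout/payment requests (however you label them)
    critical_legit_blocked: int   # of those, how many were blocked or challenged
    detected_at_s: float          # detection timestamp, seconds
    stable_at_s: float            # timestamp when metrics returned to a stable state


def cache_hit_ratio(w: IncidentWindow) -> float:
    total = w.cache_hits + w.cache_misses
    return w.cache_hits / total if total else 0.0


def false_positive_rate(w: IncidentWindow) -> float:
    if w.critical_legit_requests == 0:
        return 0.0
    return w.critical_legit_blocked / w.critical_legit_requests


def kpi_report(w: IncidentWindow, baseline_hit_ratio: float) -> dict:
    ratio = cache_hit_ratio(w)
    fp = false_positive_rate(w)
    return {
        "time_to_mitigate_min": (w.stable_at_s - w.detected_at_s) / 60.0,
        "cache_hit_ratio": ratio,
        "cache_hit_ratio_increased": ratio > baseline_hit_ratio,  # should rise under attack
        "false_positive_rate": fp,
        "within_fp_budget": fp < FP_BUDGET,
    }


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    window = IncidentWindow(cache_hits=95_000, cache_misses=5_000,
                            critical_legit_requests=10_000, critical_legit_blocked=3,
                            detected_at_s=0.0, stable_at_s=900.0)
    print(kpi_report(window, baseline_hit_ratio=0.80))
```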