Uptime Monitoring

Uptime monitoring continuously checks whether your website and its critical services are reachable and responding within acceptable time limits. It verifies availability (HTTP 2xx/3xx vs. 4xx/5xx), performance (latency), and sometimes content (keyword checks) from multiple global locations to detect outages quickly and trigger alerts.

What it measures

Availability: Is the site responding with a healthy status code?
Latency: Time to connect, TLS handshake time, time to first byte (TTFB), and full response time.
Content validity: Presence/absence of keywords or regex in the response.
Dependencies: DNS resolution, TLS/SSL validity and expiry, CDN edge reachability, origin health, APIs, and database connectivity.
Transactional flows (synthetics): Can users sign in, add to cart, and check out? (Scripted journeys.)
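
As a rough illustration of the first three measurements, here is a minimal sketch of a single probe using only the Python standard library. The names evaluate and probe, the 2-second latency limit, and the 10-second timeout are assumptions made for this example, not settings from any particular monitoring product. It measures total response time only; a real probe would also break out connect, TLS, and TTFB timings.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    healthy: bool
    reason: str


def evaluate(status: int, body: str, elapsed_s: float,
             keyword: Optional[str] = None,
             max_latency_s: float = 2.0) -> CheckResult:
    """Classify one probe by status code, latency, and content (in that order)."""
    if not 200 <= status < 400:
        return CheckResult(False, f"unhealthy status {status}")
    if elapsed_s > max_latency_s:
        return CheckResult(False, f"slow response: {elapsed_s:.2f}s")
    if keyword is not None and keyword not in body:
        return CheckResult(False, f"keyword {keyword!r} missing")
    return CheckResult(True, "ok")


def probe(url: str, keyword: Optional[str] = None,
          timeout_s: float = 10.0) -> CheckResult:
    """Fetch a URL once and evaluate it; network errors count as failures."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            status = resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive here; the body is not inspected.
        return evaluate(exc.code, "", time.monotonic() - start, keyword)
    except OSError as exc:
        # Covers DNS failures, refused connections, TLS errors, and timeouts.
        return CheckResult(False, f"unreachable: {exc}")
    return evaluate(status, body, time.monotonic() - start, keyword)
```

Keeping evaluate separate from the network call makes the health rules easy to unit-test without touching a live site. Note that urlopen follows redirects by default, so a 3xx status is normally resolved before evaluate sees it.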

How it works (under the hood)

Probes run on a schedule (e.g., every 30–60s) from multiple regions.
Quorum logic (e.g., “2 of 3 locations failing”) reduces false positives.
Alert policies fire after N consecutive failures or threshold breaches.
Integrations push notifications to Slack/Teams/email/SMS/on-call.
Status aggregation drives dashboards and public status pages.
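
The quorum and consecutive-failure rules above can be sketched in a few lines. This is an illustrative model, not the logic of any specific tool: the function names, the "2 locations" quorum, and the "3 consecutive rounds" threshold are assumptions chosen to match the examples in the text.

```python
from typing import Dict, List


def quorum_failed(results: Dict[str, bool], min_failures: int = 2) -> bool:
    """True when at least `min_failures` locations report a failed probe.

    `results` maps a location name to True (probe passed) or False (failed).
    """
    failures = sum(1 for ok in results.values() if not ok)
    return failures >= min_failures


def should_alert(rounds: List[Dict[str, bool]],
                 consecutive: int = 3,
                 min_failures: int = 2) -> bool:
    """Alert only if each of the last `consecutive` rounds failed quorum."""
    if len(rounds) < consecutive:
        return False
    return all(quorum_failed(r, min_failures) for r in rounds[-consecutive:])


# Example: one region has a local network problem, then a real outage begins.
history = [
    {"us-east": True, "eu-west": False, "ap-south": True},   # single-region blip
    {"us-east": False, "eu-west": False, "ap-south": True},  # quorum failed
    {"us-east": False, "eu-west": False, "ap-south": False},
    {"us-east": False, "eu-west": False, "ap-south": False},
]
print(should_alert(history))
```

With a 30-second probe interval, three consecutive failed rounds means roughly 90 seconds of confirmed failure before anyone is notified, which is the trade-off between detection speed and noise.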

Best practices

Multi-region checks: Always test from at least 3 geographically distinct locations.
Layered health checks: Ping/TCP + HTTP(S) + keyword/content + synthetic flows for critical funnels.
Maintenance windows: Silence alerts during planned work and re-enable them automatically when the window ends.
Stable content checks: If you check for a keyword, choose text the page returns reliably on every request (avoid content that varies with A/B tests or visitor location).
TLS & DNS monitoring: Watch certificate expiry/chain issues and DNS changes/propagation.
Alert routing & escalation: Start with Slack/email; escalate to SMS/call only when quorum + duration thresholds are met.
SLOs & error budgets: Define targets (e.g., 99.9%) and alert on burn-rate rather than single events.
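
For the burn-rate idea, a minimal sketch follows. Burn rate here is the observed failure ratio divided by the error budget (1 minus the SLO target), so a value of 1.0 means the budget would be used up exactly at the end of the SLO period. The function names and the 14.4 threshold are assumptions for illustration; 14.4 corresponds to spending 2% of a 30-day budget in one hour (0.02 × 720 hours).

```python
def burn_rate(failed: int, total: int, slo: float) -> float:
    """Observed failure ratio divided by the allowed failure ratio."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    error_budget = 1.0 - slo
    return (failed / total) / error_budget


def should_page(long_window_rate: float, short_window_rate: float,
                threshold: float = 14.4) -> bool:
    """Page only if both a long and a short window exceed the threshold.

    The short window confirms the problem is still happening, so a burst
    that has already recovered does not page anyone.
    """
    return long_window_rate >= threshold and short_window_rate >= threshold


# Example: 20 failed probes out of 1000 in the last hour against a 99.9% SLO.
hourly = burn_rate(20, 1000, 0.999)
recent = burn_rate(3, 100, 0.999)
print(round(hourly, 1), round(recent, 1), should_page(hourly, recent))
```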

KPIs & helpful math (per 30-day month)

99% uptime ⇒ max downtime ≈ 7h 12m
99.9% ⇒ ≈ 43m 12s
99.99% ⇒ ≈ 4m 19s
99.999% ⇒ ≈ 26s
Track MTTD/MTTR (mean time to detect/recover), p95/p99 latency, and % of successful synthetic runs.
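
The downtime figures above come from multiplying the period length by the allowed failure fraction. A small sketch, assuming a hypothetical helper named downtime_budget and the same 30-day month:

```python
from datetime import timedelta


def downtime_budget(uptime_pct: float,
                    period: timedelta = timedelta(days=30)) -> timedelta:
    """Maximum downtime allowed in `period` for a given uptime percentage."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return period * (1 - uptime_pct / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    seconds = downtime_budget(target).total_seconds()
    print(f"{target}% -> {seconds:.1f} s of allowed downtime")
```

Passing a different period (for example timedelta(days=365)) gives the yearly budget for the same target.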

Implementation tips for website support teams

WordPress/WooCommerce:
- Monitor GET / and a lightweight health endpoint (e.g., a custom /health route that checks DB & object cache quickly).
- Add keyword checks for expected text (e.g., site name) and a cart/checkout synthetic for revenue-critical paths.
- Watch wp-cron with a heartbeat (record last run; alert if stale).
- Monitor disk usage and PHP error rate via logs/APM; spikes often precede downtime.
CDN/Edge + Origin: Probe both the CDN URL and origin to isolate cache vs. origin failures.
Change management: Tie deployments to a status page notice and temporary alert dampening.
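
Two of the tips above reduce to simple decision rules, sketched here. Both helpers are hypothetical: how the last wp-cron run time gets recorded (an option row, a file, a metric) is left to your setup, and the 15-minute staleness limit is an assumed default, not a WordPress setting.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_stale(last_run: datetime, now: datetime,
                    max_age: timedelta = timedelta(minutes=15)) -> bool:
    """True when the recorded last run is older than `max_age`."""
    return now - last_run > max_age


def locate_fault(cdn_ok: bool, origin_ok: bool) -> str:
    """Combine a CDN-URL probe and a direct origin probe into a diagnosis."""
    if cdn_ok and origin_ok:
        return "healthy"
    if origin_ok:
        # Origin answers directly, so the failure is in front of it.
        return "cdn_or_edge"
    if cdn_ok:
        # CDN still answers, probably from cache, while the origin is failing.
        return "origin_down_served_from_cache"
    return "origin_or_full_outage"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(heartbeat_stale(now - timedelta(minutes=30), now))
print(locate_fault(cdn_ok=True, origin_ok=False))
```

The "origin down, CDN still serving" case is the one most worth alerting on early, because visitors see a working site until cached pages expire.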

Common pitfalls

Relying on a single location (false positives from local network issues).
Treating “200 OK” as healthy when the page actually shows an error template; use keyword checks to catch this.
Alerting on every probe failure (flapping) instead of using quorums and consecutive-failure rules.
Forgetting renewals (TLS/SSL certificates, domains); set explicit expiry alerts.
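
To make the expiry-alert point concrete, here is a minimal sketch of a certificate expiry check using Python's standard ssl module. The function names and the 21-day warning threshold are assumptions for this example; domain registration expiry needs a separate data source (such as your registrar) and is not covered.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout_s: float = 10.0) -> str:
    """Return the certificate's notAfter string, e.g. 'Jan  1 00:00:00 2030 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (epoch seconds, default: current time) and expiry."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def expiry_alert(not_after: str, warn_days: float = 21,
                 now: Optional[float] = None) -> bool:
    """True when the certificate expires within `warn_days` (or already has)."""
    return days_until_expiry(not_after, now) <= warn_days
```

In use, expiry_alert(fetch_not_after("example.com")) would run once or twice a day rather than on every probe. Because the default context verifies the chain and hostname, fetch_not_after raises an error for an already-invalid certificate, which should itself be treated as an alert.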
