
ADR-0007: Cloudflare Tunnel (cloudflared) as sole ingress path for all MGH hostnames

  • Status: Accepted
  • Date: 2026-05-17
  • Deciders: cloud-architect, cloudflare-expert, secops-agent, finops-agent, marnissi.investments

Context

The MGH stack mandates zero public ingress on OCI (no public load balancer, no open inbound NSG rules, no A records pointing at OCI IP addresses). All internet → application traffic must be brokered. Six hostnames are live or planned across two access tiers (public and internal-via-CF-Access): marnissi-holdings.com, www.marnissi-holdings.com, docs.marnissi-holdings.com, admin.marnissi-holdings.com, api.marnissi-holdings.com, and labs.marnissi-holdings.com. The existing modules/cloudflare-tunnel/ and roles/cloudflared/ slots in infra/CLAUDE.md are reserved but not yet populated. This ADR authorises the topology and commits the module contract so infra-agent can implement it.

Decision

Run one cloudflare_zero_trust_tunnel_cloudflared resource per environment (mgh-dev, mgh-prod) with config_src = "local". Ingress rules are NOT declared as a Tofu resource. They live in the cloudflared agent's config.yml, rendered by Ansible (roles/cloudflared/templates/config.yml.j2) on the bootstrap OCI ARM A1 VPS. roles/cloudflared/ sources the tunnel secret from ansible/group_vars/all/vault.yml and deploys it to the VPS for the agent. The Tofu cloudflare-tunnel module outputs the tunnel UUID; the sibling cloudflare-dns module (same PR) consumes that UUID to build the CNAME target (<uuid>.cfargotunnel.com) for all proxied hostnames.

config_src = "local" is deliberate: it keeps a single source of truth for ingress (Ansible) and avoids split-brain between Tofu-managed cloudflare_zero_trust_tunnel_cloudflared_config and the local YAML.

Tunnel topology

| Resource | Dev | Prod |
|---|---|---|
| cloudflare_zero_trust_tunnel_cloudflared (tunnel name) | mgh-dev | mgh-prod |
| Tunnel secret | 32 random bytes, generated by CF API; sensitive = true in Tofu outputs | same |
| VPS process | cloudflared systemd service via roles/cloudflared/ | same |

Neither environment has a cloudflare_zero_trust_tunnel_cloudflared_config resource, because config_src = "local".

Ingress rules (identical across envs; target ports are localhost on the VPS)

| Hostname | Local target | Notes |
|---|---|---|
| marnissi-holdings.com | http://localhost:3000 | website/ Next.js container |
| docs.marnissi-holdings.com | http://localhost:8000 | docs/ MkDocs serve or container |
| admin.marnissi-holdings.com | http://localhost:3001 | admin-frontend/ React container, behind CF Access |
| api.marnissi-holdings.com | http://localhost:8001 | admin-backend/ FastAPI container, behind CF Access |
| labs.marnissi-holdings.com | http://localhost:9000 | labs/ experimental, behind CF Access |
| www.marnissi-holdings.com | none | Served by a Cloudflare Single Redirect rule (301 → apex), which requires a proxied DNS record for www; no tunnel ingress rule needed |
| catch-all (no hostname; must be the last rule) | http_status:404 | Rejects any unmatched Host header |
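cloudflared walks its ingress rules top to bottom and sends each request to the first rule whose hostname matches; it refuses to start unless the last rule has no hostname, which is why the catch-all row exists. The Python sketch below models that lookup for the table above. It is an illustration, not cloudflared's code: INGRESS, validate and resolve are hypothetical names, and only exact hostname matching is modelled (cloudflared also supports wildcard hostnames and path rules, which this ADR does not use).

```python
from typing import Optional

# Hypothetical in-memory copy of the ingress table, in evaluation order.
# Each rule is (hostname, service); hostname None marks the catch-all.
Rule = tuple[Optional[str], str]

INGRESS: list[Rule] = [
    ("marnissi-holdings.com", "http://localhost:3000"),
    ("docs.marnissi-holdings.com", "http://localhost:8000"),
    ("admin.marnissi-holdings.com", "http://localhost:3001"),
    ("api.marnissi-holdings.com", "http://localhost:8001"),
    ("labs.marnissi-holdings.com", "http://localhost:9000"),
    (None, "http_status:404"),
]


def validate(rules: list[Rule]) -> None:
    """Mirror cloudflared's requirement that the last rule matches everything."""
    if not rules or rules[-1][0] is not None:
        raise ValueError("last ingress rule must be a catch-all (no hostname)")


def resolve(host: str, rules: list[Rule] = INGRESS) -> str:
    """Return the service of the first rule whose hostname equals `host`."""
    validate(rules)
    host = host.lower()  # hostnames compare case-insensitively
    for hostname, service in rules:
        if hostname is None or hostname == host:
            return service  # first match wins; later rules are never consulted
    raise AssertionError("unreachable: the catch-all always matches")
```

For example, resolve("docs.marnissi-holdings.com") returns "http://localhost:8000", while any unlisted subdomain that reaches the tunnel falls through to "http_status:404".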

Output handoff to cloudflare-dns module

modules/cloudflare-tunnel/ exposes:

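```hcl
output "tunnel_id" {
  description = "Tunnel UUID, used by the cloudflare-dns module to build the CNAME target <uuid>.cfargotunnel.com."
  # "this" stands in for the per-environment resource name (hypothetical).
  value     = cloudflare_zero_trust_tunnel_cloudflared.this.id
  sensitive = false # the UUID is not a secret; the tunnel secret (and any token derived from it) is
}
```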

modules/cloudflare-dns/ receives var.tunnel_id and constructs CNAME records of the form <hostname> CNAME <tunnel_id>.cfargotunnel.com for every proxied hostname. Both modules ship in the same PR (PR5).
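The record shape that cloudflare-dns builds can be sketched in Python as a pure function. The real module does this in HCL; cname_records and TUNNEL_SUFFIX are hypothetical names, and the hostname list passed in is assumed to be whatever cloudflare-dns treats as proxied.

```python
import uuid

TUNNEL_SUFFIX = ".cfargotunnel.com"


def cname_records(tunnel_id: str, hostnames: list[str]) -> dict[str, str]:
    """Map each proxied hostname to the CNAME target <tunnel_id>.cfargotunnel.com."""
    # uuid.UUID raises ValueError on anything that is not a UUID, so wiring the
    # wrong module output fails here instead of producing a broken DNS record.
    canonical = str(uuid.UUID(tunnel_id))  # normalised to lowercase, hyphenated
    return {host: canonical + TUNNEL_SUFFIX for host in hostnames}
```

Validating the UUID before building records is a deliberate choice: tunnel_id is the only value crossing the module boundary, so a malformed value should stop the change rather than reach the zone.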

Ansible handoff (out of scope for the Tofu module)

After tofu apply, the operator copies the tunnel secret (accessed via tofu output -json | jq .tunnel_secret.value — executed once by the operator, never logged) into ansible/group_vars/all/vault.yml under cloudflared_tunnel_secret. roles/cloudflared/ reads that variable to write /etc/cloudflared/config.yml and enable the systemd service. No Ansible detail is specified here; roles/cloudflared/ is the authoritative source.
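The one-time extraction step can be guarded so a malformed or unmarked value is caught before it lands in vault.yml. The sketch below parses the JSON that tofu output -json prints (each output is an object with sensitive, type and value keys) and checks the secret without ever printing it. It assumes the secret is re-exported as a root-module output named tunnel_secret, as the jq path above implies, and that it is base64 encoding at least 32 bytes, per the topology table; extract_tunnel_secret is a hypothetical helper.

```python
import base64
import binascii
import json


def extract_tunnel_secret(output_json: str, name: str = "tunnel_secret") -> str:
    """Return one output from `tofu output -json` after sanity checks.

    Error messages mention only the output name, never the value.
    """
    outputs = json.loads(output_json)
    if name not in outputs:
        raise KeyError(f"output {name!r} not found")
    entry = outputs[name]
    if not entry.get("sensitive", False):
        raise ValueError(f"output {name!r} is not marked sensitive")
    secret = entry["value"]
    try:
        raw = base64.b64decode(secret, validate=True)
    except binascii.Error:
        raise ValueError(f"output {name!r} is not valid base64") from None
    if len(raw) < 32:
        raise ValueError(f"output {name!r} decodes to fewer than 32 bytes")
    return secret
```

An operator wrapper could feed it the tofu output -json text on stdin and pass the returned value straight into the vault edit, so the secret never appears in the terminal.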

Consequences

  • Cost (delta vs free tier): $0. Cloudflare Tunnel is free and unlimited for egress traffic from the VPS. No bandwidth or connection caps apply at current scale.
  • Operational surface:
      • Two tunnel resources to monitor (mgh-dev, mgh-prod). The CF dashboard shows per-tunnel health.
      • Tunnel secret rotation: recreate the tunnel resource (tofu apply -replace=<resource address>, or the older tofu taint + apply), update vault.yml, and re-run roles/cloudflared/. Recreation generates a new UUID, so the DNS CNAMEs must be updated in the same operation. Rotation cadence: annually or on suspected compromise.
      • Tunnel names are immutable after creation: renaming forces resource recreation, a new UUID, and a full DNS update. The names mgh-dev and mgh-prod are therefore final.
      • On cloudflared process crash or VPS failure: the tunnel loses its connections within ~30 s, and every tunnel-routed hostname returns a Cloudflare tunnel error (error 1033) until cloudflared reconnects. If cloudflared is running but a local container is down, only that container's hostname returns 502. A crashed process restarts automatically (systemd Restart=on-failure); after a VPS failure the tunnel reconnects only once the VPS is back.
      • Tunnel deletion is immediately disruptive: with config_src = "local" the ingress rules live in config.yml rather than in Cloudflare, but they stop serving traffic once the tunnel they reference is deleted, and DNS CNAMEs pointing at the deleted UUID fail immediately.
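  • Security posture:
      • The tunnel_secret output is marked sensitive = true in Tofu, so it is redacted in plan/apply terminal output and when all outputs are listed with a plain tofu output. It is not redacted when queried by name or with -json or -raw (the handoff command above relies on this), and it is stored in plaintext in the state file. Access to state is therefore equivalent to access to the secret.
      • No Bash(tofu output) recipe that would echo the tunnel secret value may be included in any runbook; secops-agent enforces this on every infra PR.
      • Zero OCI inbound rules or public IPs are required. The NSG for the ARM VPS remains egress-only: cloudflared connects to the Cloudflare edge on port 7844 (UDP for QUIC, TCP for the HTTP/2 fallback), plus optional HTTPS/443 outbound for Cloudflare API and software-update checks. Attack surface on OCI is unchanged.
      • CF Access policies on admin.*, api.*, and labs.* are enforced by the cloudflare-access module (separate ADR if not yet written). This ADR does not change those policies.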
  • Vendor dependency: Cloudflare Tunnel is a proprietary protocol. Outbound-only TLS to Cloudflare's QUIC/HTTP2 edge is the only path. If CF Tunnel is ever deprecated or unacceptable, see migration path below.
  • Migration path if we revisit: Replace cloudflared with Tailscale Funnel (zero-config alternative, same zero-public-ingress property) or provision an OCI public Load Balancer + public NSG rule + Let's Encrypt certificates, accepting the policy trade-off of exposing an OCI IP. Either swap is a modules/cloudflare-tunnel/ replacement plus an Ansible role change — no application code changes required.
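Rotation and deletion both leave a window in which a CNAME can still point at a UUID that no longer serves traffic. A small check over the zone's records can flag those hostnames after each operation. In this sketch, stale_tunnel_cnames is a hypothetical helper, and how the hostname → target map is fetched (Cloudflare API, Tofu state) is left out of scope.

```python
import uuid

TUNNEL_SUFFIX = ".cfargotunnel.com"


def stale_tunnel_cnames(records: dict[str, str], current_tunnel_id: str) -> list[str]:
    """Return hostnames whose CNAME targets a tunnel other than the current one.

    `records` maps hostname -> CNAME target as read from the zone.
    Targets outside cfargotunnel.com are ignored.
    """
    expected = str(uuid.UUID(current_tunnel_id)) + TUNNEL_SUFFIX
    stale = []
    for host, target in sorted(records.items()):
        target = target.lower().rstrip(".")  # tolerate case and a trailing root dot
        if target.endswith(TUNNEL_SUFFIX) and target != expected:
            stale.append(host)
    return stale
```

An empty result after tofu apply indicates the DNS update landed in the same operation as the tunnel recreation.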

Alternatives considered

| Option | Why rejected |
|---|---|
| OCI public Load Balancer with public NSG rule | Directly violates the zero-public-ingress constraint. Exposes an OCI IP to the internet, adds NSG complexity, requires TLS certificate management outside Cloudflare. |
| SSH reverse tunnel (autossh / ssh -R) | Not designed for production traffic brokering. No health checking, no multi-connection scaling, no CF edge caching or WAF integration. Not enterprise-grade. |
| ngrok | Custom domains require a paid plan. Secret management (ngrok auth token) is similar in complexity to CF Tunnel. No integration with CF Access/WAF. Adds a second edge vendor for no benefit. |
| Tailscale Funnel | Viable alternative and listed as the migration path. Rejected as primary because CF Tunnel integrates natively with CF Access, WAF, Email Routing, and the DNS zone already in Cloudflare: one vendor for the full edge stack reduces operational surface today. |
| One tunnel for all envs | Complicates secret rotation (both envs down simultaneously) and breaks env isolation. Two tunnels preserve independent lifecycle. |