ADR-0007: Cloudflare Tunnel (cloudflared) as sole ingress path for all MGH hostnames
- Status: Accepted
- Date: 2026-05-17
- Deciders: cloud-architect, cloudflare-expert, secops-agent, finops-agent, marnissi.investments
Context
The MGH stack mandates zero public ingress on OCI (no public load balancer, no open inbound NSG rules, no A records pointing at OCI IP addresses). All internet → application traffic must be brokered. Six hostnames are live or planned across two access tiers (public and internal-via-CF-Access): marnissi-holdings.com, www.marnissi-holdings.com, docs.marnissi-holdings.com, admin.marnissi-holdings.com, api.marnissi-holdings.com, and labs.marnissi-holdings.com. The existing modules/cloudflare-tunnel/ and roles/cloudflared/ slots in infra/CLAUDE.md are reserved but not yet populated. This ADR authorises the topology and commits the module contract so infra-agent can implement it.
Decision
Run one cloudflare_zero_trust_tunnel_cloudflared resource per environment (mgh-dev, mgh-prod) with config_src = "local". Ingress rules are NOT declared as a Tofu resource; they live in the cloudflared agent's config.yml, rendered by Ansible (roles/cloudflared/templates/config.yml.j2) on the bootstrap OCI ARM A1 VPS. Ansible supplies the tunnel secret to the agent from ansible/group_vars/all/vault.yml. The Tofu cloudflare-tunnel module outputs the tunnel UUID; the sibling cloudflare-dns module (same PR) consumes that UUID to build the CNAME target (<uuid>.cfargotunnel.com) for all proxied hostnames.
config_src = "local" is deliberate: it keeps a single source of truth for ingress (Ansible) and avoids split-brain between Tofu-managed cloudflare_zero_trust_tunnel_cloudflared_config and the local YAML.
Tunnel topology
| Resource | Dev name | Prod name |
|---|---|---|
| cloudflare_zero_trust_tunnel_cloudflared | mgh-dev | mgh-prod |
| (no cloudflare_zero_trust_tunnel_cloudflared_config resource; config_src = "local") | n/a | n/a |
| Tunnel secret | 32 random bytes, generated by CF API; sensitive = true in Tofu outputs | same |
| VPS process | cloudflared systemd service via roles/cloudflared/ | same |
Ingress rules (identical across envs; target ports are localhost on the VPS)
| Hostname | Local target | Notes |
|---|---|---|
| marnissi-holdings.com | http://localhost:3000 | website/ Next.js container |
| docs.marnissi-holdings.com | http://localhost:8000 | docs/ MkDocs serve or container |
| admin.marnissi-holdings.com | http://localhost:3001 | admin-frontend/ React container, behind CF Access |
| api.marnissi-holdings.com | http://localhost:8001 | admin-backend/ FastAPI container, behind CF Access |
| labs.marnissi-holdings.com | http://localhost:9000 | labs/ experimental, behind CF Access |
| www.marnissi-holdings.com | n/a | Handled by CF DNS Single Redirect (301 → apex); no tunnel ingress rule needed |
| catch-all | http_status:404 | Rejects any unmatched Host header |
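The table maps onto cloudflared's ordered ingress list, where rules are evaluated top to bottom and the final rule must be a hostname-less catch-all. The following is a minimal Python sketch of that matching order, assuming exact hostname matches only. The names INGRESS_RULES, validate_rules, and resolve_service are hypothetical and exist only for illustration; the authoritative rules live in roles/cloudflared/templates/config.yml.j2.

```python
from typing import Optional

# (hostname, service) pairs in evaluation order; None marks the catch-all.
# Values are copied from the ingress table; the structure itself is an
# illustrative assumption, not the format of config.yml.
INGRESS_RULES: list[tuple[Optional[str], str]] = [
    ("marnissi-holdings.com", "http://localhost:3000"),
    ("docs.marnissi-holdings.com", "http://localhost:8000"),
    ("admin.marnissi-holdings.com", "http://localhost:3001"),
    ("api.marnissi-holdings.com", "http://localhost:8001"),
    ("labs.marnissi-holdings.com", "http://localhost:9000"),
    (None, "http_status:404"),
]


def validate_rules(rules):
    """Check the invariant: exactly one catch-all, and it comes last."""
    if not rules or rules[-1][0] is not None:
        raise ValueError("last ingress rule must be a catch-all")
    if any(host is None for host, _ in rules[:-1]):
        raise ValueError("catch-all must not appear before the last rule")


def resolve_service(host_header, rules=INGRESS_RULES):
    """Return the service of the first rule matching the Host header."""
    validate_rules(rules)
    host = host_header.lower().split(":")[0]  # drop an optional port
    for rule_host, service in rules:
        if rule_host is None or rule_host == host:
            return service
    raise AssertionError("unreachable: the catch-all always matches")
```

With these rules, a request for www.marnissi-holdings.com that reached the tunnel would fall through to http_status:404, which is consistent with the redirect being handled at the Cloudflare edge rather than by an ingress rule.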
Output handoff to cloudflare-dns module
modules/cloudflare-tunnel/ exposes:

    output "tunnel_id" {
      description = "Tunnel UUID; used as CNAME target value (<uuid>.cfargotunnel.com) by cloudflare-dns module."
      # <env> is a placeholder for the tunnel resource's local name in this module.
      value       = cloudflare_zero_trust_tunnel_cloudflared.<env>.id
      sensitive   = false # UUID is not a secret; the secret is the token, not the ID
    }

modules/cloudflare-dns/ receives var.tunnel_id and constructs CNAME records of the form <hostname> CNAME <tunnel_id>.cfargotunnel.com for every proxied hostname. Both modules ship in the same PR (PR5).
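The CNAME construction described above can be sketched in Python as follows. This is an illustration of the string handoff only, not the Tofu module: the function name cname_records, the record dictionary keys, and the all-zero UUID in the usage example are assumptions made for this sketch.

```python
import re

TUNNEL_DOMAIN = "cfargotunnel.com"

# Tunnel IDs are UUIDs; reject anything else before building DNS records.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def cname_records(tunnel_id, hostnames):
    """Build one proxied CNAME record per hostname for the given tunnel."""
    if not UUID_RE.match(tunnel_id):
        raise ValueError(f"not a tunnel UUID: {tunnel_id!r}")
    target = f"{tunnel_id}.{TUNNEL_DOMAIN}"
    return [
        {"name": host, "type": "CNAME", "content": target, "proxied": True}
        for host in hostnames
    ]


# Usage with a placeholder UUID (hypothetical value, not a real tunnel).
records = cname_records(
    "00000000-0000-0000-0000-000000000000",
    ["marnissi-holdings.com", "docs.marnissi-holdings.com"],
)
```

Because every record derives from the same tunnel_id, recreating the tunnel changes every CNAME target at once, which is why the two modules are applied together.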
Ansible handoff (out of scope for the Tofu module)
After tofu apply, the operator copies the tunnel secret (accessed via tofu output -json | jq .tunnel_secret.value — executed once by the operator, never logged) into ansible/group_vars/all/vault.yml under cloudflared_tunnel_secret. roles/cloudflared/ reads that variable to write /etc/cloudflared/config.yml and enable the systemd service. No Ansible detail is specified here; roles/cloudflared/ is the authoritative source.
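For reference, tofu output -json emits one object per output with sensitive, type, and value keys. A minimal Python sketch of reading the secret from that payload without echoing it is shown below; the function name read_sensitive_output is hypothetical, and the sketch assumes the output is named tunnel_secret as in the command above.

```python
import json


def read_sensitive_output(outputs_json, name="tunnel_secret"):
    """Return the value of a sensitive output from `tofu output -json` text.

    Refuses outputs that are not flagged sensitive, so a misconfigured
    module is caught before the value is copied anywhere.
    """
    outputs = json.loads(outputs_json)
    if name not in outputs:
        raise KeyError(f"output {name!r} not found")
    entry = outputs[name]
    if not entry.get("sensitive", False):
        raise ValueError(f"output {name!r} is not marked sensitive")
    return entry["value"]
```

The returned value is meant to be written straight into the encrypted vault.yml; the sketch deliberately never prints or logs it.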
Consequences
- Cost (delta vs free tier): $0. Cloudflare Tunnel is free and unlimited for egress traffic from the VPS. No bandwidth or connection caps apply at current scale.
- Operational surface:
    - Two tunnel resources to monitor (mgh-dev, mgh-prod). CF dashboard shows per-tunnel health.
    - Tunnel secret rotation: rotate by deleting and recreating the cloudflare_zero_trust_tunnel_cloudflared resource (Tofu taint + apply), updating vault.yml, and re-running roles/cloudflared/. A new UUID is generated, so DNS CNAMEs must be updated in the same operation. Rotation cadence: annually or on suspected compromise.
    - Tunnel names are immutable post-create: renaming forces resource recreation, a new UUID, and a full DNS update. The names mgh-dev and mgh-prod are therefore final.
    - On VPS failure or cloudflared process crash: the tunnel disconnects within ~30 s. All hostnames return a Cloudflare 5xx error page (typically error 1033 while no connector is attached) until cloudflared reconnects. Recovery is automatic on process restart (systemd Restart=on-failure).
    - Tunnel deletion cascades: deleting a cloudflare_zero_trust_tunnel_cloudflared resource takes every ingress rule served through that tunnel offline. DNS CNAMEs pointing at the deleted UUID fail at the Cloudflare edge immediately.
- Security posture:
    - The tunnel_secret output is marked sensitive = true in Tofu, so it is redacted in plan/apply terminal output and in plain tofu output. It is still stored in state and is revealed by tofu output -json or tofu output -raw.
    - No Bash(tofu output) recipe that would echo the tunnel secret value should be included in any runbook; secops-agent enforces this on every infra PR.
    - Zero OCI inbound rules or public IPs are required. The NSG for the ARM VPS remains egress-only (outbound to the Cloudflare edge: port 7844 over TCP/UDP for the tunnel connections, plus HTTPS/443). Attack surface on OCI is unchanged.
    - CF Access policies on admin.*, api.*, and labs.* are enforced by the cloudflare-access module (separate ADR if not yet written). This ADR does not change those policies.
- Vendor dependency: Cloudflare Tunnel is a proprietary protocol. Outbound-only TLS to Cloudflare's QUIC/HTTP2 edge is the only path. If CF Tunnel is ever deprecated or unacceptable, see the migration path below.
- Migration path if we revisit: replace cloudflared with Tailscale Funnel (zero-config alternative, same zero-public-ingress property) or provision an OCI public Load Balancer + public NSG rule + Let's Encrypt certificates, accepting the policy trade-off of exposing an OCI IP. Either swap is a modules/cloudflare-tunnel/ replacement plus an Ansible role change; no application code changes are required.
Alternatives considered
| Option | Why rejected |
|---|---|
| OCI public Load Balancer with public NSG rule | Directly violates the zero-public-ingress constraint. Exposes an OCI IP to the internet, adds NSG complexity, requires TLS certificate management outside Cloudflare. |
| SSH reverse tunnel (autossh / ssh -R) | Not designed for production traffic brokering. No health checking, no multi-connection scaling, no CF edge caching or WAF integration. Not enterprise-grade. |
| ngrok | Custom domains require a paid plan. Secret management (ngrok auth token) is similar in complexity to CF Tunnel. No integration with CF Access/WAF. Adds a second edge vendor for no benefit. |
| Tailscale Funnel | Viable alternative and listed as the migration path. Rejected as primary because CF Tunnel integrates natively with CF Access, WAF, Email Routing, and the DNS zone already in Cloudflare — one vendor for the full edge stack reduces operational surface today. |
| One tunnel for all envs | Complicates secret rotation (both envs down simultaneously) and breaks env isolation. Two tunnels preserve independent lifecycle. |