ADR-0003: OCI VCN + subnets + NSGs codified in Tofu for the future-fleet network
- Status: Accepted
- Date: 2026-05-17
- Deciders: cloud-architect, oci-expert, secops-agent, marnissi.investments
Context
The MGH OCI tenancy in eu-milan-1 runs one ARM A1 VPS provisioned manually through the OCI console, with its own ad-hoc VCN, subnet, and security list. That bootstrap VPS is Ansible-owned and is not imported into Tofu (see infra/CLAUDE.md §Stack and ADR-0002). The bootstrap network is therefore uncodified and cannot be reproduced cleanly.
The next compute instance must be a tofu apply away. To enable that, the target network topology must exist in code before the instance does. The design constraint is firm: zero public ingress on OCI — all traffic enters through Cloudflare Tunnel, which egresses outward from the VPS to Cloudflare. No instance holds a public IP in v1.
This ADR records the target VCN topology (greenfield, Tofu-managed), the NSG rules that enforce the zero-ingress posture, and the migration path for the bootstrap VPS onto the new network. The migration is not executed in this ADR's PR; it is recorded here so it can be followed in a later operator session.
Decision
Provision one Tofu-managed VCN per environment (dev, prod) in eu-milan-1, each with two subnets, an Internet Gateway, a NAT Gateway, and two NSGs. The module lives at infra/modules/oci-network/, instantiated from infra/envs/<env>/.
VCN and subnet topology
| Resource | Dev | Prod | Notes |
|---|---|---|---|
| VCN CIDR | `10.10.0.0/16` | `10.20.0.0/16` | One VCN per env; no VCN peering between envs |
| Public subnet | `10.10.1.0/24` | `10.20.1.0/24` | No instance public IPs in v1; reserved for OCI Load Balancer or other future edge resources |
| Private subnet | `10.10.2.0/24` | `10.20.2.0/24` | All app VPS instances land here |
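
As a quick sanity check on the address plan in the table, the following sketch uses Python's standard `ipaddress` module to confirm that each subnet sits inside its VCN and that the dev and prod ranges do not overlap. The `PLAN` dictionary and the `check_plan` function are illustrative names, not part of the Tofu module; only the CIDRs come from this ADR.

```python
import ipaddress

# CIDRs copied from the topology table; the dict layout itself is an assumption.
PLAN = {
    "dev": {"vcn": "10.10.0.0/16", "public": "10.10.1.0/24", "private": "10.10.2.0/24"},
    "prod": {"vcn": "10.20.0.0/16", "public": "10.20.1.0/24", "private": "10.20.2.0/24"},
}


def check_plan(plan):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    vcns = {}
    for env, cidrs in plan.items():
        vcn = ipaddress.ip_network(cidrs["vcn"])
        vcns[env] = vcn
        public = ipaddress.ip_network(cidrs["public"])
        private = ipaddress.ip_network(cidrs["private"])
        # Every subnet must be carved out of its own VCN range.
        for name, subnet in (("public", public), ("private", private)):
            if not subnet.subnet_of(vcn):
                problems.append(f"{env}: {name} subnet {subnet} is outside VCN {vcn}")
        if public.overlaps(private):
            problems.append(f"{env}: public and private subnets overlap")
    # VCN ranges must not overlap across environments.
    envs = sorted(vcns)
    for i, first in enumerate(envs):
        for second in envs[i + 1:]:
            if vcns[first].overlaps(vcns[second]):
                problems.append(f"{first} and {second} VCN CIDRs overlap")
    return problems


print(check_plan(PLAN))
```

For the plan in the table the returned list is empty. Keeping the ranges disjoint is not strictly required while the VCNs stay unpeered, but it avoids renumbering if that decision is ever revisited.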
Route tables
| Route table | Attached to | Default route |
|---|---|---|
| `rt-public-<env>` | Public subnet | `0.0.0.0/0` → Internet Gateway |
| `rt-private-<env>` | Private subnet | `0.0.0.0/0` → NAT Gateway |
The public subnet routes to the IGW so that an OCI Load Balancer placed there later can acquire an ephemeral public IP. The private subnet routes to the NAT Gateway for outbound-only connectivity (container image pulls, OCI API calls, Cloudflare Tunnel keepalives).
NSG rules
Two NSGs per env. Every compute instance in the fleet receives both.
nsg-app-<env> — application workload traffic
| Direction | Protocol | Source / Destination | Port | Purpose |
|---|---|---|---|---|
| Egress | All | `0.0.0.0/0` | All | Outbound (Tunnel, container pulls, OCI APIs) |
| Ingress | — | — | — | None — no internet ingress; Tunnel egresses out |
nsg-ops-<env> — operator access
| Direction | Protocol | Source / Destination | Port | Purpose |
|---|---|---|---|---|
| Egress | All | `0.0.0.0/0` | All | Outbound (unrestricted for ops tooling) |
| Ingress | TCP | `var.operator_ip_cidr` (`/32`) | 22 | SSH from operator IP only |
var.operator_ip_cidr is the only variable that controls SSH access. It must be a /32. There is no 0.0.0.0/0 ingress rule anywhere in this topology.
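
The `/32` requirement can be checked before a value ever reaches `terraform.tfvars`. The sketch below is one possible pre-flight check using Python's standard `ipaddress` module; `validate_operator_cidr` is a hypothetical helper, not something the module ships, and `203.0.113.7` is a placeholder from the documentation address range.

```python
import ipaddress


def validate_operator_cidr(value):
    """Return the normalized CIDR if value is a single IPv4 host (/32), else raise ValueError."""
    # strict=True makes ip_network reject inputs with host bits set, e.g. "203.0.113.7/24".
    network = ipaddress.ip_network(value, strict=True)
    if network.version != 4 or network.prefixlen != 32:
        raise ValueError(f"operator_ip_cidr must be an IPv4 /32, got {value!r}")
    return str(network)


# Placeholder operator address; replace with the real operator IP.
print(validate_operator_cidr("203.0.113.7/32"))
```

Note that a bare address such as `203.0.113.7` is parsed as a `/32` by `ipaddress` and would pass this check; a stricter variant could require the explicit suffix.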
Gateways
| Gateway | Purpose |
|---|---|
| Internet Gateway (`igw-<env>`) | Attached to VCN; public subnet's default route points here |
| NAT Gateway (`nat-<env>`) | Private subnet's default route points here; instances in private subnet have outbound internet without a public IP |
Consequences
- Cost (delta vs free tier): $0. OCI VCN, subnets, Internet Gateway, NAT Gateway, route tables, and NSGs are all free-tier resources with no usage-based charges. `finops-agent` headroom delta: zero.
- Operational surface: Two VCNs, four subnets, two IGWs, two NAT GWs, four NSGs, four route tables, all Tofu-managed and auditable in `infra/modules/oci-network/`. The only mutable operator input is `operator_ip_cidr` per env (stored in `envs/<env>/terraform.tfvars`; never a secret, but treat as configuration not to be published).
- Security posture: Zero public ingress is preserved and enforced by NSG rule: there is no inbound `0.0.0.0/0` rule at any layer. SSH is gated to a single operator `/32` via `nsg-ops-<env>`; `var.operator_ip_cidr` is the sole blast radius for SSH exposure. `secops-agent` checklist: verify no NSG rule contains `source = 0.0.0.0/0` with ingress direction; verify `operator_ip_cidr` is set to a `/32` and not a wider CIDR; confirm no instance is placed in the public subnet with a public IP until an LB ADR explicitly approves it.
- Reversibility: Fully Tofu-managed. To replace this topology (e.g., switch to a hub-and-spoke VCN model or add a DRG for peering), `tofu destroy` the module and apply a replacement. No data lives in the network layer; the only unwind cost is instance downtime during NIC re-attachment.
- Migration path for bootstrap VPS: The bootstrap VPS currently sits on the manually-created network. When the operator is ready to move it:
  - Snapshot the boot volume in the OCI console.
  - Terminate the instance (boot volume preserved).
  - Run `tofu apply` for the target env to create the new VCN/subnet/NSGs if not already present.
  - Recreate the instance in `infra/modules/oci-compute-arm/` pointing at the private subnet, attaching `nsg-app-<env>` and `nsg-ops-<env>`, and referencing the preserved boot volume OCID.
  - Re-run `ansible-playbook playbooks/bootstrap.yml` to re-register `cloudflared` with the Tunnel (the Tunnel connector ID changes on new instance).
  - Verify Tunnel is active and all hostnames resolve through CF before decommissioning the old network resources.

This migration is not executed in this ADR's PR.
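
The first two items of the `secops-agent` checklist in the Security posture bullet lend themselves to automation. The following is a minimal sketch, assuming the NSG rules have already been exported into a list of plain dictionaries; the `RULES` shape, the `audit_ingress` name, and the placeholder operator address are all hypothetical and do not reflect any real provider or plan output format.

```python
import ipaddress

ANYWHERE = ipaddress.ip_network("0.0.0.0/0")

# Hypothetical flattened view of the NSG rules described in this ADR.
RULES = [
    {"nsg": "nsg-app", "direction": "EGRESS", "cidr": "0.0.0.0/0"},
    {"nsg": "nsg-ops", "direction": "EGRESS", "cidr": "0.0.0.0/0"},
    {"nsg": "nsg-ops", "direction": "INGRESS", "cidr": "203.0.113.7/32"},  # placeholder operator IP
]


def audit_ingress(rules):
    """Return findings for ingress rules that are open to the world or wider than a /32."""
    findings = []
    for rule in rules:
        if rule["direction"] != "INGRESS":
            continue  # egress to 0.0.0.0/0 is allowed by design
        # strict=False so a sloppy CIDR is reported as a finding rather than raising.
        source = ipaddress.ip_network(rule["cidr"], strict=False)
        if source == ANYWHERE:
            findings.append(f"{rule['nsg']}: ingress from 0.0.0.0/0")
        elif source.prefixlen != 32:
            findings.append(f"{rule['nsg']}: ingress source {source} is wider than /32")
    return findings


print(audit_ingress(RULES))
```

The third checklist item (no instance in the public subnet with a public IP) depends on compute state rather than NSG rules, so it is left out of this sketch.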
Alternatives considered
| Option | Why rejected |
|---|---|
| Single flat subnet | No separation between app plane and ops plane. An NSG misconfiguration on one instance type would affect all. Two subnets (public reserved, private active) provide the isolation needed to add an LB later without redesign. |
| Multi-AD multi-subnet layout | OCI eu-milan-1 has one availability domain (AD) on Always-Free. A multi-AD layout adds complexity with no resilience benefit at free tier; defer to a paid-tier migration ADR if that changes. |
| Import bootstrap VPS network into Tofu | Technically feasible (tofu import). Rejected because the manually-created security list and subnet are not shaped to the target topology. A clean greenfield module is simpler to reason about and avoids accumulated console drift. The bootstrap VPS migrates onto the new network in a later session. |
| VCN peering between dev and prod | Increases blast radius — a misconfigured route or NSG rule in dev could reach prod. The two envs have no legitimate need to communicate directly. Reject peering; keep VCNs isolated. |
| Public IPs on instances | Violates the zero-public-ingress constraint that is foundational to the MGH security posture (all traffic via CF Tunnel). Public IPs would bypass WAF, CF Access, and the zero-trust perimeter. Rejected unconditionally in v1. |