ADR-0004: OCI Container Registry repos, lifecycle policies, and runtime pull IAM
- Status: Accepted
- Date: 2026-05-17
- Deciders: cloud-architect, oci-expert, secops-agent, finops-agent, marnissi.investments
Context
App containers for admin-backend, admin-frontend, website, and docs must be stored in a registry. The MGH compute fleet runs Docker on the ARM A1 VPS (Ansible-orchestrated), and all instances pull images at deploy and restart time. The resource-principal dynamic group app-runtime-<env> already exists (ADR-0002 §Dynamic groups) and is the correct pull identity — no static credentials on the VPS.
OCI Container Registry (OCIR) is available Always-Free at 500 MB total storage per tenancy. That ceiling demands lifecycle discipline from day one: a typical FastAPI or Next.js image is 80–150 MB compressed, and four apps with ten retained versions each sit at roughly 40–60% of quota at steady state. Without retention policies, a busy CI pipeline fills the quota within weeks.
OCIR Auth Tokens — the credential Docker uses for docker login — cannot be provisioned by Tofu. The oci_identity_auth_token resource requires a user OCID and is tightly coupled to user lifecycle; Tofu cannot create it without also owning the user, and the terraform-deployer user is a Phase A bootstrap artifact not managed by Tofu (ADR-0002). Auth token generation is therefore a manual Phase A step, documented in infra/scripts/bootstrap-oci.md §Step 8 (added by infra-agent in the module PR).
Decision
Provision one OCIR repository per deployable image using the Tofu module modules/oci-ocir/, instantiated from infra/envs/<env>/. Each repo lives in the env's compartment (not root) to preserve the per-compartment blast-radius boundary established in ADR-0002.
Repository catalogue
| Repository name | Image | Compartment | Phase |
|---|---|---|---|
| mgh-admin-backend | admin-backend FastAPI application | <env> | Phase B (Tofu) |
| mgh-admin-frontend | admin-frontend React application | <env> | Phase B (Tofu) |
| mgh-website | website Next.js public site | <env> | Phase B (Tofu) |
| mgh-docs | docs MkDocs site | <env> | Phase B (Tofu) |

mgh-docs is reserved even if docs ultimately deploys to CF Pages (see ADR Alternatives). Reserving costs nothing; re-creating a name later is not guaranteed.
Fully-qualified image reference format:
Where <namespace> is the tenancy's Object Storage namespace (already captured in Phase A per infra/scripts/bootstrap-oci.md §Step 5 and recorded in envs/<env>/terraform.tfvars as oci_objectstorage_namespace).
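
As a minimal sketch, a reference of this kind can be assembled as below. It assumes the commonly documented OCIR form `<region-key>.ocir.io/<namespace>/<repo>:<tag>`; the function name, the region key `fra`, and the namespace `examplens` are placeholders for illustration, not values from this tenancy.

```python
import re

# Docker tag grammar: up to 128 chars, starting with a word character.
_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def ocir_image_ref(region_key: str, namespace: str, repo: str, tag: str) -> str:
    """Build <region-key>.ocir.io/<namespace>/<repo>:<tag> (assumed format)."""
    if not _TAG.fullmatch(tag):
        raise ValueError(f"invalid image tag: {tag!r}")
    return f"{region_key}.ocir.io/{namespace}/{repo}:{tag}"


# Placeholder values only; the real namespace comes from terraform.tfvars.
print(ocir_image_ref("fra", "examplens", "mgh-admin-backend", "1.2.3"))
# prints: fra.ocir.io/examplens/mgh-admin-backend:1.2.3
```

Rejecting malformed tags early keeps a bad CI variable from producing a reference that Docker would refuse at push time.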
Lifecycle policy per repository
Tofu provider limitation (oci-expert finding). The oracle/oci ~> 6.0 Terraform provider does not expose an OCIR image retention resource. The full artifacts_* resource list in the provider covers container_repository, container_image_signature, container_configuration, and the generic (non-container) repository/generic_artifact — there is no oci_artifacts_container_image_lifecycle_policy or equivalent. The OCIR retention feature exists only in the OCI Console and the OCIR REST API.
Lifecycle therefore lands in two tiers instead of a single Tofu policy:
| Tier | Owner | Scope | Mechanism |
|---|---|---|---|
| 1: console retention rule | Operator (manual, per env) | "Untagged images not versioned > 14 days → delete" with exempt list * to preserve all tagged images | OCI Console → Registry → Repository → Retention Policy. One rule per repo, four repos per env. Documented in infra/scripts/bootstrap-oci.md §Step 9. |
| 2: count-based prune (deferred) | GitHub Actions cron (future) | "Keep last 10 tagged versions by creation date" per repo | Nightly oci artifacts container image list + delete. Implementation deferred to a follow-up PR once CI is actively pushing; not load-bearing until the first repo approaches its share of the 500 MB ceiling. |

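
The Tier 2 selection rule ("keep last 10 tagged versions by creation date") could look roughly like the sketch below. It covers only the choice of which images to delete, not the OCI calls. The dict keys `version` and `time_created` are hypothetical field names chosen for illustration, and the sample data is invented.

```python
from datetime import datetime, timedelta, timezone


def select_for_deletion(images, keep=10):
    """Return the tagged images that fall outside the newest `keep` entries.

    `images` is a list of dicts with hypothetical keys:
      "version"      - tag string, or None/"" for untagged images
      "time_created" - timezone-aware datetime of the push
    Untagged images are ignored here; the Tier 1 rule handles them.
    """
    tagged = [img for img in images if img.get("version")]
    # Newest first, so everything past index `keep` is a prune candidate.
    tagged.sort(key=lambda img: img["time_created"], reverse=True)
    return tagged[keep:]


# Illustrative data: twelve tagged versions pushed one day apart.
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
images = [
    {"version": f"1.0.{n}", "time_created": start + timedelta(days=n)}
    for n in range(12)
]
images.append({"version": None, "time_created": start})  # untagged, skipped

doomed = select_for_deletion(images, keep=10)
print([img["version"] for img in doomed])  # prints: ['1.0.1', '1.0.0']
```

Keeping the selection as a pure function makes the cron job easy to dry-run: the job can log the candidates first and delete only once the list looks right.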
Why this split is acceptable. Tier 1 is a pull-time/version-time rule that the native OCIR retention engine supports. It addresses the dominant growth source (untagged dangling layers from rebuilt images) and runs continuously inside OCI. Tier 2 implements the true "last 10" count cap and only matters at sustained CI throughput; until that throughput exists, Tier 1 alone holds the footprint under the 500 MB ceiling because tagged images for four apps at ~50 MB each leave generous headroom for the first dozens of versions per repo.
At steady state with Tier 1 only and ~10–20 versions per repo (no count cap yet), the tenancy sits between 200 MB and 400 MB — within the 500 MB free-tier ceiling but trending toward the cliff. When the ledger shows >350 MB OCIR usage, ship Tier 2.
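
The trigger above reduces to simple arithmetic. The sketch below uses the 500 MB quota and the 350 MB threshold stated in this ADR; the function name and the returned keys are illustrative only.

```python
QUOTA_MB = 500          # Always-Free OCIR ceiling cited in this ADR
TIER2_TRIGGER_MB = 350  # usage above which Tier 2 should ship


def ocir_headroom(used_mb):
    """Summarise quota usage and whether the Tier 2 trigger has fired."""
    return {
        "used_pct": round(100 * used_mb / QUOTA_MB, 1),
        "headroom_mb": QUOTA_MB - used_mb,
        "ship_tier2": used_mb > TIER2_TRIGGER_MB,  # strictly greater than
    }


for usage in (200, 350, 400):
    print(usage, ocir_headroom(usage))
```

With these constants, 200 MB reports 40% used and no trigger, 350 MB sits exactly on the boundary without firing, and 400 MB fires with 100 MB of headroom left.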
Migration path if Tofu adds a retention resource. If a future oracle/oci provider release adds oci_artifacts_container_image_lifecycle_policy (or equivalent), the Tier 1 rule is reproducible in code with one for_each over the 4 repos. The console rule is reversible (delete it before applying the Tofu version) and the policy state is auditable in the console regardless.
IAM policy extension
ADR-0002 already grants app-runtime-<env> the statement:
This statement is sufficient for docker pull. No new IAM statement is needed for the runtime. infra-agent verifies this statement is present in modules/oci-iam/ before closing the module PR.
ci-deployers already holds:
This covers docker push from CI and operator workstations. No change.
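No IAM changes ship in this ADR's module PR. The modules/oci-ocir/ module creates repos only; retention is applied outside Tofu because the provider exposes no retention resource (see Lifecycle policy per repository).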
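
A check that the runtime identity holds no `use` or `manage` verb could be sketched as follows. The statement strings are placeholders written in general OCI policy syntax, not the actual statements from ADR-0002, and the parser handles only the simple `Allow dynamic-group ... to <verb> ... in ...` form (no `where` clauses).

```python
import re

# Matches the simple form: Allow dynamic-group <name> to <verb> <resource> in ...
_STATEMENT = re.compile(
    r"^allow\s+dynamic-group\s+(?P<group>\S+)\s+to\s+(?P<verb>\w+)\s+"
    r"(?P<resource>\S+)\s+in\s+",
    re.IGNORECASE,
)
FORBIDDEN_VERBS = {"use", "manage"}


def forbidden_statements(statements, dynamic_group):
    """Return statements granting `dynamic_group` a use or manage verb."""
    offenders = []
    for statement in statements:
        match = _STATEMENT.match(statement.strip())
        if match is None or match.group("group") != dynamic_group:
            continue  # other principals or unsupported syntax are skipped
        if match.group("verb").lower() in FORBIDDEN_VERBS:
            offenders.append(statement)
    return offenders


# Placeholder policy text for illustration only.
policy = [
    "Allow dynamic-group app-runtime-dev to read repos in compartment dev",
    "Allow dynamic-group app-runtime-dev to manage repos in compartment dev",
    "Allow group ci-deployers to manage repos in compartment dev",
]
print(forbidden_statements(policy, "app-runtime-dev"))
```

Only the second placeholder statement is reported. The `ci-deployers` line is ignored because it targets a user group rather than the runtime dynamic group.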
Auth token (manual)
OCIR requires Docker login using an OCI Auth Token as the password and <namespace>/terraform-deployer (or a dedicated CI user) as the username. Tofu cannot mint Auth Tokens. The manual generation procedure will be documented by infra-agent as infra/scripts/bootstrap-oci.md §Step 8. Rotate the token at least every 6 months; the rotation runbook entry belongs in infra/scripts/rotate-secrets.md (to be added by infra-agent).
Auth token values must never appear in .tfvars, README files, ADR bodies, PR descriptions, or any source file. They are environment variables or Ansible vault entries at runtime.
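
One way to keep the token out of files and process arguments is to read it from the environment and pipe it to `docker login --password-stdin`. The sketch below assumes an environment variable named `OCIR_AUTH_TOKEN` and the registry host form `<region-key>.ocir.io`; both names, and the helper functions, are illustrative.

```python
import os
import subprocess


def docker_login_command(region_key, namespace, username):
    """Build the docker login argv; the token is never part of it."""
    return [
        "docker", "login", f"{region_key}.ocir.io",
        "--username", f"{namespace}/{username}",
        "--password-stdin",
    ]


def docker_login(region_key, namespace, username, token_env="OCIR_AUTH_TOKEN"):
    """Log in to OCIR using a token taken from the environment."""
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"{token_env} is not set")
    # The token goes over stdin, so it does not show up in `ps` or shell history.
    subprocess.run(
        docker_login_command(region_key, namespace, username),
        input=token.encode(),
        check=True,
    )
```

A CI job would call `docker_login("fra", "examplens", "terraform-deployer")` with placeholder values replaced by the real region key and namespace, after the secret store has exported the variable.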
Consequences
- Cost (delta vs free tier): $0. Four empty repos at create time. At steady state with lifecycle policies enforced, estimated 200 MB used (40% of 500 MB Always-Free OCIR quota). The 500 MB cliff maps to approximately $0.026/GB/month beyond free. finops-agent headroom delta: four repos added; bytes at create time = 0 MB; ledger entry updated by finops-agent at PR time.
- Operational surface: Four OCIR repos to monitor (lifecycle violations surfaced in OCI Console → Registry → Repositories). One new Phase A step (bootstrap-oci.md §Step 8) for auth token generation. One new rotation runbook section in infra/scripts/rotate-secrets.md for the OCIR auth token (6-month cadence). The mgh-docs repo sits idle until either the docs container is built or the slot is confirmed redundant (CF Pages ADR).
- Security posture: Resource-principal pull means the VPS holds no static OCIR credentials: app-runtime-<env> authenticates via instance metadata, not a stored token. Push credentials (Auth Token) are used only from CI runners and operator workstations; they are scoped to manage repos in compartment <env> by the existing ci-deployers policy and never reach prod from a dev token. Per-compartment policy scoping from ADR-0002 remains intact: a compromised dev auth token cannot push to prod OCIR. secops-agent checklist: (a) <env>-app-runtime-policy extension must be additive only: confirm no manage or use verb added, only read; (b) no auth token value appears in any .tfvars, README, ADR, or PR body; (c) repo names (mgh-admin-backend, etc.) contain no tenant-identifying information (namespace is not embedded in the name).
- Reversibility: Repository creation is per-repo and reversible with tofu apply. Tier 1 lifecycle (console rule) is adjustable in the console without tofu involvement; it lives outside Tofu state by provider limitation, not by design choice. Tier 2 lifecycle, once implemented, will be a GitHub Actions cron, so config changes are PRs. Deleting a repo permanently deletes all contained images; treat repo names as durable once CI is writing to them. The modules/oci-ocir/ module is orthogonal: swapping it for an external registry is a module replacement, not a cross-module rewrite.
- Migration path if we revisit: If the 500 MB free-tier ceiling becomes a constraint before an OCI Infrequent Access storage class is available for OCIR, two options exist: (1) move heavy or infrequently-pulled images (e.g., mgh-docs) to GitHub Container Registry (ghcr.io), keeping only the hot-path images in OCIR; (2) reduce retention from 10 to 5 tagged versions per repo. Both adjustments are config changes, not re-platforms. If OCI ships an Infrequent Access tier for OCIR (not available as of 2026-05-17), evaluate enabling it for the mgh-docs repo first. OCIR auth token rotation is documented in infra/scripts/rotate-secrets.md once that section is written.
Alternatives considered
| Option | Why rejected |
|---|---|
| GitHub Container Registry (ghcr.io) | Ties the registry to GitHub availability and GitHub token scopes. Pulling from OCI instances to an external registry still works over the network (NAT Gateway egress via ADR-0003), but it adds an external dependency and requires a GITHUB_TOKEN (or PAT) as a static secret on the VPS, undoing the no-static-creds posture. Rejected as primary registry; retained as overflow option in the migration path. |
| Docker Hub | Rate-limited on the free tier (100 pulls per 6 hours for anonymous; 200/6h for free accounts). Unacceptable for a prod deployment that restarts containers on the same image. |
| One shared repo with tag prefixes (e.g., mgh-apps:backend-1.2.3) | OCIR repos are flat: there are no subpaths or sub-repos within a single repo. Tag prefixes work syntactically, but lifecycle policies operate at the repo level; you cannot express "keep 10 versions of backend-* and 10 versions of frontend-* separately" without per-repo policies. Per-app repos are the only structure that supports independent lifecycle control. |
| No lifecycle policy | The 500 MB Always-Free ceiling fills in weeks with four active CI pipelines pushing images on every merge. Without retention, the first over-quota push begins accruing ~$0.026/GB/month in storage charges. Rejected outright. |
| Shared base-image repo (mgh-base-images) | Premature. We do not yet have more than one image sharing a meaningful base layer worth caching separately. Revisit when we have three or more images with a common >20 MB layer worth deduplicating. |
| OCIR Infrequent Access storage tier from day one | Not available on OCI OCIR as of 2026-05-17. No action possible; noted as a future lever in the migration path. |
| All repos in root compartment | Loses the per-compartment blast-radius isolation established in ADR-0002. A dev CI push token would scope to root, capable of overwriting prod images. Rejected: per-compartment placement is non-negotiable. |