ADR-0004: OCI Container Registry repos, lifecycle policies, and runtime pull IAM

  • Status: Accepted
  • Date: 2026-05-17
  • Deciders: cloud-architect, oci-expert, secops-agent, finops-agent, marnissi.investments

Context

App containers for admin-backend, admin-frontend, website, and docs must be stored in a registry. The MGH compute fleet runs Docker on the ARM A1 VPS (Ansible-orchestrated), and all instances pull images at deploy and restart time. The resource-principal dynamic group app-runtime-<env> already exists (ADR-0002 §Dynamic groups) and is the correct pull identity — no static credentials on the VPS.

OCI Container Registry (OCIR) is available Always-Free at 500 MB total storage per tenancy. That ceiling demands lifecycle discipline from day one: a typical FastAPI or Next.js image is 80–150 MB compressed, and four apps with ten retained versions each are estimated to sit at roughly 40–60% of quota at steady state. That estimate only holds if successive versions share most of their layers; forty fully distinct images of that size would total 3.2–6 GB, several times the quota. Without retention policies, a busy CI pipeline fills the quota within weeks.

OCIR Auth Tokens — the credential Docker uses for docker login — cannot be provisioned by Tofu. The oci_identity_auth_token resource requires a user OCID and is tightly coupled to user lifecycle; Tofu cannot create it without also owning the user, and the terraform-deployer user is a Phase A bootstrap artifact not managed by Tofu (ADR-0002). Auth token generation is therefore a manual Phase A step, documented in infra/scripts/bootstrap-oci.md §Step 8 (added by infra-agent in the module PR).

Decision

Provision one OCIR repository per deployable image using the Tofu module modules/oci-ocir/, instantiated from infra/envs/<env>/. Each repo lives in the env's compartment (not root) to preserve the per-compartment blast-radius boundary established in ADR-0002.

Repository catalogue

| Repository name | Image | Compartment | Phase |
|---|---|---|---|
| mgh-admin-backend | admin-backend FastAPI application | <env> | Phase B (Tofu) |
| mgh-admin-frontend | admin-frontend React application | <env> | Phase B (Tofu) |
| mgh-website | website Next.js public site | <env> | Phase B (Tofu) |
| mgh-docs | docs MkDocs site | <env> | Phase B (Tofu) |

mgh-docs is reserved even if docs ultimately deploys to CF Pages (see ADR Alternatives). Reserving costs nothing; re-creating a name later is not guaranteed.

Fully-qualified image reference format:

eu-milan-1.ocir.io/<namespace>/<repo-name>:<tag>

Where <namespace> is the tenancy's Object Storage namespace (already captured in Phase A per infra/scripts/bootstrap-oci.md §Step 5 and recorded in envs/<env>/terraform.tfvars as oci_objectstorage_namespace).
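
As an illustration of the reference format above, a minimal Python sketch that assembles a fully-qualified image reference. The function name `image_ref` and the namespace value used in the example are hypothetical; only the registry host and the `<namespace>/<repo-name>:<tag>` layout come from this ADR.

```python
REGISTRY_HOST = "eu-milan-1.ocir.io"  # registry endpoint from the format above


def image_ref(namespace: str, repo_name: str, tag: str) -> str:
    """Build <registry>/<namespace>/<repo-name>:<tag> for an OCIR image."""
    for label, value in (("namespace", namespace),
                         ("repo_name", repo_name),
                         ("tag", tag)):
        # Reject empty parts and parts containing whitespace.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"{REGISTRY_HOST}/{namespace}/{repo_name}:{tag}"


if __name__ == "__main__":
    # "examplens" is a placeholder, not a real tenancy namespace.
    print(image_ref("examplens", "mgh-website", "1.2.3"))
```

In practice the namespace would be read from the `oci_objectstorage_namespace` value recorded in Phase A rather than typed by hand.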

Lifecycle policy per repository

Tofu provider limitation (oci-expert finding). The oracle/oci ~> 6.0 Terraform provider does not expose an OCIR image retention resource. The full artifacts_* resource list in the provider covers container_repository, container_image_signature, container_configuration, and the generic (non-container) repository/generic_artifact — there is no oci_artifacts_container_image_lifecycle_policy or equivalent. The OCIR retention feature exists only in the OCI Console and the OCIR REST API.

Lifecycle therefore lands in two tiers instead of a single Tofu policy:

| Tier | Owner | Scope | Mechanism |
|---|---|---|---|
| 1: console retention rule | Operator (manual, per env) | "Untagged images not versioned > 14 days → delete" with exempt list * to preserve all tagged images | OCI Console → Registry → Repository → Retention Policy. One rule per repo, four repos per env. Documented in infra/scripts/bootstrap-oci.md §Step 9. |
| 2: count-based prune (deferred) | GitHub Actions cron (future) | "Keep last 10 tagged versions by creation date" per repo | Nightly oci artifacts container image list + delete. Implementation deferred to a follow-up PR once CI is actively pushing; not load-bearing until the first repo approaches its share of the 500 MB ceiling. |
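
The Tier 2 selection logic ("keep last 10 tagged versions by creation date") could look roughly like the Python sketch below. It covers only the decision of which images to delete, not the OCI calls. The record shape (`id`, `version`, `created`) is an assumption made for illustration and would need to be mapped from whatever the image listing actually returns.

```python
from datetime import datetime, timedelta, timezone

KEEP_LAST = 10  # "Keep last 10 tagged versions by creation date"


def select_for_deletion(images: list, keep: int = KEEP_LAST) -> list:
    """Return ids of tagged images that fall outside the newest `keep`.

    Each record is assumed (hypothetically) to look like
    {"id": str, "version": str or None, "created": datetime}.
    Untagged images (version is None or empty) are ignored here and
    left to the Tier 1 console rule.
    """
    tagged = [img for img in images if img.get("version")]
    newest_first = sorted(tagged, key=lambda img: img["created"], reverse=True)
    return [img["id"] for img in newest_first[keep:]]


if __name__ == "__main__":
    base = datetime(2026, 1, 1, tzinfo=timezone.utc)
    demo = [
        {"id": f"img-{i}", "version": f"1.0.{i}", "created": base + timedelta(days=i)}
        for i in range(12)
    ]
    # Twelve tagged versions in, the two oldest are selected for deletion.
    print(select_for_deletion(demo))
```

Keeping the selection as a pure function makes it testable without touching the registry; the cron job would pass the resulting ids to the delete step.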

Why this split is acceptable. Tier 1 is a time-based rule (keyed on when an image was last pulled or versioned), which is the kind of rule the native OCIR retention engine supports. It addresses the dominant growth source (untagged dangling layers from rebuilt images) and runs continuously inside OCI. Tier 2 implements the true "last 10" count cap and only matters at sustained CI throughput; until that throughput exists, Tier 1 alone holds the footprint under the 500 MB ceiling because tagged images for four apps at ~50 MB each leave generous headroom for the first dozens of versions per repo.

At steady state with Tier 1 only and ~10–20 versions per repo (no count cap yet), the tenancy sits between 200 MB and 400 MB — within the 500 MB free-tier ceiling but trending toward the cliff. When the ledger shows >350 MB OCIR usage, ship Tier 2.
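
The trigger above can be expressed as a small check against the usage ledger. This is a sketch only: the per-repo figures in the example are invented for illustration, and the function and key names are hypothetical. The 500 MB quota and the 350 MB trigger are the values stated in this ADR.

```python
QUOTA_MB = 500          # Always-Free OCIR ceiling per tenancy
TIER2_TRIGGER_MB = 350  # ledger threshold for shipping Tier 2


def ocir_status(used_mb_by_repo: dict) -> dict:
    """Summarise OCIR usage against the quota and the Tier 2 trigger."""
    total = sum(used_mb_by_repo.values())
    return {
        "total_mb": total,
        "quota_pct": round(100 * total / QUOTA_MB, 1),
        "ship_tier2": total > TIER2_TRIGGER_MB,
        "over_quota": total > QUOTA_MB,
    }


if __name__ == "__main__":
    # Illustrative numbers, not measurements.
    print(ocir_status({
        "mgh-admin-backend": 120,
        "mgh-admin-frontend": 90,
        "mgh-website": 110,
        "mgh-docs": 40,
    }))
```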

Migration path if Tofu adds a retention resource. If a future oracle/oci provider release adds oci_artifacts_container_image_lifecycle_policy (or equivalent), the Tier 1 rule is reproducible in code with one for_each over the 4 repos. The console rule is reversible (delete it before applying the Tofu version) and the policy state is auditable in the console regardless.

IAM policy extension

ADR-0002 already grants app-runtime-<env> the statement:

Allow dynamic-group app-runtime-<env> to read repos in compartment <env>

This statement is sufficient for docker pull. No new IAM statement is needed for the runtime. infra-agent verifies this statement is present in modules/oci-iam/ before closing the module PR.

ci-deployers already holds:

Allow group ci-deployers to manage repos in compartment <env>

This covers docker push from CI and operator workstations. No change.

No IAM changes ship in this ADR's module PR. The modules/oci-ocir/ module creates the repositories only; retention is applied outside Tofu through the two tiers described under "Lifecycle policy per repository".
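
The verification that infra-agent performs could be approximated with a check like the following Python sketch. It assumes the policy statements are available as plain strings (for example, extracted from modules/oci-iam/); the function name and the way the statements are obtained are hypothetical.

```python
import re


def _norm(statement: str) -> str:
    """Lower-case a statement and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", statement.strip()).lower()


def check_runtime_policy(statements: list, env: str) -> list:
    """Return a list of problems; an empty list means the check passes."""
    subject = f"allow dynamic-group app-runtime-{env} to "
    required = _norm(
        f"Allow dynamic-group app-runtime-{env} to read repos in compartment {env}"
    )
    normalized = [_norm(s) for s in statements]

    problems = []
    if required not in normalized:
        problems.append("missing 'read repos' statement for the runtime group")
    for s in normalized:
        if not s.startswith(subject):
            continue
        rest = s[len(subject):]
        verb = rest.split(" ", 1)[0]
        # The runtime identity should only ever read repos.
        if verb in ("use", "manage") and "repos" in rest.split():
            problems.append(f"runtime group has '{verb}' on repos")
    return problems
```

For example, passing the two statements quoted above with `env="dev"` substituted returns an empty list, while a `manage repos` statement for the dynamic group is reported as a problem.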

Auth token (manual)

OCIR requires Docker login using an OCI Auth Token as the password and <namespace>/terraform-deployer (or a dedicated CI user) as the username. Tofu cannot mint Auth Tokens. The manual generation procedure will be documented by infra-agent as infra/scripts/bootstrap-oci.md §Step 8. The token must be rotated at least every 6 months; its rotation runbook entry belongs in infra/scripts/rotate-secrets.md (to be added by infra-agent).

Auth token values must never appear in .tfvars, README files, ADR bodies, PR descriptions, or any source file. They are environment variables or Ansible vault entries at runtime.
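
One way to keep the token out of files and command lines is to read it from the environment and pass it to docker login on standard input. The Python sketch below assumes a hypothetical environment variable name, `OCIR_AUTH_TOKEN`, and placeholder namespace and user values; it is not the project's deploy tooling.

```python
import os
import subprocess

REGISTRY_HOST = "eu-milan-1.ocir.io"


def login_command(namespace: str, user: str) -> list:
    """Build the docker login argv; the token is never part of it."""
    return [
        "docker", "login",
        "--username", f"{namespace}/{user}",
        "--password-stdin",
        REGISTRY_HOST,
    ]


def docker_login(namespace: str, user: str, token_env: str = "OCIR_AUTH_TOKEN") -> None:
    """Log in to OCIR using a token taken from the environment."""
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"{token_env} is not set")
    # The token goes to docker via stdin, so it does not appear in the
    # process list or in shell history.
    subprocess.run(login_command(namespace, user), input=token.encode(), check=True)
```

A CI job would export the variable from its secret store before calling `docker_login("examplens", "terraform-deployer")`, where `examplens` stands in for the real tenancy namespace.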

Consequences

  • Cost (delta vs free tier): $0. The four repos are empty at create time. At steady state with lifecycle policies enforced, estimated usage is 200 MB (40% of the 500 MB Always-Free OCIR quota). Storage beyond the free 500 MB is billed at approximately $0.026/GB/month. finops-agent headroom delta: four repos added; 0 MB at create time; ledger entry updated by finops-agent at PR time.
  • Operational surface: Four OCIR repos to monitor (lifecycle violations surfaced in OCI Console → Registry → Repositories). One new Phase A step (bootstrap-oci.md §Step 8) for auth token generation. One new rotation runbook section in infra/scripts/rotate-secrets.md for the OCIR auth token (6-month cadence). The mgh-docs repo sits idle until either the docs container is built or the slot is confirmed redundant (CF Pages ADR).
  • Security posture: Resource-principal pull means the VPS holds no static OCIR credentials: app-runtime-<env> authenticates via instance metadata, not a stored token. Push credentials (Auth Token) are used only from CI runners and operator workstations; the existing ci-deployers policy scopes them to manage repos in compartment <env>, so a dev token cannot reach prod. Per-compartment policy scoping from ADR-0002 remains intact: a compromised dev auth token cannot push to prod OCIR. secops-agent checklist: (a) any extension to <env>-app-runtime-policy must be additive and read-only; confirm no manage or use verb is added; (b) no auth token value appears in any .tfvars, README, ADR, or PR body; (c) repo names (mgh-admin-backend, etc.) contain no tenant-identifying information (the namespace is not embedded in the name).
  • Reversibility: Repositories are created one resource per repo, and each can be removed individually by deleting its resource from the configuration and running tofu apply. Tier 1 lifecycle (console rule) is adjustable in the console without Tofu involvement; it lives outside Tofu state by provider limitation, not by design choice. Tier 2 lifecycle, once implemented, will be a GitHub Actions cron, so config changes are PRs. Deleting a repo permanently deletes all contained images; treat repo names as durable once CI is writing to them. The modules/oci-ocir/ module is orthogonal: swapping it for an external registry is a module replacement, not a cross-module rewrite.
  • Migration path if we revisit: If the 500 MB free-tier ceiling becomes a constraint before an OCI Infrequent Access storage class is available for OCIR, two options exist: (1) move heavy or infrequently-pulled images (e.g., mgh-docs) to GitHub Container Registry (ghcr.io), keeping only the hot-path images in OCIR; (2) reduce retention from 10 to 5 tagged versions per repo. Both adjustments are config changes, not re-platforms. If OCI ships an Infrequent Access tier for OCIR (not available as of 2026-05-17), evaluate enabling it for the mgh-docs repo first. OCIR auth token rotation will be documented in infra/scripts/rotate-secrets.md once that section is written.
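
To put the cost figure from the Consequences list in proportion, here is a small Python sketch of the monthly overage at the approximate rate quoted in this ADR. It assumes decimal units (1 GB = 1000 MB) and that only storage above the free 500 MB is charged; both are simplifying assumptions, not billing rules confirmed here.

```python
FREE_QUOTA_MB = 500
OVERAGE_USD_PER_GB_MONTH = 0.026  # approximate rate quoted in this ADR


def monthly_overage_usd(used_mb: float, mb_per_gb: int = 1000) -> float:
    """Estimate the monthly charge for storage above the free quota."""
    over_mb = max(0.0, used_mb - FREE_QUOTA_MB)
    return over_mb / mb_per_gb * OVERAGE_USD_PER_GB_MONTH


if __name__ == "__main__":
    for used in (200, 500, 1500):
        print(used, "MB ->", monthly_overage_usd(used), "USD/month")
```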

Alternatives considered

| Option | Why rejected |
|---|---|
| GitHub Container Registry (ghcr.io) | Ties the registry to GitHub availability and GitHub token scopes. Pulling from an external registry still works from OCI instances (NAT Gateway egress via ADR-0003), but resource-principal authentication does not extend to it: it adds an external dependency and requires a GITHUB_TOKEN (or PAT) as a static secret on the VPS, undoing the no-static-creds posture. Rejected as primary registry; retained as overflow option in the migration path. |
| Docker Hub | Rate-limited on the free tier (100 pulls per 6 hours for anonymous; 200/6h for free accounts). Unacceptable for a prod deployment that pulls the same image again at every container restart. |
| One shared repo with tag prefixes (e.g., mgh-apps:backend-1.2.3) | OCIR repos are flat: there are no subpaths or sub-repos within a single repo. Tag prefixes work syntactically, but lifecycle policies operate at the repo level; you cannot express "keep 10 versions of backend-* and 10 versions of frontend-* separately" without per-repo policies. Per-app repos are the only structure that supports independent lifecycle control. |
| No lifecycle policy | The 500 MB Always-Free ceiling fills in weeks with four active CI pipelines pushing images on every merge. Without retention, the first over-quota push begins accruing ~$0.026/GB/month in storage charges. Rejected outright. |
| Shared base-image repo (mgh-base-images) | Premature. We do not yet have more than one image sharing a meaningful base layer worth caching separately. Revisit when we have three or more images with a common >20 MB layer worth deduplicating. |
| OCIR Infrequent Access storage tier from day one | Not available on OCI OCIR as of 2026-05-17. No action possible; noted as a future lever in the migration path. |
| All repos in root compartment | Loses the per-compartment blast-radius isolation established in ADR-0002. A dev CI push token would scope to root, capable of overwriting prod images. Rejected: per-compartment placement is non-negotiable. |