ADR-0002: OCI IAM topology and Object Storage layout

  • Status: Accepted
  • Date: 2026-05-17
  • Deciders: cloud-architect, oci-expert, secops-agent, marnissi.investments

Context

The MGH OCI tenancy is fresh. Plan Part 1.3 already mandates the bootstrap topology (a non-root IAM user terraform-deployer in group terraform-admins with a manage all-resources in tenancy policy in root compartment, two compartments dev and prod, one state bucket mgh-tofu-state). That is sufficient to bootstrap Tofu, but it is not the right long-term IAM model — manage all-resources in tenancy is broader than any single workload should have, and it ties human, machine, and instance identities to the same group.

We need:

  • A separation between human operators, CI machine identity, and in-cloud workload identity so blast radius and rotation cadence differ per role.
  • Per-compartment policy scoping (in compartment dev, in compartment prod) so a leaked dev credential can't reach prod.
  • Resource principals for compute that calls OCI APIs (Object Storage backups, OCIR pulls, Email Delivery sends) — no instance-side API keys.
  • A bucket layout that separates state, backups, app assets, and logs — each with its own lifecycle policy.
  • Free-tier-only footprint. All Always-Free.

The full IAM model also belongs in code (Tofu-managed) so it is auditable, diffable, and reproducible — except for the bootstrap subset, which must be created manually before any Tofu run.

Decision

Two-phase identity bootstrap.

Phase A (manual, operator runs once in console; documented in infra/scripts/bootstrap-oci.md):

  • IAM user terraform-deployer in group terraform-admins.
  • Policy in root compartment: Allow group terraform-admins to manage all-resources in tenancy.
  • Compartments dev and prod under root.
  • Object Storage buckets mgh-tofu-state and mgh-backups (versioning ON) in prod compartment.
  • Customer Secret Key on the IAM user (for s3-compat state backend).
  • API Key on the IAM user (for ~/.oci/config).
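
As a rough sketch of how the Phase A Customer Secret Key might feed the state backend, the block below points the OpenTofu s3 backend at OCI's S3-compatible endpoint. The state key path, <namespace>, and <region> are placeholders that this ADR does not specify, and the skip_* flags assume a recent OpenTofu release (1.6 or later).

```hcl
# Sketch only: the key path, <namespace>, and <region> are assumptions.
terraform {
  backend "s3" {
    bucket = "mgh-tofu-state"
    key    = "envs/dev/terraform.tfstate" # hypothetical path, one per env
    region = "<region>"

    # OCI's S3-compatible API endpoint for the tenancy's Object Storage namespace.
    endpoints = {
      s3 = "https://<namespace>.compat.objectstorage.<region>.oraclecloud.com"
    }

    # OCI is not AWS, so the AWS-specific validation steps are skipped.
    skip_region_validation      = true
    skip_credentials_validation = true
    skip_requesting_account_id  = true
    skip_metadata_api_check     = true
    skip_s3_checksum            = true
    use_path_style              = true
  }
}
```

The Customer Secret Key pair would typically be supplied through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables rather than written into the file.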

Phase B (Tofu, in modules/oci-iam + modules/oci-objectstorage, called from envs/<env>/):

Groups (Tofu-managed)

| Group | Members | Purpose |
|---|---|---|
| ci-deployers | CI machine user (per-env) | Manage compute + OCIR + Object Storage in the env's compartment only |
| developers | Human engineers | Read all in dev, read-only in prod (observability, no manage) |
| auditors | Security reviewers, secops-agent operator | Read-only across both compartments, including audit logs |

Group terraform-admins from Phase A stays as-is; Phase B does not touch it.
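
A minimal sketch of how modules/oci-iam might declare these groups, assuming the standard OCI provider resource oci_identity_group. The variable name tenancy_ocid and the description strings are illustrative and not taken from this ADR.

```hcl
variable "tenancy_ocid" {
  type        = string
  description = "OCID of the tenancy; IAM groups are created at tenancy level."
}

locals {
  # Group names come from the table above; descriptions are paraphrased.
  groups = {
    "ci-deployers" = "CI machine users, scoped per compartment by policy"
    "developers"   = "Human engineers with read access"
    "auditors"     = "Security reviewers with inspect and audit-event read access"
  }
}

resource "oci_identity_group" "this" {
  for_each       = local.groups
  compartment_id = var.tenancy_ocid
  name           = each.key
  description    = each.value
}
```

terraform-admins is deliberately absent from the map, which keeps the Phase A group outside Tofu state.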

Dynamic groups (Tofu-managed, resource principals)

| Dynamic group | Matching rule | Purpose |
|---|---|---|
| app-runtime-<env> | ALL { instance.compartment.id = '<env compartment OCID>' } | Instances in compartment use resource principal to call OCI APIs; no instance-side API keys |
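
The dynamic group could be expressed roughly as follows, assuming the OCI provider resource oci_identity_dynamic_group. The variable names are hypothetical. Note that OCI's own documentation refers to this mechanism, when applied to compute instances, as instance principals.

```hcl
variable "tenancy_ocid" {
  type = string
}

variable "env" {
  type        = string
  description = "Environment name: dev or prod."
}

variable "compartment_ocid" {
  type        = string
  description = "OCID of the compartment for this environment."
}

resource "oci_identity_dynamic_group" "app_runtime" {
  # Dynamic groups are defined at tenancy level, like ordinary groups.
  compartment_id = var.tenancy_ocid
  name           = "app-runtime-${var.env}"
  description    = "Compute instances in the ${var.env} compartment"

  # Matches every instance whose compartment is the env compartment.
  matching_rule = "ALL {instance.compartment.id = '${var.compartment_ocid}'}"
}
```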

Policies (Tofu-managed, per compartment)

Each compartment gets its own policy bundle, attached in the compartment (not in root) so a tenancy-wide policy doesn't get widened by mistake:

  • <env>-ci-deployer-policy:
      • Allow group ci-deployers to manage instance-family in compartment <env>
      • Allow group ci-deployers to manage object-family in compartment <env>
      • Allow group ci-deployers to manage repos in compartment <env> (OCIR)
      • Allow group ci-deployers to use email-family in compartment <env> (Email Delivery senders)
  • <env>-app-runtime-policy:
      • Allow dynamic-group app-runtime-<env> to read repos in compartment <env> (OCIR pull)
      • Allow dynamic-group app-runtime-<env> to manage objects in compartment <env> where any {target.bucket.name='mgh-backups', target.bucket.name='mgh-assets-<env>', target.bucket.name='mgh-logs-<env>'}
      • Allow dynamic-group app-runtime-<env> to use email-family in compartment <env>
  • <env>-developer-readonly-policy:
      • Allow group developers to read all-resources in compartment <env>
      • In prod: developers get read only, never manage or use.
  • <env>-auditor-policy:
      • Allow group auditors to inspect all-resources in compartment <env>
      • Allow group auditors to read audit-events in compartment <env>

Verbs follow inspect < read < use < manage; lowest verb that works wins.
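
One way a policy bundle could look in Tofu, shown for the CI deployer bundle only. This is a sketch that assumes the OCI provider resource oci_identity_policy and that each compartment is named after its environment (dev, prod); the variable names are hypothetical.

```hcl
variable "env" {
  type        = string
  description = "Environment name, assumed equal to the compartment name."
}

variable "compartment_ocid" {
  type        = string
  description = "OCID of the env compartment; the policy is attached here, not in root."
}

resource "oci_identity_policy" "ci_deployer" {
  compartment_id = var.compartment_ocid
  name           = "${var.env}-ci-deployer-policy"
  description    = "CI deploy rights scoped to the ${var.env} compartment"

  # Statements copied from the bundle above, with <env> interpolated.
  statements = [
    "Allow group ci-deployers to manage instance-family in compartment ${var.env}",
    "Allow group ci-deployers to manage object-family in compartment ${var.env}",
    "Allow group ci-deployers to manage repos in compartment ${var.env}",
    "Allow group ci-deployers to use email-family in compartment ${var.env}",
  ]
}
```

The other three bundles would follow the same shape, each as its own resource, so that a change to one bundle shows up as an isolated diff.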

Object Storage layout

| Bucket | Compartment | Tier | Versioning | Lifecycle | Phase |
|---|---|---|---|---|---|
| mgh-tofu-state | prod | Standard | ON | keep versions 30d | Phase A (manual) |
| mgh-backups | prod | Standard | ON | delete >90d | Phase A (manual) |
| mgh-assets-dev | dev | Standard | ON | delete non-current >30d | Phase B (Tofu) |
| mgh-assets-prod | prod | Standard | ON | delete non-current >90d | Phase B (Tofu) |
| mgh-logs-dev | dev | Infrequent Access | OFF | delete >30d | Phase B (Tofu) |
| mgh-logs-prod | prod | Infrequent Access | OFF | delete >90d | Phase B (Tofu) |

Phase A buckets are NOT imported into Tofu. Importing mgh-tofu-state would create a chicken-and-egg dependency on the state backend (a state bucket that holds its own definition is fragile), and mgh-backups is, by policy, out of scope for cross-env management. Those two buckets keep manually managed lifecycle rules; Tofu owns the assets and logs buckets.
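
For the Tofu-owned buckets, a sketch of the assets bucket and its lifecycle rule might look like the following. It assumes the OCI provider resources oci_objectstorage_bucket and oci_objectstorage_object_lifecycle_policy; the variable names and the rule name are hypothetical, and the retention default mirrors the dev row of the table.

```hcl
variable "env" {
  type = string
}

variable "compartment_ocid" {
  type = string
}

variable "noncurrent_retention_days" {
  type        = number
  default     = 30 # dev value from the table; prod would pass 90
  description = "Days to keep non-current object versions."
}

# Looks up the tenancy's Object Storage namespace.
data "oci_objectstorage_namespace" "this" {
  compartment_id = var.compartment_ocid
}

resource "oci_objectstorage_bucket" "assets" {
  compartment_id = var.compartment_ocid
  namespace      = data.oci_objectstorage_namespace.this.namespace
  name           = "mgh-assets-${var.env}"
  access_type    = "NoPublicAccess" # closed by default
  storage_tier   = "Standard"
  versioning     = "Enabled"
}

resource "oci_objectstorage_object_lifecycle_policy" "assets" {
  namespace = data.oci_objectstorage_namespace.this.namespace
  bucket    = oci_objectstorage_bucket.assets.name

  rules {
    name        = "delete-noncurrent-versions"
    action      = "DELETE"
    target      = "previous-object-versions"
    time_amount = var.noncurrent_retention_days
    time_unit   = "DAYS"
    is_enabled  = true
  }
}
```

Lifecycle rules are executed by the Object Storage service itself, so the service also needs an IAM policy that lets it manage objects in the compartment; that statement is not shown here.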

Consequences

  • Cost (delta vs free tier): $0. All buckets fit inside 20 GB Standard + 10 GB Infrequent free quotas; expected occupancy <1 GB for months. Versioning is free per-class; lifecycle deletes keep growth bounded. Policies and dynamic groups are free.
  • Operational surface: Three new groups + one dynamic group per env + four policies per env + four Tofu-managed buckets to track (six buckets in total, counting the two Phase A buckets). A CI user must be created per env and added to ci-deployers (one-time per env). Auth Tokens for OCIR + SMTP Credentials for Email Delivery are still user-generated artifacts (Tofu cannot mint them); they are generated operator-side and captured in the rotation runbook.
  • Security posture: Strictly improved over Phase A alone. terraform-deployer keeps manage all for bootstrap recovery, but real workloads run through scoped per-compartment policies. Instance access to APIs uses resource principals — no static keys on the VPS. Developer access to prod is read-only.
  • Reversibility: All Phase B is in Tofu. To revisit, tofu state rm + tofu apply of a new policy set. Bucket renames require data copy; treat names as durable.
  • Migration path if we revisit: Splitting ci-deployers further (e.g., per-service: ci-deployers-backend, ci-deployers-frontend) is a policy bundle replacement, not a re-architecture. Same for adding read-secrets groups when OCI Vault is introduced.

Alternatives considered

| Option | Why rejected |
|---|---|
| One group for everything, with manage all-resources in tenancy | Phase A topology. Too broad for steady-state; a leaked dev credential reaches prod. |
| Per-user policies (no groups) | OCI policies bind groups, not users. Per-user means N copies of the same policy → drift. |
| Per-service groups from day one (ci-deployers-backend, ci-deployers-frontend, …) | Premature. One CI identity per env is enough until we have repo-specific differences worth scoping. |
| API keys on the VPS instead of resource principals | Static keys leak, rotate poorly. Resource principals are scoped by dynamic group + auto-rotated. |
| All buckets in one compartment | Loses the dev/prod blast-radius split. Trivial to keep separate. |
| OCI Vault for secret storage from day one | Defer. Vault has minor ongoing cost (~$0.06/key/mo) and the secret surface today (CF token, OCIR auth, SMTP cred, vault password) is small enough to manage in Ansible vault + env vars. Revisit if/when we have >10 secrets or multi-operator rotation. |
| Public read on mgh-assets-* | Tempting for CDN serving, but CF Tunnel + CF caching means we never need public origin. Closed by default. |