ADR-0002: OCI IAM topology and Object Storage layout

Status: Accepted
Date: 2026-05-17
Deciders: cloud-architect, oci-expert, secops-agent, marnissi.investments

Context

The MGH OCI tenancy is fresh. Plan Part 1.3 already mandates the bootstrap topology: a non-root IAM user terraform-deployer in group terraform-admins, a manage all-resources in tenancy policy in the root compartment, two compartments (dev and prod), and one state bucket (mgh-tofu-state). That is sufficient to bootstrap Tofu, but it is not the right long-term IAM model: manage all-resources in tenancy is broader than any single workload should have, and it ties human, machine, and instance identities to the same group.

We need:

A separation between human operators, CI machine identity, and in-cloud workload identity so blast radius and rotation cadence differ per role.
Per-compartment policy scoping (in compartment dev, in compartment prod) so a leaked dev credential can't reach prod.
Resource principals for compute that calls OCI APIs (Object Storage backups, OCIR pulls, Email Delivery sends) — no instance-side API keys.
A bucket layout that separates state, backups, app assets, and logs — each with its own lifecycle policy.
A free-tier-only footprint: every resource must be Always Free.

The full IAM model also belongs in code (Tofu-managed) so it is auditable, diffable, and reproducible — except for the bootstrap subset, which must be created manually before any Tofu run.

Decision

Two-phase identity bootstrap.

Phase A (manual, operator runs once in console; documented in infra/scripts/bootstrap-oci.md):

IAM user terraform-deployer in group terraform-admins.
Policy in root compartment: Allow group terraform-admins to manage all-resources in tenancy.
Compartments dev and prod under root.
Object Storage buckets mgh-tofu-state and mgh-backups (versioning ON) in prod compartment.
Customer Secret Key on the IAM user (for s3-compat state backend).
API Key on the IAM user (for ~/.oci/config).
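
The Customer Secret Key exists so that Tofu can reach mgh-tofu-state through Object Storage's S3-compatible endpoint. A minimal backend sketch, assuming a recent OpenTofu release; the state key path, `<namespace>`, and `<region>` are placeholders that this ADR does not specify, and the key pair is assumed to be supplied through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:

```hcl
terraform {
  backend "s3" {
    bucket = "mgh-tofu-state"
    key    = "envs/dev/terraform.tfstate" # hypothetical key path, one per env
    region = "<region>"                   # placeholder: the tenancy's OCI region identifier

    # OCI's S3-compatible endpoint; <namespace> is the Object Storage namespace.
    endpoints = {
      s3 = "https://<namespace>.compat.objectstorage.<region>.oraclecloud.com"
    }

    # The backend talks to OCI, not AWS, so the AWS-specific checks are skipped.
    skip_region_validation      = true
    skip_credentials_validation = true
    skip_requesting_account_id  = true
    skip_metadata_api_check     = true
    skip_s3_checksum            = true
    use_path_style              = true
  }
}
```

Backend blocks cannot interpolate variables, which is one reason the placeholders have to be filled in by hand or passed with -backend-config at init time.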

Phase B (Tofu, in modules/oci-iam + modules/oci-objectstorage, called from envs/<env>/):

Groups (Tofu-managed)

Group	Members	Purpose
`ci-deployers`	CI machine user (per-env)	Manage compute + OCIR + Object Storage in the env's compartment only
`developers`	Human engineers	Read all in dev, read-only in prod (observability, no manage)
`auditors`	Security reviewers, `secops-agent` operator	Read-only across both compartments, including audit logs

Group terraform-admins from Phase A stays as-is; Phase B does not touch it.
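
As a rough illustration of how modules/oci-iam might declare the three groups, here is a sketch using the `oci_identity_group` resource from the oracle/oci provider. The variable name `tenancy_ocid`, the local `groups`, and the description strings are assumptions made for this example, not names fixed by this ADR:

```hcl
terraform {
  required_providers {
    oci = { source = "oracle/oci" }
  }
}

variable "tenancy_ocid" {
  type        = string
  description = "Tenancy OCID; IAM groups are created at the tenancy level."
}

locals {
  # Group name => description, paraphrased from the table above.
  groups = {
    "ci-deployers" = "CI machine users; manage compute, OCIR and Object Storage in their env's compartment"
    "developers"   = "Human engineers; read access only"
    "auditors"     = "Security reviewers; read-only, including audit logs"
  }
}

# terraform-admins is deliberately absent: it belongs to Phase A.
resource "oci_identity_group" "this" {
  for_each       = local.groups
  compartment_id = var.tenancy_ocid
  name           = each.key
  description    = each.value
}
```

Because groups are tenancy-wide rather than per-compartment, a layout like this would need to be applied from exactly one place to avoid two envs both trying to create the same group.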

Dynamic groups (Tofu-managed, resource principals)

Dynamic group	Matching rule	Purpose
`app-runtime-<env>`	`ALL { instance.compartment.id = '<env compartment OCID>' }`	Instances in compartment use resource principal to call OCI APIs; no instance-side API keys
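
The matching rule in the table could be expressed with the provider's `oci_identity_dynamic_group` resource roughly as follows. The variable names (`tenancy_ocid`, `env`, `compartment_ocid`) are hypothetical inputs chosen for this sketch:

```hcl
terraform {
  required_providers {
    oci = { source = "oracle/oci" }
  }
}

variable "tenancy_ocid" {
  type = string
}

variable "env" {
  type        = string
  description = "Environment name, dev or prod."
}

variable "compartment_ocid" {
  type        = string
  description = "OCID of the env's compartment."
}

# Dynamic groups are tenancy-level objects; the rule narrows membership
# to instances that live in the env's compartment.
resource "oci_identity_dynamic_group" "app_runtime" {
  compartment_id = var.tenancy_ocid
  name           = "app-runtime-${var.env}"
  description    = "Instances in the ${var.env} compartment"
  matching_rule  = "ALL { instance.compartment.id = '${var.compartment_ocid}' }"
}
```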

Policies (Tofu-managed, per compartment)

Each compartment gets its own policy bundle, attached in the compartment (not in root) so a tenancy-wide policy doesn't get widened by mistake:

<env>-ci-deployer-policy:
Allow group ci-deployers to manage instance-family in compartment <env>
Allow group ci-deployers to manage object-family in compartment <env>
Allow group ci-deployers to manage repos in compartment <env> (OCIR)
Allow group ci-deployers to use email-family in compartment <env> (Email Delivery senders)
<env>-app-runtime-policy:
Allow dynamic-group app-runtime-<env> to read repos in compartment <env> (OCIR pull)
Allow dynamic-group app-runtime-<env> to manage objects in compartment <env> where any {target.bucket.name = 'mgh-backups', target.bucket.name = 'mgh-assets-<env>', target.bucket.name = 'mgh-logs-<env>'}
Allow dynamic-group app-runtime-<env> to use email-family in compartment <env>
<env>-developer-readonly-policy:
Allow group developers to read all-resources in compartment <env>
In prod: developers get read only, never manage or use.
<env>-auditor-policy:
Allow group auditors to inspect all-resources in compartment <env>
Allow group auditors to read audit-events in compartment <env>

Verbs follow inspect < read < use < manage; lowest verb that works wins.
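
To make the per-compartment attachment concrete, the sketch below renders two of the bundles with the provider's `oci_identity_policy` resource. It assumes hypothetical inputs `env` and `compartment_ocid`, and that the compartment's name equals the env name; the statements themselves are copied from the list above:

```hcl
terraform {
  required_providers {
    oci = { source = "oracle/oci" }
  }
}

variable "env" {
  type        = string
  description = "Environment name, assumed to match the compartment name (dev or prod)."
}

variable "compartment_ocid" {
  type        = string
  description = "OCID of the env's compartment; the policy is attached here, not in root."
}

resource "oci_identity_policy" "ci_deployer" {
  compartment_id = var.compartment_ocid
  name           = "${var.env}-ci-deployer-policy"
  description    = "CI deploy rights scoped to the ${var.env} compartment"
  statements = [
    "Allow group ci-deployers to manage instance-family in compartment ${var.env}",
    "Allow group ci-deployers to manage object-family in compartment ${var.env}",
    "Allow group ci-deployers to manage repos in compartment ${var.env}",
    "Allow group ci-deployers to use email-family in compartment ${var.env}",
  ]
}

resource "oci_identity_policy" "developer_readonly" {
  compartment_id = var.compartment_ocid
  name           = "${var.env}-developer-readonly-policy"
  description    = "Read-only developer access to the ${var.env} compartment"
  statements = [
    "Allow group developers to read all-resources in compartment ${var.env}",
  ]
}
```

Since every statement interpolates a single env, a credential that only appears in the dev bundle has no statement that names prod.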

Object Storage layout

Bucket	Compartment	Tier	Versioning	Lifecycle	Phase
`mgh-tofu-state`	`prod`	Standard	ON	keep versions 30d	Phase A (manual)
`mgh-backups`	`prod`	Standard	ON	delete >90d	Phase A (manual)
`mgh-assets-dev`	`dev`	Standard	ON	delete non-current >30d	Phase B (Tofu)
`mgh-assets-prod`	`prod`	Standard	ON	delete non-current >90d	Phase B (Tofu)
`mgh-logs-dev`	`dev`	Infrequent Access	OFF	delete >30d	Phase B (Tofu)
`mgh-logs-prod`	`prod`	Infrequent Access	OFF	delete >90d	Phase B (Tofu)

Phase A buckets are NOT imported into Tofu. Importing mgh-tofu-state would create a chicken-and-egg problem with the state backend (a state bucket holding its own definition is fragile), and mgh-backups is, by policy, out of scope for cross-env management. Those two buckets keep manually managed lifecycle rules; Tofu owns the assets and logs buckets.
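
For the Tofu-owned buckets, one possible shape for modules/oci-objectstorage is sketched here for the assets bucket only, using `oci_objectstorage_bucket` and `oci_objectstorage_object_lifecycle_policy`. The variable names and the rule name are assumptions made for this example:

```hcl
terraform {
  required_providers {
    oci = { source = "oracle/oci" }
  }
}

variable "env" {
  type = string
}

variable "compartment_ocid" {
  type        = string
  description = "OCID of the env's compartment."
}

variable "noncurrent_retention_days" {
  type        = number
  description = "30 for dev, 90 for prod, per the table above."
}

# The Object Storage namespace is tenancy-wide and looked up rather than hard-coded.
data "oci_objectstorage_namespace" "this" {}

resource "oci_objectstorage_bucket" "assets" {
  compartment_id = var.compartment_ocid
  namespace      = data.oci_objectstorage_namespace.this.namespace
  name           = "mgh-assets-${var.env}"
  access_type    = "NoPublicAccess" # closed by default
  storage_tier   = "Standard"
  versioning     = "Enabled"
}

resource "oci_objectstorage_object_lifecycle_policy" "assets" {
  namespace = data.oci_objectstorage_namespace.this.namespace
  bucket    = oci_objectstorage_bucket.assets.name

  rules {
    name        = "delete-noncurrent-versions" # hypothetical rule name
    action      = "DELETE"
    target      = "previous-object-versions" # only non-current versions are removed
    time_amount = var.noncurrent_retention_days
    time_unit   = "DAYS"
    is_enabled  = true
  }
}
```

The logs buckets are left out of this sketch because the ADR does not say how the Infrequent Access tier is applied to them. Lifecycle rules also typically require a policy that authorizes the regional Object Storage service to act on the bucket, which is outside what this ADR lists.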

Consequences

Cost (delta vs free tier): $0. All buckets fit inside the 20 GB Standard + 10 GB Infrequent Access free quotas; expected occupancy is under 1 GB for months. Versioning itself adds no charge (retained versions count toward the quota of their storage class), and lifecycle deletes keep growth bounded. Policies and dynamic groups are free.
Operational surface: Three new groups, one dynamic group per env, four policies per env, and four Tofu-managed buckets to track (plus the two manual Phase A buckets). A CI user must be created per env and added to ci-deployers (one-time per env). Auth Tokens for OCIR and SMTP Credentials for Email Delivery are still user-generated artifacts (Tofu cannot mint them); they are handled operator-side and captured in the rotation runbook.
Security posture: Strictly improved over Phase A alone. terraform-deployer keeps manage all for bootstrap recovery, but real workloads run through scoped per-compartment policies. Instance access to APIs uses resource principals — no static keys on the VPS. Developer access to prod is read-only.
Reversibility: All of Phase B is in Tofu. To revisit, run tofu state rm on the affected resources, then tofu apply a new policy set. Bucket renames require a data copy, so treat names as durable.
Migration path if we revisit: Splitting ci-deployers further (e.g., per-service: ci-deployers-backend, ci-deployers-frontend) is a policy bundle replacement, not a re-architecture. Same for adding read-secrets groups when OCI Vault is introduced.

Alternatives considered

Option	Why rejected
One group `everything` with `manage all in tenancy`	Phase A topology. Too broad for steady-state; a leaked dev credential reaches prod.
Per-user policies (no groups)	OCI policies bind groups, not users. Per-user means N copies of the same policy → drift.
Per-service groups from day one (`ci-deployers-backend`, `ci-deployers-frontend`, …)	Premature. One CI identity per env is enough until we have repo-specific differences worth scoping.
API keys on the VPS instead of resource principals	Static keys leak, rotate poorly. Resource principals are scoped by dynamic group + auto-rotated.
All buckets in one compartment	Loses the dev/prod blast-radius split. Trivial to keep separate.
OCI Vault for secret storage from day one	Defer. Vault has minor ongoing cost (~$0.06/key/mo) and the secret surface today (CF token, OCIR auth, SMTP cred, vault password) is small enough to manage in Ansible vault + env vars. Revisit if/when we have >10 secrets or multi-operator rotation.
Public read on `mgh-assets-*`	Tempting for CDN serving, but CF Tunnel + CF caching means we never need public origin. Closed by default.