ADR-0002: OCI IAM topology and Object Storage layout
- Status: Accepted
- Date: 2026-05-17
- Deciders: cloud-architect, oci-expert, secops-agent, marnissi.investments
Context
The MGH OCI tenancy is fresh. Plan Part 1.3 already mandates the bootstrap topology: a non-root IAM user `terraform-deployer` in group `terraform-admins` with a `manage all-resources in tenancy` policy in the root compartment, two compartments `dev` and `prod`, and one state bucket `mgh-tofu-state`. That is sufficient to bootstrap Tofu, but it is not the right long-term IAM model: `manage all-resources in tenancy` is broader than any single workload should have, and it ties human, machine, and instance identities to the same group.
We need:
- A separation between human operators, CI machine identity, and in-cloud workload identity so blast radius and rotation cadence differ per role.
- Per-compartment policy scoping (`in compartment dev`, `in compartment prod`) so a leaked dev credential can't reach prod.
- Resource principals for compute that calls OCI APIs (Object Storage backups, OCIR pulls, Email Delivery sends), with no instance-side API keys.
- A bucket layout that separates state, backups, app assets, and logs — each with its own lifecycle policy.
- Free-tier-only footprint: every resource must fit within OCI Always Free.
The full IAM model also belongs in code (Tofu-managed) so it is auditable, diffable, and reproducible — except for the bootstrap subset, which must be created manually before any Tofu run.
Decision
Two-phase identity bootstrap.
Phase A (manual, operator runs once in console; documented in infra/scripts/bootstrap-oci.md):
- IAM user `terraform-deployer` in group `terraform-admins`.
- Policy in root compartment: `Allow group terraform-admins to manage all-resources in tenancy`.
- Compartments `dev` and `prod` under root.
- Object Storage buckets `mgh-tofu-state` and `mgh-backups` (versioning ON) in `prod` compartment.
- Customer Secret Key on the IAM user (for s3-compat state backend).
- API Key on the IAM user (for `~/.oci/config`).
Phase B (Tofu, in modules/oci-iam + modules/oci-objectstorage, called from envs/<env>/):
Groups (Tofu-managed)
| Group | Members | Purpose |
|---|---|---|
| `ci-deployers` | CI machine user (per-env) | Manage compute + OCIR + Object Storage in the env's compartment only |
| `developers` | Human engineers | Read all in dev, read-only in prod (observability, no manage) |
| `auditors` | Security reviewers, `secops-agent` operator | Read-only across both compartments, including audit logs |
Group terraform-admins from Phase A stays as-is; Phase B does not touch it.
Dynamic groups (Tofu-managed, resource principals)
| Dynamic group | Matching rule | Purpose |
|---|---|---|
| `app-runtime-<env>` | `ALL { instance.compartment.id = '<env compartment OCID>' }` | Instances in compartment use resource principal to call OCI APIs; no instance-side API keys |
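As a minimal sketch of how the matching rule above is assembled per environment, the snippet below builds the rule string from a compartment OCID. The function names and the example OCID are hypothetical, and the prefix check is only a sanity guard, not a full OCID validation.

```python
def matching_rule(compartment_ocid: str) -> str:
    """Return the dynamic-group matching rule for one env compartment."""
    # Sanity guard only: compartment OCIDs begin with this prefix.
    if not compartment_ocid.startswith("ocid1.compartment."):
        raise ValueError(f"not a compartment OCID: {compartment_ocid!r}")
    # Doubled braces emit literal { } inside an f-string.
    return f"ALL {{ instance.compartment.id = '{compartment_ocid}' }}"


def dynamic_group(env: str, compartment_ocid: str) -> dict:
    """Describe the app-runtime-<env> dynamic group as plain data."""
    return {
        "name": f"app-runtime-{env}",
        "matching_rule": matching_rule(compartment_ocid),
    }


if __name__ == "__main__":
    # Placeholder OCID for illustration only.
    print(dynamic_group("dev", "ocid1.compartment.oc1..exampledev"))
```

Keeping the rule in one helper means the dev and prod groups cannot drift apart in syntax; only the OCID differs.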
Policies (Tofu-managed, per compartment)
Each compartment gets its own policy bundle, attached in the compartment (not in root) so a tenancy-wide policy doesn't get widened by mistake:
- `<env>-ci-deployer-policy`:
  - `Allow group ci-deployers to manage instance-family in compartment <env>`
  - `Allow group ci-deployers to manage object-family in compartment <env>`
  - `Allow group ci-deployers to manage repos in compartment <env>` (OCIR)
  - `Allow group ci-deployers to use email-family in compartment <env>` (Email Delivery senders)
- `<env>-app-runtime-policy`:
  - `Allow dynamic-group app-runtime-<env> to read repos in compartment <env>` (OCIR pull)
  - `Allow dynamic-group app-runtime-<env> to manage objects in compartment <env> where target.bucket.name in {'mgh-backups', 'mgh-assets-<env>', 'mgh-logs-<env>'}`
  - `Allow dynamic-group app-runtime-<env> to use email-family in compartment <env>`
- `<env>-developer-readonly-policy`:
  - `Allow group developers to read all-resources in compartment <env>`
  - In `prod`: developers get `read` only, never `manage` or `use`.
- `<env>-auditor-policy`:
  - `Allow group auditors to inspect all-resources in compartment <env>`
  - `Allow group auditors to read audit-events in compartment <env>`
Verbs follow inspect < read < use < manage; lowest verb that works wins.
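The per-compartment bundles and the verb ordering can be sketched as plain data plus two small checks. This is an illustration under assumptions, not the Tofu module itself: the function names are hypothetical, and only the group-based bundles are rendered (the dynamic-group bundle with its bucket-name condition is left out to keep the sketch short).

```python
import re

# Verb ordering from this ADR, weakest to strongest.
VERBS = ("inspect", "read", "use", "manage")


def lowest_working_verb(working_verbs):
    """Pick the weakest verb among those that satisfy a need."""
    return min(working_verbs, key=VERBS.index)


def group_policy_bundles(env: str) -> dict:
    """Render the group-based policy bundles for one compartment."""
    scope = f"in compartment {env}"
    return {
        f"{env}-ci-deployer-policy": [
            f"Allow group ci-deployers to manage instance-family {scope}",
            f"Allow group ci-deployers to manage object-family {scope}",
            f"Allow group ci-deployers to manage repos {scope}",
            f"Allow group ci-deployers to use email-family {scope}",
        ],
        f"{env}-developer-readonly-policy": [
            f"Allow group developers to read all-resources {scope}",
        ],
        f"{env}-auditor-policy": [
            f"Allow group auditors to inspect all-resources {scope}",
            f"Allow group auditors to read audit-events {scope}",
        ],
    }


_VERB_RE = re.compile(r" to (inspect|read|use|manage) ")


def strongest_verb(statements):
    """Return the strongest verb that appears in a list of statements."""
    verbs = [_VERB_RE.search(s).group(1) for s in statements]
    return max(verbs, key=VERBS.index)


if __name__ == "__main__":
    bundles = group_policy_bundles("prod")
    for name, statements in bundles.items():
        print(name, "->", strongest_verb(statements))
```

A check like `strongest_verb` could run in CI to confirm that the developer bundle never exceeds `read`, which is the guarantee the prod rule above depends on.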
Object Storage layout
| Bucket | Compartment | Tier | Versioning | Lifecycle | Phase |
|---|---|---|---|---|---|
| `mgh-tofu-state` | `prod` | Standard | ON | keep versions 30d | Phase A (manual) |
| `mgh-backups` | `prod` | Standard | ON | delete >90d | Phase A (manual) |
| `mgh-assets-dev` | `dev` | Standard | ON | delete non-current >30d | Phase B (Tofu) |
| `mgh-assets-prod` | `prod` | Standard | ON | delete non-current >90d | Phase B (Tofu) |
| `mgh-logs-dev` | `dev` | Infrequent Access | OFF | delete >30d | Phase B (Tofu) |
| `mgh-logs-prod` | `prod` | Infrequent Access | OFF | delete >90d | Phase B (Tofu) |
Phase A buckets are NOT imported into Tofu. A state bucket that holds its own definition is a fragile chicken-and-egg arrangement, and `mgh-backups` is, by policy, out of scope for cross-env management. Lifecycle rules for those two buckets are managed manually; Tofu owns the assets and logs buckets.
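The layout table can be read as data with one retention rule per bucket. The sketch below encodes it that way; the `Bucket` type and `should_delete` helper are hypothetical, and two readings are assumptions rather than statements from this ADR: "keep versions 30d" is modeled as deleting non-current versions after 30 days, and "delete >Nd" is modeled as applying to current objects only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bucket:
    name: str
    compartment: str
    tier: str
    versioning: bool
    current_max_days: Optional[int]     # delete current objects older than this
    noncurrent_max_days: Optional[int]  # delete previous versions older than this
    managed_by: str                     # "manual" (Phase A) or "tofu" (Phase B)


LAYOUT = [
    Bucket("mgh-tofu-state", "prod", "Standard", True, None, 30, "manual"),
    Bucket("mgh-backups", "prod", "Standard", True, 90, None, "manual"),
    Bucket("mgh-assets-dev", "dev", "Standard", True, None, 30, "tofu"),
    Bucket("mgh-assets-prod", "prod", "Standard", True, None, 90, "tofu"),
    Bucket("mgh-logs-dev", "dev", "Infrequent Access", False, 30, None, "tofu"),
    Bucket("mgh-logs-prod", "prod", "Infrequent Access", False, 90, None, "tofu"),
]


def should_delete(bucket: Bucket, age_days: int, is_current: bool) -> bool:
    """Apply the bucket's retention rule to one object version."""
    limit = bucket.current_max_days if is_current else bucket.noncurrent_max_days
    # No limit configured for this kind of version means it is kept.
    return limit is not None and age_days > limit


if __name__ == "__main__":
    tofu_owned = [b.name for b in LAYOUT if b.managed_by == "tofu"]
    print("Tofu-managed buckets:", tofu_owned)
```

Holding the layout in one list makes the Phase A / Phase B ownership split explicit and easy to assert on.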
Consequences
- Cost (delta vs free tier): $0. All buckets fit inside 20 GB Standard + 10 GB Infrequent free quotas; expected occupancy <1 GB for months. Versioning is free per-class; lifecycle deletes keep growth bounded. Policies and dynamic groups are free.
- Operational surface: Three new groups + one dynamic group per env + four policies per env + four Tofu-managed buckets to track. CI user must be created per env and added to `ci-deployers` (one-time per env). Auth Tokens for OCIR + SMTP Credentials for Email Delivery are still user-generated artifacts (Tofu cannot mint them); they stay operator-side and are captured in the rotation runbook.
- Security posture: Strictly improved over Phase A alone. `terraform-deployer` keeps `manage all` for bootstrap recovery, but real workloads run through scoped per-compartment policies. Instance access to APIs uses resource principals, so there are no static keys on the VPS. Developer access to prod is read-only.
- Reversibility: All Phase B is in Tofu. To revisit, `tofu state rm` + `tofu apply` of a new policy set. Bucket renames require data copy; treat names as durable.
- Migration path if we revisit: Splitting `ci-deployers` further (e.g., per-service: `ci-deployers-backend`, `ci-deployers-frontend`) is a policy bundle replacement, not a re-architecture. Same for adding `read-secrets` groups when OCI Vault is introduced.
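The cost claim above reduces to a headroom check per storage class. The sketch below takes the quota figures exactly as stated in this ADR (20 GB Standard, 10 GB Infrequent Access) rather than from any pricing API; the function names and the usage numbers are hypothetical.

```python
# Quotas as quoted in this ADR; not fetched from OCI.
FREE_QUOTA_GB = {"Standard": 20.0, "Infrequent Access": 10.0}


def free_tier_headroom(usage_gb: dict) -> dict:
    """Return remaining free quota in GB per storage class."""
    unknown = set(usage_gb) - set(FREE_QUOTA_GB)
    if unknown:
        raise ValueError(f"unknown storage class: {sorted(unknown)}")
    return {
        tier: quota - usage_gb.get(tier, 0.0)
        for tier, quota in FREE_QUOTA_GB.items()
    }


def within_free_tier(usage_gb: dict) -> bool:
    """True when no storage class exceeds its free quota."""
    return all(left >= 0 for left in free_tier_headroom(usage_gb).values())


if __name__ == "__main__":
    # Illustrative usage figures only.
    print(within_free_tier({"Standard": 0.8, "Infrequent Access": 0.1}))
```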
Alternatives considered
| Option | Why rejected |
|---|---|
| One group for everything with `manage all` in tenancy | Phase A topology. Too broad for steady-state; a leaked dev credential reaches prod. |
| Per-user policies (no groups) | OCI policies bind groups, not users. Per-user means N copies of the same policy → drift. |
| Per-service groups from day one (`ci-deployers-backend`, `ci-deployers-frontend`, …) | Premature. One CI identity per env is enough until we have repo-specific differences worth scoping. |
| API keys on the VPS instead of resource principals | Static keys leak, rotate poorly. Resource principals are scoped by dynamic group + auto-rotated. |
| All buckets in one compartment | Loses the dev/prod blast-radius split. Trivial to keep separate. |
| OCI Vault for secret storage from day one | Defer. Vault has minor ongoing cost (~$0.06/key/mo) and the secret surface today (CF token, OCIR auth, SMTP cred, vault password) is small enough to manage in Ansible vault + env vars. Revisit if/when we have >10 secrets or multi-operator rotation. |
| Public read on `mgh-assets-*` | Tempting for CDN serving, but CF Tunnel + CF caching means we never need public origin. Closed by default. |