Infrastructure Provisioning
Terraform IaC
Pivot uses Terraform for all cloud infra that supports it, via the
pivot-internal GitHub repository's connection to multiple Terraform Cloud
workspaces.
The Terraform GitHub integration allows us to plan on PRs and then apply to
the staging environment on pushes to the main branch.
Avoid using the Terraform CLI locally, even for non-production environments, as
Terraform Cloud provides a single source of truth for our Terraform apply
history.
Services Managed with Terraform
AWS
Each Pivot deployment is its own AWS account. We use Terraform for the creation of each account, but not for general billing/IAM configuration of the management account of the AWS organization itself.
Cloudflare
- DNS: The zone representing each domain was created manually, as were some DNS records, so Terraform is used only to manage specific records.
- WAF: TBD
- Load Balancer: Cloudflare Load Balancer is used for the multi-tenant backend endpoints to support future use of multi-region. This is managed via Terraform.
- R2:
- Access: Cloudflare Access rules are configured manually.
Rootly
Rootly is used for incident management and our status page. Service disruptions detected through our monitoring systems are managed through the Rootly platform.
Terraform Environment and Application CI/CD
We use Terraform to deploy infrastructure into each AWS environment as well as
to deploy each new image version for ECS services. There is no technical
distinction made here; image version increments are performed with
terraform apply the same way infrastructure changes are. Therefore, our CI/CD
configuration needs to accomplish the following:
- Terraform configuration needs to be managed distinctly for our staging environment, multi-tenant production environment, and our single-tenant environment template. At the same time, our Terraform configuration should be as DRY as possible.
- We need to deploy to staging first, and only then deploy to the multiple production environments after an arbitrary amount of time for E2E testing against staging.
- Staging deployment needs to run whenever new image versions are pushed as well as whenever it is otherwise desired. Production deployments need to follow successful staging deployments without manual intervention for image version bumps, but also with flexibility for testing against the staging backend and for manual approval if more than just an image version was changed.
- Changes to a backend service's source code (or to the code of a library the service depends on) in the pivot repo need to trigger a new Docker image to be built and pushed to ECR, and then the Terraform file that defines the service's task definition needs to be updated with the new image tag (thereby triggering staging deployment via Terraform).
- Each service should be deployed independently. A change to one should not trigger redeployment of all.
- CD workflow concurrency needs to be managed. When it comes to building and pushing images and updating .tf files, a new run should not cancel an old run, because cancellation of in-progress runs creates a lot of idempotence and race condition complexity to consider (a concurrency sketch follows this list).
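For illustration, a workflow-level concurrency block that queues new runs behind in-progress ones instead of cancelling them might look like this; the group key is a placeholder, not our exact configuration:

```yaml
# Queue new runs behind in-progress ones rather than cancelling them.
concurrency:
  group: ${{ github.workflow }}
  cancel-in-progress: false
```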
To accomplish the above, we use three GitHub Actions workflows along with Terraform Cloud.
Workflow #1: deploy-docker-services.yml
The pivot repo has a single workflow that can generically, using a matrix
strategy, determine which applications are affected by a given commit to main,
build a Docker image, and push it to our ECR repository. It uses the service
name to determine where to find the Dockerfile to build, and it uses the Git SHA
that triggered it to name the image. The service name is the ECR repository name
and the Git SHA is the tag. Once the image is pushed successfully, workflow #2
is triggered, passing in the Git SHA and service name.
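A compressed sketch of the shape of this workflow follows. The change-detection action, directory layout, ECR registry value, org/repo names, and secret name are assumptions for illustration, not our exact configuration:

```yaml
name: deploy-docker-services
on:
  push:
    branches: [main]
jobs:
  detect:
    runs-on: ubuntu-latest
    outputs:
      services: ${{ steps.filter.outputs.changes }} # JSON array of affected services
    steps:
      - uses: actions/checkout@v4
      - id: filter
        uses: dorny/paths-filter@v3 # one filter per service and the libs it depends on
        with:
          filters: |
            example-service:
              - 'apps/example-service/**'
              - 'libs/shared/**'
  build-push:
    needs: detect
    if: needs.detect.outputs.services != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: ${{ fromJson(needs.detect.outputs.services) }}
    env:
      ECR: 123456789012.dkr.ecr.us-east-1.amazonaws.com # placeholder registry
    steps:
      - uses: actions/checkout@v4
      # ECR auth (e.g. aws-actions/amazon-ecr-login) omitted for brevity.
      - run: |
          docker build -f apps/${{ matrix.service }}/Dockerfile \
            -t $ECR/${{ matrix.service }}:${{ github.sha }} .
          docker push $ECR/${{ matrix.service }}:${{ github.sha }}
      - run: | # hand off to workflow #2 in pivot-internal
          gh workflow run update-docker-image-tag.yml -R pivot/pivot-internal \
            -f service=${{ matrix.service }} -f sha=${{ github.sha }}
        env:
          GH_TOKEN: ${{ secrets.INTERNAL_REPO_TOKEN }}
```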
Workflow #2: update-docker-image-tag.yml
We still need to update our Terraform config to use the new image tag. This
pivot-internal workflow takes in a service name and Git SHA (technically the
Git SHA could be any string as it is just used as an image tag) and uses those
values to find the relevant file using the expected path
libs/terraform/services/<service name>/main.tf. This workflow simply updates the
image tag by editing the .tf file and making a commit to the main branch.
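Sketched under the assumption that each service's main.tf declares its tag on a single image_tag line; the variable name and bot identity are illustrative:

```yaml
name: update-docker-image-tag
on:
  workflow_dispatch:
    inputs:
      service:
        required: true
      sha:
        required: true
jobs:
  bump-tag:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          FILE="libs/terraform/services/${{ inputs.service }}/main.tf"
          # Assumes a line of the form: image_tag = "<old sha>"
          sed -i -E 's/image_tag = "[^"]*"/image_tag = "${{ inputs.sha }}"/' "$FILE"
          git config user.name "pivot-bot" # illustrative bot identity
          git config user.email "pivot-bot@users.noreply.github.com"
          git commit -am "deploy(${{ inputs.service }}): image ${{ inputs.sha }}"
          git push origin main # this commit triggers workflow #3
```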
Workflow #3: deploy-terraform-backend.yml
We need a way to deploy to staging, run E2E tests against staging and then
deploy to production. Terraform Cloud does not provide such a pipelining
mechanism. Therefore, we 'manually' run terraform plan and terraform apply
for each Terraform Cloud workspace inside GitHub Actions with the Terraform CLI.
This workflow is triggered by any commit to main that modified .tf files
related to the AWS backend, including but not limited to those commits made
automatically by the prior workflow.
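Sketched with illustrative path globs (derived from, but not necessarily identical to, the paths named on this page), the trigger might look like:

```yaml
# Hypothetical trigger for deploy-terraform-backend.yml.
on:
  push:
    branches: [main]
    paths:
      - 'libs/terraform/**'
      - 'apps/terraform-aws-single-backend/**'
```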
After deploying to staging and running tests against staging, it runs plan on
the multi-tenant production environment. Then, it loops through an array of all
the single-tenant workspace names that are stored in
apps/terraform-aws-single-backend/workspaces-array.json. For each value, we
run plan against the relevant Terraform Cloud workspace. (If the only change
is an image tag bump, apply is also run automatically.)
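A condensed sketch of the job graph, assuming workspaces are selected with the TF_WORKSPACE environment variable and authenticated with a Terraform Cloud API token stored in secrets; the workspace names, working directories, and gating logic are illustrative, and only the workspaces-array.json path comes from this page:

```yaml
jobs:
  staging:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}
      - run: terraform init && terraform apply -auto-approve
        working-directory: libs/terraform # illustrative root module path
        env:
          TF_WORKSPACE: staging
  e2e:
    needs: staging
    runs-on: ubuntu-latest
    steps:
      - run: echo "run the E2E suite against staging" # placeholder
  multi-tenant:
    needs: e2e
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}
      - run: terraform init && terraform plan
        working-directory: libs/terraform
        env:
          TF_WORKSPACE: production-multi-tenant # illustrative name
  workspaces:
    needs: e2e
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.read.outputs.json }}
    steps:
      - uses: actions/checkout@v4
      - id: read # jq -c compacts the array onto one line for GITHUB_OUTPUT
        run: echo "json=$(jq -c . apps/terraform-aws-single-backend/workspaces-array.json)" >> "$GITHUB_OUTPUT"
  single-tenant:
    needs: workspaces
    runs-on: ubuntu-latest
    strategy:
      matrix:
        workspace: ${{ fromJson(needs.workspaces.outputs.names) }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}
      - run: terraform init && terraform plan # apply follows only for image-tag-only changes
        working-directory: apps/terraform-aws-single-backend
        env:
          TF_WORKSPACE: ${{ matrix.workspace }}
```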
Services Managed Manually
- Axiom (all environments write to one Axiom account)
- Terraform Cloud (each workspace is created manually)
- AWS IAM Identity Center (connection to JumpCloud and sub-account permissions per JumpCloud user group)
- Mux (all production environments write to one Mux account, as Mux doesn't provide regional storage, so there is no need to have Mux accounts per tenant)
- LiveKit Cloud (each production environment uses its own LiveKit Cloud project, with its own API key)
- Sentry (All environments write to one Sentry account - there is no long term storage of customer data in Sentry.)
- Rootly
- Expo (single Expo account for our mobile app - no long term storage of customer data)
- Google Cloud (Sign in with Google API keys)
- PostHog (single PostHog account)
- OpenAI (single API account - no long term storage of customer data)
- AssemblyAI (single API account - no long term storage of customer data)
- Various SaaS apps that aren't production infrastructure themselves like Google Workspace and JumpCloud
Cloudflare Pages for Frontend Sites and PivotAdmin
We use Cloudflare Pages to host our websites and static frontend assets. These are deployed to a single production environment, as even single-tenant backends use them.
Cloudflare Pages is configured via Terraform, with its own Terraform Cloud
workspace. Each application is deployed to Cloudflare Pages via its GitHub
integration, including PR environments, for both the pivot and
pivot-internal repos.
- Docs: Cloudflare Pages deploys the docs site for each PR and on push to main.
- Marketing: Cloudflare Pages deploys the marketing site for each PR and on push to main.
- Engbook: Cloudflare Pages deploys the EngBook for each PR and on push to main.
- Web App: Cloudflare Pages deploys the frontend Pivot app (Expo web export) for each PR and on push to main.
- PivotAdmin: Cloudflare Pages deploys the PivotAdmin internal tool for each PR and on push to main.
- Storybook: Cloudflare Pages deploys Storybook for each PR and on push to main.
Supabase for PivotAdmin
Our Supabase project uses their GitHub integration with the pivot-internal
repository for branching and migrations.