Database and Data Store Guidelines

Each service has its own database, if it needs one. Services do not access each other's database tables. Generally, services do not access each other's object or file stores either, though this is a grey area: a common S3 pattern is to hand out pre-signed POST URLs so clients can upload directly.

Data Storage by Service

Some Pivot backend services are entirely stateless. The following services durably store data in external stores:

  • Pilot: Valkey (ElastiCache Serverless)
  • Rest: Amazon Keyspaces (Managed Cassandra)
  • Buzzbuzz: Amazon Keyspaces (Managed Cassandra)
  • Asimov: Amazon Keyspaces (Managed Cassandra)
  • Blobby: Amazon Keyspaces (Managed Cassandra), Mux, S3 (user content)
  • Stagehand: S3 (raw recordings)
  • Blockhead: Postgres (ORM: Ent), Amazon Keyspaces (Managed Cassandra) for BinaryBlockUpdates
  • Facebox: Postgres (ORM: Ent)
  • Messenger: Postgres (ORM: Ent)
  • Quest: Turbopuffer (Managed Vector DB)
  • Wallstreet: Postgres (ORM: Ent)

AWS Aurora PostgreSQL

We use AWS Aurora PostgreSQL with serverless configuration and global database features. Our databases are organized into three distinct clusters:

  • Facebox cluster:
    • pivot_facebox
  • Blockhead cluster:
    • pivot_blockhead
  • Messenger cluster:
    • pivot_messenger

For development and testing:

  1. Use a Postgres Docker image for local development
  2. Use a Postgres Docker image for integration testing in CI
  3. Run our E2E tests in our staging environment against Aurora PostgreSQL

Therefore, application migrations have to succeed against both local Postgres and Aurora PostgreSQL before they can reach production.

ORM Best Practices

Services should follow these best practices when designing their database interactions:

  • Use Drizzle for TypeScript and Ent for Go. Both balance developer experience and ease of use with solid runtime performance and idiomatic schema management.
  • Services are responsible for their own database migrations. Our best practice is to run migrations at service start-up. This delays start-up a bit, but it is worthwhile: it avoids leaking production database secrets to migration-specific environments and avoids having a new application version running when its associated database migration has failed. Services should fail start-up health checks unless/until migrations have succeeded (see the first sketch after this list). Keep in mind that the migration library used needs to apply appropriate locks and transaction boundaries to handle concurrency during service deployment and migration application.
  • In CI on PRs (GitHub Actions), check that migrations do not conflict (if using Go, Ent provides the atlas.sum file for this) and verify that codegen is up to date (e.g., in Ent, go generate should produce no diff if the PR author ran it before committing).
  • Add database seed logic to the seeder app in the pivot repo. This app exists to 1) seed local dev environments and 2) serve as a base layer of tests in CI. By running seed logic through the service's API, we ensure seed data goes through the same write path and validation layer as real data.
  • Utilize two environment variables, POSTGRES_WRITER_CONN and POSTGRES_READER_CONN. Expose both to your domain layer so that you can decide, per query, which endpoint to use, preferring the reader endpoint as much as possible (see the second sketch after this list).
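
First sketch: running migrations at start-up and gating health checks on their success. This is a minimal sketch rather than our canonical implementation; it assumes golang-migrate as the runner (its Postgres driver takes an advisory lock, which covers the concurrency concern above), that POSTGRES_WRITER_CONN is a Postgres connection URL, and an illustrative /healthz endpoint and port.

    // Minimal sketch: run migrations, then flip readiness. golang-migrate,
    // the /healthz path, and the port are assumptions for illustration.
    package main

    import (
        "log"
        "net/http"
        "os"
        "sync/atomic"

        "github.com/golang-migrate/migrate/v4"
        _ "github.com/golang-migrate/migrate/v4/database/postgres" // driver takes a Postgres advisory lock
        _ "github.com/golang-migrate/migrate/v4/source/file"
    )

    var ready atomic.Bool

    func main() {
        // Health checks fail until migrations have succeeded.
        http.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
            if !ready.Load() {
                http.Error(w, "migrations pending", http.StatusServiceUnavailable)
                return
            }
            w.WriteHeader(http.StatusOK)
        })
        go func() { log.Fatal(http.ListenAndServe(":8080", nil)) }()

        m, err := migrate.New("file://migrations", os.Getenv("POSTGRES_WRITER_CONN"))
        if err != nil {
            log.Fatalf("load migrations: %v", err)
        }
        if err := m.Up(); err != nil && err != migrate.ErrNoChange {
            log.Fatalf("apply migrations: %v", err) // crash so the old version keeps serving
        }
        ready.Store(true)

        // ... register application routes and serve traffic here ...
        select {}
    }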

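Second sketch: wiring the writer and reader connections into the domain layer. This assumes database/sql with the lib/pq driver; the Store type and the messages columns beyond room_id and id are hypothetical.

    // Minimal sketch of the writer/reader split. database/sql with lib/pq,
    // the Store type, and the body column are assumptions for illustration.
    package store

    import (
        "context"
        "database/sql"
        "os"

        _ "github.com/lib/pq" // registers the "postgres" driver
    )

    type Store struct {
        writer *sql.DB // POSTGRES_WRITER_CONN: INSERT/UPDATE/DELETE traffic
        reader *sql.DB // POSTGRES_READER_CONN: preferred for SELECTs
    }

    func Open() (*Store, error) {
        w, err := sql.Open("postgres", os.Getenv("POSTGRES_WRITER_CONN"))
        if err != nil {
            return nil, err
        }
        r, err := sql.Open("postgres", os.Getenv("POSTGRES_READER_CONN"))
        if err != nil {
            return nil, err
        }
        return &Store{writer: w, reader: r}, nil
    }

    // Reads use the reader endpoint whenever possible.
    func (s *Store) GetMessage(ctx context.Context, roomID, id string) (string, error) {
        var body string
        err := s.reader.QueryRowContext(ctx,
            `SELECT body FROM messages WHERE room_id = $1 AND id = $2`, roomID, id).Scan(&body)
        return body, err
    }

    // Writes always use the writer endpoint.
    func (s *Store) CreateMessage(ctx context.Context, roomID, id, body string) error {
        _, err := s.writer.ExecContext(ctx,
            `INSERT INTO messages (room_id, id, body) VALUES ($1, $2, $3)`, roomID, id, body)
        return err
    }
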
Primary Key Selection and Indexing

With Aurora PostgreSQL, we follow standard PostgreSQL best practices for indexing and primary key selection. The use of composite primary keys and secondary indexes remains important for query performance optimization.

Our Go ORM, Ent, does not allow composite primary keys for 'nodes' (only edges). For a new table, we therefore edit the generated migration file so that the id column definition stays exactly as Ent generates it, but the primary key becomes a composite that ends with that id column and starts with some other column. As long as we configure Ent to require a value for each field and mark all fields used in the composite primary key as immutable(), this works fine. (Our TypeScript ORM, Drizzle, allows composite primary keys on all tables, so no manual .sql edits are needed.)

It is important to consider secondary indexes when using a composite primary key. Aurora PostgreSQL automatically indexes the primary key (as a composite); however, our point reads use the id column alone, so we need to manually define an index on that column. Additionally, it is useful to manually define an index on any foreign key we expect to query on frequently. For example, the messages table in Messenger has a primary key built from the room_id and id columns, so we add indexes on both of those columns to enable efficient point reads and WHERE room_id = '123' queries.
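
As a sketch of this pattern, an Ent schema for the Messenger messages example might look like the following. The body field and the exact schema shape are hypothetical; the commented SQL shows the kind of manual edit made to the generated migration file.

    // Hypothetical Ent schema for Messenger's messages table.
    package schema

    import (
        "entgo.io/ent"
        "entgo.io/ent/schema/field"
        "entgo.io/ent/schema/index"
    )

    type Message struct {
        ent.Schema
    }

    func (Message) Fields() []ent.Field {
        return []ent.Field{
            // Every field in the composite primary key is required and immutable.
            field.String("id").Immutable().NotEmpty(),
            field.String("room_id").Immutable().NotEmpty(),
            field.Text("body"),
        }
    }

    func (Message) Indexes() []ent.Index {
        return []ent.Index{
            // Secondary indexes for point reads on id and lookups by room_id,
            // since the edited primary key is the composite (room_id, id).
            index.Fields("id"),
            index.Fields("room_id"),
        }
    }

    // In the generated migration file we then change
    //     PRIMARY KEY ("id")
    // to
    //     PRIMARY KEY ("room_id", "id")
    // while leaving the id column definition itself untouched.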

Turbopuffer

Turbopuffer is a managed, serverless vector database optimized for performance and scalability.

We use Turbopuffer for vector search capabilities within the Quest service. Quest handles indexing data into Turbopuffer and executing queries against it.

Amazon Keyspaces (Managed Cassandra)

Amazon Keyspaces is a scalable, highly available, and managed Apache Cassandra-compatible database service. We use Keyspaces for several services requiring NoSQL data storage, particularly where high write throughput and scalability are important.

Services using Keyspaces:

  • Asimov
  • Buzzbuzz
  • Rest
  • Blobby (metadata)
  • Stagehand
  • Blockhead (BinaryBlockUpdate table only)
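
For reference, a connection sketch against Keyspaces, assuming the gocql driver; the regional endpoint, keyspace name, credential environment variables, and CA path are placeholders (Keyspaces requires TLS on port 9142 and LOCAL_QUORUM consistency for writes).

    // Illustrative gocql session against Amazon Keyspaces. The endpoint,
    // keyspace, and credentials here are placeholders.
    package main

    import (
        "log"
        "os"

        "github.com/gocql/gocql"
    )

    func newSession() *gocql.Session {
        cluster := gocql.NewCluster("cassandra.us-east-1.amazonaws.com")
        cluster.Port = 9142                     // Keyspaces' TLS port
        cluster.Keyspace = "pivot_rest"
        cluster.Consistency = gocql.LocalQuorum // required by Keyspaces for writes
        cluster.Authenticator = gocql.PasswordAuthenticator{
            Username: os.Getenv("KEYSPACES_USER"),
            Password: os.Getenv("KEYSPACES_PASS"),
        }
        cluster.SslOpts = &gocql.SslOptions{CaPath: "/etc/ssl/sf-class2-root.crt"}

        session, err := cluster.CreateSession()
        if err != nil {
            log.Fatalf("connect to Keyspaces: %v", err)
        }
        return session
    }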

AWS ElastiCache Serverless (Valkey)

We use AWS ElastiCache Serverless, running the Valkey engine (a fork of Redis), for in-memory data storage. It provides high-performance caching and serves as the primary data store for the Pilot service.
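
A minimal read-through caching sketch, assuming the go-redis client (which also speaks to Valkey); the VALKEY_ADDR variable, key name, and TTL are illustrative.

    // Minimal read-through cache sketch against ElastiCache Serverless (Valkey).
    // go-redis is an assumption; VALKEY_ADDR and the key/TTL are placeholders.
    package main

    import (
        "context"
        "crypto/tls"
        "os"
        "time"

        "github.com/redis/go-redis/v9"
    )

    func main() {
        rdb := redis.NewClient(&redis.Options{
            Addr:      os.Getenv("VALKEY_ADDR"),
            TLSConfig: &tls.Config{}, // ElastiCache Serverless requires TLS
        })
        ctx := context.Background()

        // Read-through: fall back to the source of truth on a cache miss.
        val, err := rdb.Get(ctx, "pilot:session:123").Result()
        if err == redis.Nil {
            val = loadFromSource() // hypothetical loader
            rdb.Set(ctx, "pilot:session:123", val, 15*time.Minute)
        } else if err != nil {
            panic(err)
        }
        _ = val
    }

    func loadFromSource() string { return "fresh value" }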

Data Validation Practices

Create a validation system (zod if using TypeScript, validator for Go) that lets you refine acceptable field values beyond what the database itself can enforce and beyond what might be specified in applicable Protocol Buffer .proto API schema files. Run this validation before every insert and update, unless you are sure the specific insert/update already falls within the validation system's constraints.
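
A minimal sketch of such a validation layer in Go, assuming the go-playground/validator package; the CreateMessageInput type and its constraints are hypothetical.

    // Hypothetical request validation before an insert, using go-playground/validator.
    package main

    import (
        "fmt"

        "github.com/go-playground/validator/v10"
    )

    // Constraints here are illustrative and tighter than what the database or
    // the .proto schema alone would enforce.
    type CreateMessageInput struct {
        RoomID string `validate:"required,uuid4"`
        ID     string `validate:"required,uuid4"`
        Body   string `validate:"required,max=4000"`
    }

    var validate = validator.New()

    func insertMessage(in CreateMessageInput) error {
        // Validate before every insert/update.
        if err := validate.Struct(in); err != nil {
            return fmt.Errorf("invalid message: %w", err)
        }
        // ... perform the INSERT via the writer connection ...
        return nil
    }

    func main() {
        err := insertMessage(CreateMessageInput{RoomID: "not-a-uuid"})
        fmt.Println(err) // fails validation before touching the database
    }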

Database Migration Principles

Whatever ORM/schema builder a service uses to create database migration files is likely very flexible; however, we have rules on database migrations that developers need to enforce when creating migrations and when reviewing pull requests that include a new migration.

Migrations can contain:

  • Column added
  • Column dropped (we must verify no data loss and that no running applications are using this column)
  • Table added
  • Table dropped (we must verify no data loss and that no running applications are using this table)
  • Constraint on column dropped

Migrations cannot contain:

  • Table renamed
  • Column renamed
  • Column data type changed
  • Constraint on column added

Additional rules:

  • A single PR should contain only one migration (one file/entity changed or created). Reject a PR that includes multiple migrations, as this makes it difficult to assess the exact changes made in the PR.
  • Generated migration files should not be edited by hand (the composite primary key adjustment described above is the one exception). Reject a PR that otherwise modifies a generated migration file.

Keep in mind, we must never attempt to synchronize the timing of database migration deploys and service version deploys. These will inherently be offset from each other, even though we run migrations at service start-up. Always assume that old image versions will run against the new database schema and that new image versions will run against the old schema.

Local Development

Postgres is launched as part of the pivot repo's Docker Compose definition. When the container is first launched, a launch script creates a database for each service that uses Postgres. Cassandra is also launched as part of the same Docker Compose definition.

Turbopuffer is a managed service and does not have a direct local equivalent; local development for Quest may involve mocking or connecting to our dev Turbopuffer org.

Valkey can be run locally via Docker.

Various LocalStack emulators for other AWS services, such as S3, are also launched via Docker Compose.