Database and Data Store Guidelines
Each service has its own database, if needed. Services do not access each
other's database tables. Generally, services do not access each other's object
or file stores either, though this is a grey area: a common S3 pattern is to
provide pre-signed POST URLs so clients can upload directly.
Data Storage by Service
Some Pivot backend services are entirely stateless. The following services durably store data in external stores:
- Pilot: Valkey (ElastiCache Serverless)
- Rest: Amazon Keyspaces (Managed Cassandra)
- Buzzbuzz: Amazon Keyspaces (Managed Cassandra)
- Asimov: Amazon Keyspaces (Managed Cassandra)
- Blobby: Amazon Keyspaces (Managed Cassandra), Mux, S3 (user content)
- Stagehand: S3 (raw recordings)
- Blockhead: Postgres (ORM: Ent), Amazon Keyspaces (Managed Cassandra) for BinaryBlockUpdates
- Facebox: Postgres (ORM: Ent)
- Messenger: Postgres (ORM: Ent)
- Quest: Turbopuffer (Managed Vector DB)
- Wallstreet: Postgres (ORM: Ent)
AWS Aurora PostgreSQL
We use AWS Aurora PostgreSQL with serverless configuration and global database features. Our databases are organized into three distinct clusters:
- Facebox cluster:
- pivot_facebox
- Blockhead cluster:
- pivot_blockhead
- Messenger cluster:
- pivot_messenger
For development and testing:
- Use a Postgres Docker image for local development
- Use a Postgres Docker image for integration testing in CI
- Run our E2E tests in our staging environment against Aurora PostgreSQL
Therefore, application migrations have to succeed against both local Postgres and Aurora PostgreSQL to make it to production.
ORM Best Practices
Services should address the following best practices when designing their database interaction:
- Use Drizzle for Typescript and Ent for Go. Both balance developer experience and ease of use with solid runtime performance and idiomatic schema management.
- Services are responsible for their own database migrations. Our best practice is to run migrations at service start-up. This delays start-up slightly, but is worthwhile: it avoids leaking production database secrets to migration-specific environments, and it avoids having a new application version running when the associated database migration has failed. Services should fail start-up health checks unless/until migrations have succeeded. Keep in mind that the migration library used needs to apply appropriate locks and transaction boundaries to handle concurrency during service deployment and migration application.
- In CI on PRs (GitHub Actions), check to ensure that migrations do not conflict (if using Go, Ent provides the `atlas.sum` file for this) and verify that codegen is up to date (e.g., in Ent, `go generate` should not diff if the PR author ran it before committing).
- Add database seed logic to the `seeder` app in the `pivot` repo. This app exists to 1) seed local dev environments and 2) serve as a base layer of tests in CI. By running seed script logic via the service's API, we maximize the degree to which seed data represents the actual write behavior and validation layer of the service.
- Utilize two environment variables, `POSTGRES_WRITER_CONN` and `POSTGRES_READER_CONN`. Expose both to your domain layer so that, at a per-query level, you can decide which endpoint to use, preferring the reader endpoint as much as possible.
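A minimal Go sketch of the two-connection convention. The environment variable names come from these guidelines; the `DB` type and its methods are illustrative, not an existing package:

```go
package main

import (
	"fmt"
	"os"
)

// DB holds both connection strings so the domain layer can choose
// an endpoint per query.
type DB struct {
	writerDSN string
	readerDSN string
}

// NewDBFromEnv reads the two conventional environment variables.
func NewDBFromEnv() DB {
	return DB{
		writerDSN: os.Getenv("POSTGRES_WRITER_CONN"),
		readerDSN: os.Getenv("POSTGRES_READER_CONN"),
	}
}

// DSN picks the endpoint for one query: reads prefer the reader
// endpoint when it is configured; writes always use the writer.
func (d DB) DSN(readOnly bool) string {
	if readOnly && d.readerDSN != "" {
		return d.readerDSN
	}
	return d.writerDSN
}

func main() {
	os.Setenv("POSTGRES_WRITER_CONN", "postgres://writer/db")
	os.Setenv("POSTGRES_READER_CONN", "postgres://reader/db")
	db := NewDBFromEnv()
	fmt.Println(db.DSN(true))  // reader endpoint
	fmt.Println(db.DSN(false)) // writer endpoint
}
```

In a real service the chosen DSN would back two pooled `database/sql` handles opened at start-up, not a per-query dial.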
Primary Key Selection and Indexing
With Aurora PostgreSQL, we follow standard PostgreSQL best practices for indexing and primary key selection. The use of composite primary keys and secondary indexes remains important for query performance optimization.
Our Go ORM Ent does not allow composite primary keys for 'nodes' (only edges),
so we edit the generated migration file for a new table: the id column
definition stays exactly as Ent generates it, but the primary key is changed to
a composite that starts with one or more other columns and ends with that id
column. As long as we configure Ent to require a value for each field and mark
all fields used in the composite primary key as immutable(), this works fine.
(Our TypeScript ORM Drizzle allows composite primary keys on all tables, so no
manual .sql edits are needed.)
It is important that we consider secondary indexes when using a composite
primary key. Aurora PostgreSQL automatically indexes the primary key (as a
composite); however, our point reads use the id column alone, so we need to
manually define an index on that column. Additionally, it is useful to manually
define an index on any foreign key we expect to query on often. For
example, the messages table in Messenger has a primary key on the
room_id and id columns, so we add indexes on both of those columns to enable
efficient point reads and WHERE room_id = '123' filters.
Turbopuffer
Turbopuffer is a managed, serverless vector database optimized for performance and scalability.
We use Turbopuffer for vector search capabilities within the Quest service. Quest handles indexing data into Turbopuffer and executing queries against it.
Amazon Keyspaces (Managed Cassandra)
Amazon Keyspaces is a scalable, highly available, and managed Apache Cassandra-compatible database service. We use Keyspaces for several services requiring NoSQL data storage, particularly where high write throughput and scalability are important.
Services using Keyspaces:
- Asimov
- Buzzbuzz
- Rest
- Blobby (metadata)
- Stagehand
- Blockhead (BinaryBlockUpdate table only)
AWS ElastiCache Serverless (Valkey)
We use AWS ElastiCache Serverless, running the Valkey engine (a fork of Redis), for in-memory data storage. It provides high performance caching and serves as the primary data store for the Pilot service.
Data Validation Practices
Create a validation system (zod if using TypeScript, validator for Go) that
lets you refine acceptable field values beyond what the database itself can
enforce and beyond what might be specified in applicable Protocol Buffer
.proto API schema files. Run this validation before every insert and update,
unless you are sure the specific insert/update is going to be within the
parameters of the validation system.
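A minimal sketch of such a pre-write validation layer in Go. In practice this would be built with the validator package; here a hand-rolled check stands in for it, and the `Message` type and its rules are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical row type; the rules below go beyond what
// the database schema or a .proto API schema can enforce.
type Message struct {
	RoomID string
	Body   string
}

// ValidateMessage runs before every insert/update of a Message.
func ValidateMessage(m Message) error {
	if strings.TrimSpace(m.RoomID) == "" {
		return fmt.Errorf("room_id must not be blank")
	}
	if len(m.Body) == 0 || len(m.Body) > 4000 {
		return fmt.Errorf("body must be 1-4000 bytes, got %d", len(m.Body))
	}
	return nil
}

func main() {
	fmt.Println(ValidateMessage(Message{RoomID: "room-123", Body: "hello"})) // <nil>
	fmt.Println(ValidateMessage(Message{RoomID: "  ", Body: "hello"}))       // room_id must not be blank
}
```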
Database Migration Principles
Whatever ORM/schema builder is used for creating database migration files is likely very flexible; however, we have some rules on database migrations that developers need to enforce when creating migrations and reviewing pull requests that include a new migration.
Migrations can contain:
- Adding a column
- Dropping a column (we must verify no data loss and that no running applications are using this column)
- Adding a table
- Dropping a table (we must verify no data loss and that no running applications are using this table)
- Dropping a constraint on a column
Migrations cannot contain:
- Renaming a table
- Renaming a column
- Changing a column's data type
- Adding a constraint to a column
Additional rules:
- A single PR should contain only one migration (one file/entity changed or created). Reject a PR that has multiple migrations, as they make it difficult to assess the exact changes made in the PR.
- Generated migration files should not be changed. Reject a PR that does so.
Keep in mind, we must never attempt to synchronize the timing of database migration deploys and service version deploys. These will inherently be offset from each other, even though we run migrations at service start-up. Always assume old image versions will run with the new database schema and new image versions will run with the old schema.
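For changes the rules forbid, compose allowed operations across separate releases instead. A hedged sketch of effectively renaming a column under these rules (column names hypothetical):

```
Release 1 (migration):  add column display_name            -- "adding a column" is allowed
Release 2 (code only):  write both name and display_name; read display_name, falling back to name
Release 3 (code only):  backfill display_name; stop reading and writing name
Release 4 (migration):  drop column name                   -- after verifying no data loss and no remaining readers
```

Each release tolerates running alongside the previous one, which is exactly the old-schema/new-image overlap described above.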
Local Development
Postgres is launched as part of the pivot repo's Docker Compose definition.
A launch script creates a database for each Postgres-backed service when the
container is first launched. Cassandra is also launched as part of the same
Docker Compose definition.
Turbopuffer is a managed service and does not have a direct local equivalent; local development for Quest may involve mocking or connecting to our dev Turbopuffer org.
Valkey can be run locally via Docker.
LocalStack emulators for other AWS services, such as S3, are also launched
via Docker Compose.