Service-Oriented Architecture

All of Pivot's Docker-based services follow specific architectural patterns and are deployed to AWS ECS. This article is not applicable to the frontend applications deployed to Cloudflare Pages, nor to the PivotAdmin full-stack app deployed to Cloudflare Pages.

Context: Why not a monolith?

Generally, new software systems are designed with monolithic backends, where a JSON RPC API is backed by a single database. Domain-driven design can be used even with a monolith to keep the components of this single backend locally separate. Then over time, as the engineering team grows and the system increases in both traffic and complexity, a service-oriented architecture is adopted through slow and painful production migrations that take engineering focus away from feature development.

At Pivot, we've taken another approach. While we don't use the term 'microservices' because of the implication that there are hundreds or thousands of distinct services (we have fewer than 25 total backend and frontend services), we've jumped straight into service-oriented architecture from the beginning, writing the application as separate containerized services for different aspects of the platform, each entirely separate from the external-facing APIs and from the compute and storage used by every other service.

This has downsides, such as added internal network latency and increased CI/CD complexity, but given the complexity inherent in our application, it is easier to have intentional, hard separation between domains than to slide down the 'slippery slope' of running a monolith and then creating a few extra services for batch processing, or websockets, or whatever else. Better to lean in to service-oriented architecture than to accidentally build an overcrowded monolith surrounded by a few other services with overlapping database access and pub/sub just for horizontal scaling.

Containers at Pivot

Pivot is designed to be cloud provider agnostic. This means we deploy container images designed to run anywhere rather than writing code for a specific deployment platform. We run those containers in an AWS ECS cluster, which means much of our IaC is specific to ECS, even though the container images are entirely deployment agnostic.

To be clear, the application code for each service does not know what ECS is. AWS tooling is used as part of CD, networking, observability, and scaling, but from the perspective of each service, none of this is visible. For example:

  • Services emit traces to an OpenTelemetry collector that runs as an ECS daemon and pushes to Axiom.
  • We use Fluent Bit to collect and ship logs from stdout.
  • Services make gRPC requests to other services using environment variables for the names of those other services. Those environment variables are set to the ECS Service Connect URL of the other service, which is resolved by ECS to load balance across the tasks of that service. The ECS-managed Envoy service mesh proxy intercepts all of this outgoing and incoming traffic, but that is not visible to each service.
  • Docker images get built in GitHub Actions and pushed to AWS ECR, then ECS pulls those images to start tasks, but the service itself has no knowledge of these specific mechanisms. It's just Docker.
  • Services make heavy use of environment variables, but have no knowledge of how those variables get injected. Services don't know about the Terraform configuration files that define ECS task definitions. ConfigMap and secrets get merged into a single set of environment variables injected into the task by ECS. This allows us to inject environment variables in an entirely different way in the local development environment.

Components of a Service

/apps/* directory in the pivot repo

  • source code, using the /src convention for TypeScript or package-based subdir conventions for Go.

    • HTTP health check endpoint defined at /health
    • Distributed tracing instrumentation with OpenTelemetry SDK
    • Structured logging to stdout
    • Graceful SIGINT/SIGTERM handling
    • Connect or HTTP external transport and/or gRPC internal transport
    • Reflection for gRPC APIs, at least when STAGE=local
    • Seed logic in the seeder app
    • Validation, authorization, and NATS publication for mutation operations
    • Environment variable validation (preventing invalid config states from starting)
    • When using Postgres, handling both the writer endpoint and reader endpoint for Aurora PostgreSQL clusters to optimize read workloads across the global database infrastructure
  • Dockerfile, Nx project.json, and any other build configuration files

  • Database seed logic, via the seeder app, if applicable, for local development and testing purposes
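
Several of the per-service conventions above (the /health endpoint, structured logging to stdout, graceful SIGINT/SIGTERM handling) can be sketched with nothing but the Node standard library. This is an illustrative minimal sketch, not our actual service scaffolding:

```typescript
import http from "node:http";

// JSON structured logging to stdout.
const log = (level: string, msg: string, extra: object = {}): void => {
  console.log(JSON.stringify({ level, msg, ts: new Date().toISOString(), ...extra }));
};

// Minimal HTTP health check endpoint at /health.
const server = http.createServer((req, res) => {
  if (req.url === "/health") {
    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({ status: "ok" }));
    return;
  }
  res.writeHead(404).end();
});

// Graceful shutdown: stop accepting connections, then exit.
const shutdown = (signal: string): void => {
  log("info", "shutting down", { signal });
  server.close(() => process.exit(0));
};
process.on("SIGINT", () => shutdown("SIGINT"));
process.on("SIGTERM", () => shutdown("SIGTERM"));

// PORT is injected by the environment; 0 picks an ephemeral port if unset.
server.listen(Number(process.env.PORT ?? 0), () => log("info", "listening"));
```

In a real service this sits alongside the Connect/gRPC server setup rather than standing alone.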

/proto/* directory in pivot repo with Buf config

  • Use folder paths to version the service's API, e.g. /proto/wallstreet/grpc/v1/* and ./nats/v1/*.

Infrastructure as Code in pivot-internal repo

  • libs/terraform/services/* directory that defines the ECS deployment and service, as well as files for any other infra components specific to the service.

  • Terraform Cloudflare pivot.app DNS subdomain (if applicable) and associated configuration

  • Flipt configuration flags (if needed)

CI/CD workflows

  • Simple GitHub Actions workflow in pivot repo to trigger the generic build and push workflow (which will trigger Terraform to redeploy new image versions over in pivot-internal).

Secrets

Secrets for each environment are either configured in the AWS Parameter Store Terraform config or added manually to Parameter Store via the AWS console for each environment.

Observability

  • Application code instrumentation for logging and tracing with an Otel SDK (and optionally emitting metrics in the same way).
  • Axiom dashboard(s) and alerts
  • PagerDuty configuration to handle new Axiom alerts

Local development

  • Add service to docker-compose.yaml in the pivot repo, along with needed environment variables.
  • Add service dependencies to the local development scripts, such as any AWS resources that need to be emulated or a Postgres database that needs to be created when the local Postgres instance is created.

Service Design Best Practices

Authentication

API services are responsible for authentication based on JWT, API key, or some other mechanism. Internal services do not authenticate and should not parse JWTs.

Internal services do, of course, use the secure service-to-service communication mechanisms provided by ECS Service Connect when communicating over gRPC, and they use NATS authentication, but they do not authenticate the original external request sender, whether that is a user making an RPC request via Friend or a third-party service POSTing a webhook.

Authorization

As described above, API services are responsible for handling API-level authentication, as they are the entry point for incoming requests and need to verify the identity of the user. API services are also responsible for operation-level authorization, determining whether a client has the basic rights to even attempt to execute a particular operation. For example, certain Friend RPCs require an authenticated user context and others do not.

More granular, domain-specific authorization should be delegated to the individual internal services. This way, the logic relating to specific application rules is encapsulated within the service that "owns" that domain, leading to a more maintainable, scalable, and secure architecture.

Put differently, API services do implement basic authorization, but not row level security. Services that provide CRUD APIs need to consider who can call each exposed operation, but do not need to consider the nuances of each data service's CRUD permissions for the entities that service is responsible for.

For example, if a user uses a Friend service RPC query to get a list of all spaces, it is up to the Blockhead service to convert a provided userId into a list of spaces. Each service decides the schema for its gRPC request messages, so in this example, the Blockhead service could decide that it won't provide spaces based on membership, public access, and organization admin access using the same query, instead exposing gRPC methods that each expect a different filtering value; a caller that wants all spaces for a given organization would then need to query for those separately in addition to providing a userId.

As another example, if the Rest service gets a request to create a new Pivot chat message in a room, it would send a request to Messenger without determining if the userId has access to the room. Messenger would then authorize based on trusting the userId provided by Rest, but without assuming the userId has access.

As a general principle, if a gRPC method request message includes a principalUserId, the client service can expect that the server service will consider whether that user as an actor is allowed to take that action. For mutations, this would result in returning an error if the user is not allowed and for queries it could either result in an error or an empty array, depending on the context.
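
The principalUserId convention can be sketched as follows. All names here (listRooms, deleteRoom, the membership map) are hypothetical, not real Pivot APIs:

```typescript
// Hypothetical request shape following the principalUserId convention.
type ListRoomsRequest = { principalUserId: string };

// Stand-in for the owning service's data.
const roomMembers: Record<string, string[]> = {
  "room-1": ["user-a"],
  "room-2": ["user-a", "user-b"],
};

// Query: an unauthorized principal yields an empty result, not an error.
function listRooms(req: ListRoomsRequest): string[] {
  return Object.entries(roomMembers)
    .filter(([, members]) => members.includes(req.principalUserId))
    .map(([roomId]) => roomId);
}

// Mutation: an unauthorized principal yields an error.
function deleteRoom(req: { principalUserId: string; roomId: string }): void {
  const members = roomMembers[req.roomId] ?? [];
  if (!members.includes(req.principalUserId)) {
    throw new Error("permission_denied"); // maps to a gRPC PermissionDenied code
  }
  delete roomMembers[req.roomId];
}
```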

Validation

Services that own/store entity data must validate mutations to that data before storing or publishing.

It is up to API services whether or not to provide their own validation of end-user input before passing it to some other service that stores the data. Generally, API services should provide some degree of data validation to avoid unnecessary gRPC calls and unnecessary NATS messages that should have been 'caught' by the API service (Friend, Rest, etc.) as nonconforming. To avoid complexity, however, API services should not attempt to reimplement the nuance of each internal service's validation logic, only the basics of their own APIs' required/optional fields and data types.

It's also important that API services not create validations that conflict with owning service rules. For example, if the Friend service implements a 30 character limit on string length for some mutation argument, but the Messenger service implements a 100 character limit and the Rest service doesn't implement one at all, then a Room originally created through the Rest service could throw errors when updated through the Friend service. Therefore, max string length is an example of a nuanced validation rule that should not be implemented by API services and should instead be delegated to the internal service that owns the specific entity.
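
A minimal sketch of this layering, with hypothetical function and field names (apiValidate, ownerValidate, a title field):

```typescript
type UpdateRoomInput = { roomId?: unknown; title?: unknown };

// API service layer (e.g. Friend/Rest): required fields and types only.
function apiValidate(input: UpdateRoomInput): { roomId: string; title: string } {
  if (typeof input.roomId !== "string" || input.roomId.length === 0) {
    throw new Error("invalid_argument: roomId is required");
  }
  if (typeof input.title !== "string") {
    throw new Error("invalid_argument: title must be a string");
  }
  // Note: no length limit here; that is the owning service's rule.
  return { roomId: input.roomId, title: input.title };
}

// Owning internal service (e.g. Messenger): domain rules live here.
function ownerValidate(title: string): void {
  if (title.length > 100) throw new Error("invalid_argument: title too long");
}
```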

Data Storage

All services that need a database should use Postgres, as described here.

Because a single Postgres database user can be scoped to a database, we can maintain separation between services simply with our environment variables so that services cannot access each other's databases.

Each service implements its own object storage (S3 bucket), if applicable, and cannot use the S3 API on other services' buckets. However, one service can of course consume a pre-signed GET or POST URL for an object provided by another service, just like any client could.

Service-to-Service Communication

Because the same service can implement both gRPC and NATS based communication patterns and serve multiple requests from each at the same time, non-blocking concurrency (goroutines in Go, async/await in Node) is essential to avoid blocking between distinct API transports.
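
A toy illustration of serving two transports concurrently in Node; the handlers are stubs standing in for real gRPC and NATS processing:

```typescript
// Two independent async loops standing in for a gRPC server and a NATS
// consumer running in one process. Neither blocks the other.
const handled: string[] = [];

async function serveRpc(requests: string[]): Promise<void> {
  for (const req of requests) {
    await new Promise((resolve) => setTimeout(resolve, 1)); // simulated I/O
    handled.push(`rpc:${req}`);
  }
}

async function consumeNats(messages: string[]): Promise<void> {
  for (const msg of messages) {
    await new Promise((resolve) => setTimeout(resolve, 1)); // simulated I/O
    handled.push(`nats:${msg}`);
  }
}

// Start both transports; in Go these would be two goroutines.
const done = Promise.all([serveRpc(["a", "b"]), consumeNats(["x", "y"])]);
```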

gRPC and Protobuf

gRPC is a cross-language, high-performance, type-safe client-to-server protocol used for all synchronous internal communication between services, wherever possible. We use the connect-go and connect-es libraries (Go and JS/TS) for gRPC, not the Google-maintained gRPC library. Connect is a flexible yet simple RPC framework that allows us to be gRPC-compatible without a lot of complexity.

In ECS, we use Service Connect to provide the service mesh layer, which means ECS is running a proxy container for each of our backend service tasks. Service Connect is HTTP and gRPC-aware and provides its own metrics.

gRPC is not only used because of its high performance over the wire, but also because of the built-in schema story via Protocol Buffers. We also use Protobuf outside of gRPC for our other schema needs, including for serializing NATS message content and the Friend API, where Protobuf enables type safety between client and server without GraphQL or server-side TypeScript, using Connect's own "Connect" protocol.

Protobuf workflow

We define .proto files organized by the service that is the 'server' or publisher. These are stored outside of the source code of that service. We can then consume those files in all client services and codegen language-specific stub code for writing gRPC calls and do the same in the service itself for writing gRPC server code. We also use .proto files to represent the schema for serializing and deserializing NATS messages.

When setting up for local development, you will install all the Go CLI tools necessary for working with Protobuf. Assuming your GOBIN is set in your shell configuration, you will then be able to call the buf CLI.

Note that while the Go binary and the TypeScript module for the Buf CLI are interchangeable when it comes to linting (buf lint vs npx buf lint), only the TypeScript module seems to successfully resolve the es protoc plugin, so npx buf generate is preferred for generation.

If you modified .proto files, always run buf generate before opening a PR to ensure it succeeds, along with buf lint and buf format --write. CI checks on PRs verify that Protobuf files pass formatting and linting.

We also use the Buf CLI to check for breaking changes in every PR. This is a key concept with Protobuf; if you have to make a breaking change, do it by establishing a v2 Protobuf package, but if you can make your change without it breaking compatibility with existing clients, that is preferred to adding a new version.

We use the Buf Schema Registry to document our Protobuf schema (using markdown alongside the .proto files that Buf integrates with its auto-generated documentation) as well as to allow Protobuf files in the Pivot repo to be consumed by applications in other repos (for example, by PivotAdmin). We push changes to the BSR on commit to the main branch of the pivot repo.

NATS (Queuing and Pub/Sub)

We use NATS (with JetStream) for queue/event bus/pubsub patterns. NATS is an essential part of how our services communicate and has its own article. The same .proto directory in the pivot repo holds schemas for NATS messaging.

Eventual Consistency and the Dual Write Problem

There is a concept in distributed system architecture called the 'dual write problem'. This refers to an issue where a service successfully commits a change to its database but fails to emit the corresponding event to the pub/sub system (in our case, NATS). This leads to inconsistency. The traditional solution is to use database Change Data Capture (such as Debezium) between the database and the pub/sub system, rather than manually emitting events in application code.

In the Pivot codebase, we do not use change data capture for events; services publish to NATS after committing changes to the database and we risk these publish operations failing. Why do we do this?

  1. Our services are supposed to be decoupled. The NATS stream is not intended to be a strongly consistent / transactionally guaranteed view of the system. We aren't using event sourcing. So, if two services depend on reading each other's database change events in a strongly consistent way, that is not a good NATS use case. Use synchronous gRPC.

  2. Change Data Capture is hard and fails. Rather than maintaining servers to run it and debugging delays/failures that pile up in the Write Ahead Log of the database, we prefer to handle NATS publication failures architecturally by not using them to build consistent views of another service's database data. If you need a consistent view of database data, make a request to the owning service via gRPC.
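
The resulting commit-then-best-effort-publish pattern can be sketched like this; db, publish, and the subject name are stand-ins, not our real client APIs:

```typescript
// In-memory stand-ins for the service's database and NATS client.
const db: Record<string, string> = {};
const publishLog: string[] = [];
let natsUp = true;

async function publish(subject: string, payload: string): Promise<void> {
  if (!natsUp) throw new Error("nats unavailable");
  publishLog.push(`${subject}:${payload}`);
}

async function updateEntity(id: string, value: string): Promise<void> {
  db[id] = value; // 1. commit to the database first: the source of truth
  try {
    await publish("entity.updated.v1", id); // 2. best-effort event
  } catch (err) {
    // 3. do NOT roll back; consumers must not rely on the stream for a
    //    consistent view. They query the owning service via gRPC instead.
    console.error(JSON.stringify({ level: "error", msg: "publish failed", id }));
  }
}
```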

Environment Variables and Secrets

All services should implement the env or config object pattern, where env vars are key-value pairs in the language's object/struct construct. All services should validate this object at startup, such as by using t3-env (Zod) in TypeScript services or viper in Go.
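
A hand-rolled sketch of the fail-fast config object pattern (real services use t3-env or viper; MESSENGER_GRPC_URL is a hypothetical variable name):

```typescript
// The validated config object the rest of the service reads from.
type Env = { DATABASE_URL: string; MESSENGER_GRPC_URL: string; PORT: number };

function loadEnv(source: Record<string, string | undefined>): Env {
  const required = (key: string): string => {
    const value = source[key];
    if (!value) throw new Error(`missing required env var: ${key}`);
    return value;
  };
  const port = Number(source.PORT ?? "8080");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error("PORT must be a positive integer");
  }
  // Throwing here prevents an invalid config state from starting at all.
  return {
    DATABASE_URL: required("DATABASE_URL"),
    MESSENGER_GRPC_URL: required("MESSENGER_GRPC_URL"),
    PORT: port,
  };
}
```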

Deployment

We deploy services onto ECS via GitHub Actions and Terraform. The basic automated CD pattern is:

  1. Trigger the generic build and push workflow for each service when it is affected by a commit (service specific trigger-deploy.yml in pivot)

  2. Build the image (generic build-and-push.yml in pivot)

  3. Push to ECR (generic build-and-push.yml in pivot)

  4. Update the service.tf file with the new Git SHA as the image tag (generic update-tf-and-commit.yml in pivot-internal)

  5. Let Terraform Cloud plan and apply to staging, triggered by their GitHub integration (no workflow required)

  6. Deploy to the multi-tenant production environment and each enterprise single-tenant environment via a GitHub Action that is triggered by the Terraform Cloud webhook when the staging environment apply succeeds (generic trigger-production-deploys.yml in pivot-internal)

Terraform

Terraform handles our infrastructure deployments, including redeploying ECS services with new Docker image tags.

Networking

We use ECS Service Connect (which uses Envoy proxies) as a service mesh, so while each service is unaware of this, each service can reach every other service (that it has permission to reach) in a load-balanced, observable way via simple, standard ECS Service Connect service-name-based hostnames injected as environment variables. It's important to use environment variables for the hostnames/ports of other services, because in local development we use Docker Compose, which makes service-to-service networking easy to reason about without understanding any AWS emulation.
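
A sketch of resolving a peer's base URL purely from environment variables, so the same code works behind Service Connect and under Docker Compose (the SERVICENAME_URL naming scheme here is illustrative):

```typescript
// Resolve a peer service's base URL from an environment variable. In ECS
// the value is the Service Connect URL; in docker-compose it is the
// container name. The code never hardcodes a hostname.
function peerBaseUrl(env: Record<string, string | undefined>, name: string): string {
  const key = `${name.toUpperCase()}_URL`;
  const url = env[key];
  if (!url) throw new Error(`missing ${key}: peer hostnames must come from the environment`);
  return url;
}
```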

In general, all services run in private IPv6 (dual-stack) subnets, which allows them to reach the internet (outgoing only) with their IPv6 address.

Summary: Considerations for Best-Practice Service Architecture

Service Discovery, Load Balancing, Retries, and Failure Handling

ECS Service Connect handles these elements. Our service architecture does not address them.

Service to Service Authentication

Our service architecture does not include this. We use the AWS networking layer to restrict ingress and egress to only the necessary sources and destinations.

Request Routing and Middleware

Services act as HTTP and/or gRPC clients.

Metrics, Logging, Tracing

Services should implement structured logging to the console/standard output. Services should use an OpenTelemetry SDK to configure request tracing for both gRPC and NATS. For NATS, this means starting a new trace when processing each message, whereas for gRPC, existing traces should be honored and a new span started.
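
The gRPC-vs-NATS rule can be illustrated with a simplified version of W3C traceparent handling; real services delegate this to the OpenTelemetry SDK's propagators rather than parsing headers by hand:

```typescript
import { randomBytes } from "node:crypto";

// The W3C traceparent format is "00-<32 hex traceId>-<16 hex spanId>-<2 hex flags>".
type SpanContext = { traceId: string; spanId: string };

function startSpan(incomingTraceparent?: string): SpanContext {
  const spanId = randomBytes(8).toString("hex");
  const parts = incomingTraceparent?.split("-");
  if (parts && parts.length === 4 && parts[1].length === 32) {
    return { traceId: parts[1], spanId }; // gRPC: continue the caller's trace
  }
  return { traceId: randomBytes(16).toString("hex"), spanId }; // NATS: new trace
}
```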

OpenTelemetry collection is handled by a sidecar container.

Error Handling

gRPC request processing errors should be logged and responded to with the appropriate gRPC response code, via the Connect framework.

NATS processing errors should result in logged errors and a 'NAK' response to the NATS server, identifying that the message was not successfully processed.
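
A sketch of the ack/nak rule as a handler wrapper; the JsMsg shape mimics, but is not, the real NATS JetStream client API:

```typescript
// Minimal message shape: real JetStream messages carry much more.
type JsMsg = { data: string; ack: () => void; nak: () => void };

async function handle(msg: JsMsg, processor: (data: string) => Promise<void>): Promise<void> {
  try {
    await processor(msg.data);
    msg.ack(); // processed successfully
  } catch (err) {
    console.error(JSON.stringify({ level: "error", msg: "nats processing failed" }));
    msg.nak(); // tell the server the message was not processed (redeliver)
  }
}
```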

Transports and Serialization

The Connect framework should be used for gRPC transport, so Protobuf serialization is handled automatically.

For NATS, the protobuf library can be used for serializing and deserializing to binary. (@bufbuild/protobuf for TS and google.golang.org/protobuf for Go)

Support for Pivot Cloud Platform Private Deployments

Private deployments are separate from the production multi-tenant Pivot Cloud Platform: all services are deployed for a single Pivot customer in a separate VPC in a separate AWS account.

Private deployments use the same Docker images as the multi-tenant service, so CI/CD stays the same on the development side, but complexity is added in the GitHub Actions service and environment deployment workflows, as well as the Terraform Cloud workspaces, which have to consider additional deployment targets for each service. A single GitHub Actions workflow deploys services to all private/single-tenant accounts once staging and multi-tenant production have been successfully deployed to.