Pilot: Websocket Proxy

Overview

For real-time functionality at Pivot, we need to achieve a few things:

  1. Each client device needs to be able to find out when data directly related to the user (e.g., user record from Facebox, notifications from Buzzbuzz) is created or updated. This requires a web service to subscribe to NATS subjects published by various services route them to clients. Dealer is responsible for the subscription, Pilot is responsible for managing the websocket connections with the client devices.

  2. Client devices need to be able to 'cluster' so as to send ephemeral data to each other at high volume, with low latency. This is therefore not a good use-case for NATS, as the traditional architecture of a pub/sub-driven distributed system adds latency and is affected by high message volume. Dealer is responsible for serving as single-point where multiple clients can exchange ephemeral data and Pilot is used to assign client websockets to Dealer instances, and pass websocket messages to/from Dealer via gRPC streams.

  3. Multiple services (Facebox and Blockhead) need to durably store user presence at the user, space, and block level and Messenger needs to store read receipts. This is enabled by Dealer regularly publishing this data to NATS Jetstream. Dealer knows user presence as part of its role in supporting clients to send messages to each other, as clients regularly send presence updates via the Pilot-proxied websocket connection.

Pilot and Dealer Roles

Pilot

  • Receive websocket connection.
  • Authenticate/authorize via Pilot token.
  • Determine whether Dealer assigned to scope.
    • If assigned and in same region, send gRPC request.
    • If assigned in a different region, return that Pilot URL as HTTP error.
    • If not assigned, assign and send gRPC request.

Dealer

  • Accept gRPC request from Pilot.
  • Connect gRPC connections for the same scope in the same in-memory pub/sub goroutine.
  • Subscribe to NATS messages from various services.
  • Publish message to connections if userId or other applicable value matches.
  • Publish presence updates (at regular intervals, debounced) to NATS.

API

Other than a health check endpoint, Pilot has a single HTTP API route: /connect. This route takes a required ?t= parameter, which should be a Pilot token. This route will error if no token is provided, if the token is invalid, if the HTTP method is not GET, or if the Upgrade header is missing.

As discussued in the Security section, this request will also error if the userId has too many active Pilot connections and the connection will end later if the websocket RPM goes above a certain threshold.

If the scope the client is trying to connect to is hosted currently on a Dealer instance in another region, Pilot will respond with a 307 and provide the Pilot hostname the client should try again with.

Once upgraded, Pilot immediately begins sending ping/pong frames per standard WebSocket practices. As long as the client continues to respond, Pilot will keep the connection open for up to 60 minutes. Pilot regularly (every few seconds) updates the Connection data it has created in Valkey for this connection with the latest last_updated_at timestamp. This tells Pilot whether a connection is active or not, without any kind of 'status' column that could easily get out of date and result in orphaned records.

Pilot establishes a gRPC connection to Dealer corresponding to each websocket connection. How Pilot chooses which Dealer instance to connect to is discussed below.

If the client stops responding, Pilot kills the websocket connection and the Dealer gRPC streaming connection. If the Dealer gRPC streaming connection is closed, Pilot kills the websocket connection.

Pilot Tokens

Pilot tokens are just JSON Web Tokens. The pivot repo provides a library for generating these tokens. As long as they are generated using the same secret that Pilot uses, any service can generate them. Pilot uses two environment variables, TOKEN_SIGNING_SECRET_A and _B to enable us to rotate these if needed.

The schema for this JWT (other than the signature, exp, etc.) is:

{
    "userId": string,
    // must start with 'user_', 'room_', 'space_', or 'block_'
    "scope": string,
    // Can only be write for 'block_' and 'space_' scopes, otherwise must be
    // read or will default to read.
    "permission" "READ" | "WRITE"
}

Our Pilot Token generation library allows the generating service to set the expiration, but generally it should be no less than a minute, and no longer than 10 minutes, given that the client can connect to Pilot without being reauthorized during this period.

Pilot does not end websocket connections after the token used to start the connection has expired, however it does subscribe to NATS messages from Facebox to end connections if the userId is updated to be banned from the system in Facebox.

Connections from Pilot to Dealer

The Dealer service is responsible for actually handling the websocket frames the arrive from the client and pushing frames down to each client both from other clients in the same scope and from other Pivot backend services. This is detailed in the Dealer article. From Pilot's side, the job is to find the right Dealer instance to connect to for each websocket, so Pilot can proxy the client's messages to/from Dealer.

Dealer and Pilot are both horizontally scaled fleets of Docker containers. Pilot is fairly stateless. All Pilot instances connect to the same Valkey cluster, and if a Pilot instance dies, its clients simply reconnect over the internet and hit another Pilot instance. Dealer is different. While each Dealer instance has no long term storage or even a database, so is in that sense 'stateless', it is crucial that a given scope exist only on one Dealer instance at a time. If clients A and B are connected to scope block_123 on Dealer instance 1 but client C is connected to the same scope on Dealer instance 2, then they will have a bad experience using Pivot. "Are you in the document? I don't see you."

To avoid this, Pilot orchestrates the assignment of scopes to Dealer instances, which requires Pilot to load balance across the Dealer fleet in a distributed manner. This requires:

  1. Knowledge of all current Dealer instances – they register via ECS Service Discovery with AWS Cloud Map and Pilot consumes this via DNS. Therefore we do have an important dependency on Cloud Map, even though neither Dealer nor Pilot are technically aware of this specific implementation detail, it's handled by ECS on the Dealer side and is just DNS on the Pilot side.
  2. An algorithm for choosing a new Dealer instance for a scope that isn't already assigned/active (consistent hashing ring).
  3. Logic to handle an already assigned scope in another region. (Discussed below)
  4. The ability to read and write scope assignments in a consistent way (using Valkey features) to avoid a scope existing on two Dealer instances.
  5. The ability to read a scope's current assignment, determine whether there are other connections currently and open a gRPC streaming connection to that Dealer instance with the information Dealer needs – userId, scope, and permission – from the Pilot token.

Once Pilot has determined the Dealer instance for a scope and written to Valkey or simply read from Valkey, it opens the gRPC connection and is then, for that connection, simply responsible for frame RPM rate limiting, keeping the Valkey last_updated_at updated for the Connection (crucial for enabling other Pilot instances to know whether the scope assignment is active), and killing the connection in any of a number of cases. All the rest of the logic (what data to send down to the client and how to handle specific schema of websocket frames from the client) is up to Dealer.

So, in summary, while clients for a given scope could be connected to any number of distinct Pilot instances, all those Pilot instances are proxying their websocket connections via gRPC connections to the same Dealer instance for a single scope, at any given point in time. However, if all those connections end, the next time a client tries to connect, the Pilot instance that recieves that request will reassign the scope based on the current fleet of Dealer instances, using consistent hashing of the scope name.

Handling Message Types

When clients send websocket messages, Pilot validates the shape of the message.

Of essential importance is setting the userId to match the userId of the Pilot Token, which should but may not be the id the client has passed. This is only applicable to certain messages types that clients can send up the websocket, like PRESENCE, READ_RECEIPT, TYPING_INDICATOR, and BINARY_BLOCK_UPDATE.

Multi-Region Management

Pilot is designed to be deployed across regions. This enables clients to connect to a nearby region, and then if they are the first client in that scope at that time, they get to choose where that scope lives for the length of that connection. This is great in the common instance where, for example, three clients are connected to the same scope in the same city, but that city is far from AWS US East 2.

This requires the following:

  1. Each Pilot regional deployment lives at two hostnames. One is region scoped and goes straight through the Cloudflare proxy to a regional AWS load balancer and the other is global and goes through the Cloudflare Load Balancer to the closest region that has a Pilot and Dealer deployment. This allows Pilot and Dealer to be deployed on their own, as long as they are in the same region as each other and as long as they have access to the single Valkey cluster that serves the world's Pilot fleet as well as to the NATS cluster.
  2. This enables the client to reach out to a Pilot instance using just ws.pivot.app or us-east-2.ws.pivot.app. The second option is for when the client attempted to connect to a scope and got back a 307 response, with a regional url in the body, meaning that Pilot tried to join the client to this scope, but it was already active on a Dealer instance in another region, so Pilot is using the redirect to instruct the Client to use the regional URL to reach the region the Dealer instance is in, which means that Pilot will be able to successfully connect them.

NATS

Publication

  1. Pilot uses core NATS to publish to ephemeral.pilot.change_feed.* for Connection and ScopeAssignment events. These include:
    • scope_assignment.createdV1
    • scope_assignment.reassignedV1
    • connection.createdV1
    • There is no event for a connection being terminated as Pilot does not have any reliable way to measure this - only the aging of the last_updated_at value.

Consumption

  1. facebox.change_feed.user.updatedV1 - consumed fan-out pub/sub style to ensure that all Pilot instances close websocket connections if a user is banned.

Databases

Pilot uses Valkey (hosted on AWS ElastiCache Serverless) for its data storage, including managing connection state and scope assignments. It relies on Valkey to ensure data consistency and manage distributed state for scope assignments across instances and (theoretically) regions.

  1. Connection: A connection represents a specific websocket connection from a client application, stored in Valkey. The Pilot instance that receives this connection is responsible for the lifecycle of this data.

  2. ScopeAssignment: A scope_assignment represents the relationship between a scope and a Dealer instance, managed within Valkey to ensure global consistency.

Temporal Workflows

N/A

Deployment

Pilot is deployed as a Docker container to ECS and exposed via the Application Load Balancer. It uses Cloud Map DNS lookups to find all of the active Dealer instances in its region, and uses a REGION environment variable to determine its own region.

Security

  • HTTP rate limiting at the firewall level, though this does not rate limit incoming websocket frames.
  • When new websocket connections are requested via HTTP, Pilot checks Valkey for the count of currently active connections, and fails the request if there are too many. This logic uses last_updated_at on the connection data in Valkey, so if some of those connections are actually dead, the client will be able to retry and succeed 10 or so seconds later.
  • Websocket frames-per-minute limit applied to each websocket connection via the Pilot application code. This just happens in-memory for each websocket.