Stagehand: LiveKit API Client

Overview

Several services (Blockhead, Messenger, and, in a sense, Blobby) need to trigger actions in, and query information from, the LiveKit audio/video API. We want a single service to interface with LiveKit, and Stagehand is that service. The name 'Stagehand' reflects that LiveKit is used for live audio/video and Stagehand assists with it.

The service boundaries are as follows:

  • Stagehand: Knows about LiveKit 'rooms', but not Pivot rooms or Blobby files.
  • Blobby: Knows about Files, but not LiveKit raw recordings. Requires some other service to prompt Blobby to ingest a recording.
  • Messenger: Knows Pivot rooms and messages, but not LiveKit rooms or blocks.
  • Blockhead: Some blocks allow integrated audio calls, but Blockhead has no direct connection to LiveKit, so Blockhead uses Stagehand to create AvRooms that correspond to Blocks.

Stagehand creates LiveKit rooms and LiveKit access tokens on request, which allows any service (Messenger and Blockhead) to create a LiveKit room and provide access tokens. When a service requests a new LiveKit room via Stagehand, it must provide an enum value defined by Stagehand's gRPC request schema: pivot_room or block. Stagehand enforces the set of LiveKit room types we support, though it has no special understanding of them.
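The type check described above can be sketched as follows. This is a minimal illustration, not Stagehand's actual schema: the names AvRoomType and parse_av_room_type are hypothetical, and only the two values pivot_room and block come from this document.

```python
from enum import Enum


class AvRoomType(Enum):
    """The two room types named in this document (class name is hypothetical)."""
    PIVOT_ROOM = "pivot_room"
    BLOCK = "block"


def parse_av_room_type(raw: str) -> AvRoomType:
    """Reject any value outside the supported set.

    No further meaning is attached to the type here, mirroring the point
    that Stagehand enforces the types without understanding them.
    """
    try:
        return AvRoomType(raw)
    except ValueError:
        raise ValueError(f"unsupported AvRoom type: {raw!r}") from None
```

In a real gRPC service the generated enum would likely do this validation; the sketch only shows the intent.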

To join an existing audio or video room, a user requests a token from Friend. Friend resolves the request to Messenger (or Blockhead) for authorization, and that service then asks Stagehand to create the meeting token. Stagehand checks its own database to confirm that the AvRoom exists and fails the request if it does not.

Recording Processing

A recording of a Pivot room audio or video call moves through the following lifecycle:

  1. A frontend client triggers a recording using the Messenger gRPC method, or a recording is automatically triggered by a client joining a LiveKit room based on the LiveKit room's auto-recording configuration.
  2. Once the client ends the recording or all participants leave the LiveKit room, LiveKit uploads the recording to the raw_recordings S3 bucket. This prompts SNS to push to an SQS queue dedicated to these S3 notifications.
  3. Stagehand consumes from SQS and now knows that LiveKit has added a recording to S3. Stagehand uses the name of the recording object in S3 to determine which type of Pivot entity the LiveKit room corresponds to and what the ID of that record is.
  4. Stagehand publishes the information about the new recording as a NATS message, using the type prefix identified in step 3 as part of the subject. This is sufficient for Messenger (or any other interested service) to identify that a new recording has been created for a room it owns. Messenger can then create a file using Blobby's gRPC API and save the returned fileId as a new RoomRecording record. Stagehand's NATS message includes the recording timestamp, the roomId, and, crucially, the S3 object URL. This S3 URL is pre-signed for a week, which is enough time for it to be passed into Blobby, from Blobby to Mux, and even for some operation to fail and be retried days later. No service other than Stagehand needs read or write access to the raw_recordings S3 bucket as a whole.
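
Steps 3 and 4 can be sketched as below. This document does not specify the S3 key layout or the message schema, so the '<type>/<id>/<file>' layout, the RawRecordingCreated field names, and the ISO 8601 timestamp are all assumptions made for illustration. Only the subject pattern stagehand.change_feed.raw_recording.{type}.created is taken from the NATS section.

```python
import json
from dataclasses import asdict, dataclass

SUPPORTED_TYPES = {"pivot_room", "block"}


@dataclass(frozen=True)
class RawRecordingCreated:
    """Hypothetical payload; field names are illustrative only."""
    room_type: str    # "pivot_room" or "block"
    room_id: str      # ID of the Pivot record the LiveKit room corresponds to
    recorded_at: str  # recording timestamp (ISO 8601 by assumption)
    s3_url: str       # pre-signed URL for the raw recording object


def parse_recording_key(key: str) -> tuple[str, str]:
    """Split an assumed '<type>/<id>/<file>' object key into (type, id)."""
    parts = key.split("/")
    if len(parts) != 3 or parts[0] not in SUPPORTED_TYPES or not parts[1]:
        raise ValueError(f"unrecognized recording key: {key!r}")
    return parts[0], parts[1]


def build_message(key: str, recorded_at: str, s3_url: str) -> tuple[str, bytes]:
    """Return the NATS subject and JSON-encoded payload for one recording."""
    room_type, room_id = parse_recording_key(key)
    subject = f"stagehand.change_feed.raw_recording.{room_type}.created"
    event = RawRecordingCreated(room_type, room_id, recorded_at, s3_url)
    return subject, json.dumps(asdict(event)).encode("utf-8")
```

Because the type is part of the subject, a consumer such as Messenger could subscribe only to the pivot_room subject and ignore recordings for entities it does not own.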

In summary, the recording flow is: LiveKit -> S3 -> SNS/SQS -> Stagehand -> Messenger -> Blobby. Stagehand can't reach out directly to Blobby because it does not know the context in which a file is being created. It is up to another service to consume Stagehand's NATS messages, use Blobby to create a File, and create an Attachment record of some kind (such as a RoomRecording or a BlockAttachment).

API

Stagehand exposes a gRPC API. It performs no authorization for any operation; calling services must first decide whether the given userId is allowed to perform each action.

UpsertAvRoom

Create a new LiveKit room or update an existing one. The calling service must provide an AvRoomId and type (pivot_room, block).

Stagehand does not validate that the id provided actually corresponds to the type provided, but will fail if the combination of the two already exists in the Stagehand database.

CreateAvRoomAccessToken

Access Tokens are just JWTs that scope a user's access to a given LiveKit room and the associated permissions. The request fails if the AvRoom does not exist, because we do not give clients permission to create rooms. Stagehand checks its own database for the AvRoom rather than querying LiveKit.

GetAvRoomPresence

Given an AvRoomId, this RPC returns the presence JSON from the Stagehand database. Stagehand keeps that JSON up to date based on webhooks received from LiveKit (via Tunnel).

StartAvRoomRecording and StopAvRoomRecording

Start and stop a LiveKit egress to S3. These succeed only for AvRooms of type pivot_room, because block AvRooms are not intended to be recorded. Stop may fail if no recording was in progress.

The fact that the file gets written to S3 is an implementation detail controlled by Stagehand.
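
The two failure conditions above can be sketched with an in-memory stand-in. The class and method names are hypothetical, the actual LiveKit egress calls are omitted, and treating a repeated Start as a no-op is an assumption this document does not state.

```python
class RecordingError(Exception):
    """Raised when a recording request cannot be honored."""


class RecordingController:
    """In-memory stand-in for the Start/Stop recording RPCs."""

    def __init__(self):
        # Names of AvRooms with a recording currently in progress.
        self._active = set()

    def start(self, room_name: str, room_type: str) -> None:
        if room_type != "pivot_room":
            raise RecordingError("only pivot_room AvRooms can be recorded")
        # A real implementation would start a LiveKit egress to S3 here.
        # Assumption: starting an already-recording room is a no-op.
        self._active.add(room_name)

    def stop(self, room_name: str) -> None:
        if room_name not in self._active:
            raise RecordingError(f"no recording in progress for {room_name!r}")
        # A real implementation would stop the LiveKit egress here.
        self._active.remove(room_name)

    def is_recording(self, room_name: str) -> bool:
        return room_name in self._active
```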

StartAvRoomLiveStream and StopAvRoomLiveStream

Start and stop a LiveKit egress via RTMP. If the AvRoom is a pivot_room, Stagehand assumes that it is a streaming room. If it is not a streaming room, Start will fail because Messenger won't have the stream key that Stagehand requires to start an RTMP stream. Stop may fail if no stream egress was in progress.

NATS

Publication

  1. Stagehand publishes to stagehand.change_feed.av_room.created and stagehand.change_feed.av_room.updated whenever the corresponding operations are completed.

  2. Stagehand publishes to stagehand.change_feed.raw_recording.{type}.created whenever it finishes processing a new S3 file successfully. The type is pivot_room or block.

  3. Stagehand updates its database with user presence information and pushes it to ephemeral.stagehand.presenceV1. Each NATS message represents the presence in a single AvRoom at the time it was published, accurate to within roughly 5 seconds.

Consumption

Stagehand consumes a subset of Tunnel's incoming-webhook-event subjects to support multiple features:

  1. S3 notifications from tunnel.incoming_webhook_events.object_stores.raw_recordingsV1 to kick off recording processing.
  2. LiveKit participant_joined and participant_left notifications from tunnel.incoming_webhook_events.livekitV1 to maintain local presence state.
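
Maintaining local presence from the two LiveKit events above can be sketched as a small reducer. The event dictionary shape (an event name plus a participant object with an identity field) is assumed here for illustration, and apply_presence_event is a hypothetical name.

```python
def apply_presence_event(presence: list, event: dict) -> list:
    """Return the presence list for one AvRoom after applying one webhook event.

    `presence` is a list of participant dicts, in the spirit of the
    presence JSON stored per AvRoom.
    """
    kind = event.get("event")
    if kind not in ("participant_joined", "participant_left"):
        # Other webhook events do not affect presence.
        return list(presence)
    participant = event["participant"]
    # Drop any stale entry for this identity so a re-join does not duplicate it.
    others = [p for p in presence if p["identity"] != participant["identity"]]
    if kind == "participant_joined":
        return others + [participant]
    return others
```

Returning a new list rather than mutating in place keeps each update easy to write back to the database as a single value.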

Databases

Stagehand uses Amazon Keyspaces (managed Cassandra) to store data about AvRooms.

  1. AvRoom – This table stores the AvRoom name as the primary key. The name is constructed from the related Pivot record type (pivot_room or block) and the ID of that record (presumably a UUID). We also store created_at, but little else, because LiveKit maintains metadata about each LiveKit room and the LiveKit room IDs are identical to the Stagehand IDs. We also store a presence JSON column, which holds an array of ParticipantInfo objects from LiveKit.

Temporal Workflows

N/A

Deployment

Observability

Security

  • Stagehand is the only service with a LiveKit API key.