Live Audio and Video

Implementing Live Audio and Video Features

Live audio and video calls and recordings are implemented through a mix of LiveKit Cloud's backend, LiveKit's frontend SDKs (React and React Native) for WebRTC, Mux's backend, RTMP streaming, and Pivot backend services.

Calls

Audio Rooms/Video Rooms/Calls in Direct Rooms

  1. Client uses the Friend GenerateRoomAvToken query to get a token for the roomId and the Websocket URL to pass to the LiveKit SDK.
    1. Friend uses Messenger to fulfill this.
    2. Messenger checks the user's permissions and uses Stagehand to fulfill this.
    3. Stagehand generates the access token.
  2. Subsequent operations are via the LiveKit SDK on the frontend, except for starting/stopping recording.
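
The token handshake above might look roughly like this on the client. This is a minimal sketch, not the actual client code: the FriendClient and AvConnector interfaces, the field names token and websocketUrl, and the joinCall helper are all hypothetical. AvConnector is shaped so that a livekit-client Room, whose connect method takes a URL and a token, could be passed in directly.

```typescript
// Hypothetical response shape for the Friend GenerateRoomAvToken query.
interface RoomAvToken {
  token: string;        // LiveKit access token generated by Stagehand
  websocketUrl: string; // WebSocket URL to hand to the LiveKit SDK
}

// Hypothetical wrapper around the Friend API.
interface FriendClient {
  generateRoomAvToken(roomId: string): Promise<RoomAvToken>;
}

// Anything with connect(url, token); a livekit-client Room fits this shape.
interface AvConnector {
  connect(url: string, token: string): Promise<unknown>;
}

// Step 1: ask Friend for a token. Step 2: hand everything to the SDK.
async function joinCall(
  friend: FriendClient,
  connector: AvConnector,
  roomId: string,
): Promise<void> {
  const { token, websocketUrl } = await friend.generateRoomAvToken(roomId);
  await connector.connect(websocketUrl, token);
}
```

Injecting both dependencies keeps the sketch independent of any particular GraphQL client and makes the flow easy to exercise with fakes.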

Streaming Rooms

Streamer

  1. Client uses the Friend GenerateRoomAvToken query to get a token for the roomId and the Websocket URL to pass to the LiveKit SDK.
    1. Friend uses Messenger to fulfill this.
    2. Messenger checks the user's permissions and uses Stagehand to fulfill this.
    3. Stagehand generates the access token.
  2. Subsequent operations are via the LiveKit SDK on the frontend, except 'going live'. The client that wants to go live does not have access to the stream key that Messenger stores on the Room, so Messenger (via Friend) exposes StartLiveStream and StopLiveStream, which wrap Stagehand.
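
The split between SDK operations and Friend operations for a streamer can be sketched as a small routing function. The action names and the Route type are hypothetical and only illustrate the rule; StartLiveStream and StopLiveStream are the operations named above.

```typescript
// Hypothetical set of things a streamer's client might do.
type StreamerAction = "publishTrack" | "muteTrack" | "leaveRoom" | "goLive" | "endLive";

type Route =
  | { via: "livekit-sdk" }
  | { via: "friend"; operation: "StartLiveStream" | "StopLiveStream" };

// Only going live (and ending it) leaves the LiveKit SDK, because the
// client never holds the stream key.
function routeStreamerAction(action: StreamerAction): Route {
  switch (action) {
    case "goLive":
      return { via: "friend", operation: "StartLiveStream" };
    case "endLive":
      return { via: "friend", operation: "StopLiveStream" };
    default:
      return { via: "livekit-sdk" };
  }
}
```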

Viewer

The RoomRecording entity for a streaming room is only created once Mux reports the asset as completed, so it cannot be used to watch the live stream. Instead, each streaming room has a durable Mux Live Stream playbackId, which Blobby creates when the room is created and which Messenger stores on the Room. Retrieving the Room via Friend therefore gives the client the stream playback ID together with a Mux JWT that Blobby generates for that client, delivered in the form of an HLS URL.
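
For illustration, a signed Mux HLS URL combines a playback ID and a JWT as shown below. This is a sketch of Mux's signed-playback URL convention only; the helper name is hypothetical, and whether Blobby assembles the URL in exactly this way is an assumption.

```typescript
// Builds a signed HLS playback URL from a Mux playback ID and a JWT.
// Hypothetical helper; in the flow above the client receives the URL ready-made.
function buildMuxHlsUrl(playbackId: string, jwt: string): string {
  return `https://stream.mux.com/${playbackId}.m3u8?token=${encodeURIComponent(jwt)}`;
}
```

The resulting URL can be handed to any HLS-capable player on the viewer's side.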

Block Audio Rooms

  1. Client uses the Friend GenerateBlockAvToken query to get a token for the blockId.
    1. Friend uses Blockhead to fulfill this.
    2. Blockhead checks the user's permissions and uses Stagehand to fulfill this. Because AvRooms are not created by Blockhead/Stagehand at the time each block is created, Blockhead also asks Stagehand to create or update the AvRoom if Stagehand fails to create the token.
    3. Stagehand generates the access token.
  2. Subsequent operations are via the LiveKit SDK on the frontend. Recording is disabled (Blockhead provides no such gRPC method) because ephemeral block audio experiences are not recordable: their recordings aren't processed anywhere and would be lost.
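
The lazy AvRoom creation in step 1 can be sketched as a try-then-repair flow. Everything named here is hypothetical (StagehandClient, upsertAvRoom, the canJoin callback), the block ID is assumed to double as the AvRoom key, and retrying the token request after the create-or-update call is an assumption rather than something stated above.

```typescript
// Hypothetical view of the Stagehand calls that Blockhead would make.
interface StagehandClient {
  createToken(avRoomId: string, userId: string): Promise<string>;
  upsertAvRoom(avRoomId: string): Promise<void>;
}

async function generateBlockAvToken(
  stagehand: StagehandClient,
  canJoin: (userId: string, blockId: string) => Promise<boolean>,
  blockId: string,
  userId: string,
): Promise<string> {
  // Blockhead checks permissions before involving Stagehand.
  if (!(await canJoin(userId, blockId))) {
    throw new Error("permission denied");
  }
  try {
    return await stagehand.createToken(blockId, userId);
  } catch {
    // The AvRoom may not exist yet, since it is not created with the block.
    // Assumption: create or update it, then retry the token request once.
    await stagehand.upsertAvRoom(blockId);
    return stagehand.createToken(blockId, userId);
  }
}
```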

Room Recordings

Room recordings are handled entirely server-side. The Stagehand API is used to initialize a recording. When LiveKit has finished processing the recording, it sends it to Pivot's raw_recordings S3 bucket, and a webhook from S3 keeps things moving from there, through Tunnel, to Stagehand, to Messenger, to Blobby, to Mux.
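
The hand-off order can be written down as data, which is handy when tracing where a recording got stuck. The constant and helper below are hypothetical and encode nothing beyond the order described above.

```typescript
// Stages a finished recording passes through, in order.
const RECORDING_PIPELINE = [
  "LiveKit",
  "S3 raw_recordings",
  "Tunnel",
  "Stagehand",
  "Messenger",
  "Blobby",
  "Mux",
] as const;

type RecordingStage = (typeof RECORDING_PIPELINE)[number];

// Returns the stage that receives the recording next, or null at the end.
function nextStage(stage: RecordingStage): RecordingStage | null {
  const i = RECORDING_PIPELINE.indexOf(stage);
  return i < RECORDING_PIPELINE.length - 1 ? RECORDING_PIPELINE[i + 1] : null;
}
```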

Audio and Video Attachments

Video Message

Video messages are recorded in one of two ways:

  1. If the client supports RTMP streaming upload (React Native), it streams the video to Mux as it is recorded. (It is a good idea to also save locally while recording and fall back to a file upload to Mux if the stream fails.)
  2. If the client does not support RTMP streaming upload (browsers and therefore also our desktop app), the client simply uploads the file to Mux with a directUploadUrl once the recording is completed.
    • In the future, we should ship a native RTMP client with our Tauri app so that, even though the Tauri web view doesn't support RTMP, we can stream from the local system. Alternatively, to support both Tauri and actual browsers running our Next app, we could ship a web service that proxies a stream format the browser does support into an RTMP stream to Mux, or use LiveKit to achieve the same.
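
The choice between the two recording paths, including the suggested fallback, can be sketched as follows. The platform names and function names are hypothetical; the only rule encoded is that React Native streams over RTMP while browsers and the desktop app upload the finished file.

```typescript
type ClientPlatform = "react-native" | "browser" | "tauri-desktop";
type VideoUploadStrategy = "rtmp-stream" | "direct-upload";

// Per the text above, only the React Native client supports RTMP today.
function supportsRtmp(platform: ClientPlatform): boolean {
  return platform === "react-native";
}

// streamFailed models the fallback: if the RTMP stream broke, upload the
// locally saved recording through directUploadUrl instead.
function chooseVideoUploadStrategy(
  platform: ClientPlatform,
  streamFailed: boolean = false,
): VideoUploadStrategy {
  return supportsRtmp(platform) && !streamFailed ? "rtmp-stream" : "direct-upload";
}
```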

Message

  1. If RTMP is supported, use the Friend CreateFile mutation to create a File that corresponds to a Mux Live Stream.
  2. If RTMP is not supported, record a video locally, and upload it using CreateFile.
  3. Either way, attach the successfully uploaded-to or streamed-to File that represents a Mux asset to the message with CreateMessageAttachment.

Block

  1. If RTMP is supported, use the Friend CreateFile mutation to create a File that corresponds to a Mux Live Stream.
  2. If RTMP is not supported, record a video locally, and upload it using CreateFile.
  3. Either way, attach the successfully uploaded-to or streamed-to File that represents a Mux asset to the block with CreateBlockAttachment.

Block Response

  1. If RTMP is supported, use the Friend CreateFile mutation to create a File that corresponds to a Mux Live Stream.
  2. If RTMP is not supported, record a video locally, and upload it using CreateFile.
  3. Either way, attach the successfully uploaded-to or streamed-to File that represents a Mux asset to the BlockResponse with CreateBlockResponseAttachment.
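
The three video flows above differ only in the final attachment mutation, which a lookup table makes explicit. The type, constant, and helper names are hypothetical; the mutation names are the ones listed above.

```typescript
type AttachmentTarget = "Message" | "Block" | "BlockResponse";

// Final mutation to call once the File is uploaded to or streamed to.
const ATTACH_MUTATION: Record<AttachmentTarget, string> = {
  Message: "CreateMessageAttachment",
  Block: "CreateBlockAttachment",
  BlockResponse: "CreateBlockResponseAttachment",
};

// Returns the ordered steps for attaching a video to the given target.
function videoAttachmentPlan(target: AttachmentTarget, rtmpSupported: boolean): string[] {
  const fileStep = rtmpSupported
    ? "CreateFile for a Mux Live Stream, then stream over RTMP"
    : "record locally, then upload using CreateFile";
  return [fileStep, ATTACH_MUTATION[target]];
}
```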

Audio Message

Audio messages are not streamed for upload; they are recorded locally and uploaded directly to Mux. Once the client has recorded the (small) audio file locally, use the appropriate mutation RPC to generate a File that includes a Mux directUploadUrl, then send the file to that URL with an HTTP PUT, which is the method Mux direct uploads expect. Upon successful upload, use the appropriate mutation to formalize the file attachment.

Message

  1. Record an audio file locally, and upload it using CreateFile.
  2. Attach the successfully uploaded-to File that represents a Mux asset to the Message with CreateMessageAttachment.

Block

  1. Record an audio file locally, and upload it using CreateFile.
  2. Attach the successfully uploaded-to File that represents a Mux asset to the Block with CreateBlockAttachment.

Block Response

  1. Record an audio file locally, and upload it using CreateFile.
  2. Attach the successfully uploaded-to File that represents a Mux asset to the BlockResponse with CreateBlockResponseAttachment.
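
The audio flow is the same for all three targets: create the File, upload the bytes, then attach. A minimal sketch follows, with every interface and field name other than directUploadUrl being hypothetical. The HTTP upload itself is left to the injected upload function, and attach stands in for whichever Create...Attachment mutation applies.

```typescript
// Hypothetical subset of what CreateFile returns.
interface CreatedFile {
  id: string;
  directUploadUrl: string;
}

interface AudioAttachmentDeps {
  createFile(): Promise<CreatedFile>;
  upload(url: string, data: Uint8Array): Promise<void>;
  attach(fileId: string): Promise<void>;
}

// Attaches only after the upload resolves; a failed upload rejects
// before attach is ever called.
async function sendAudioAttachment(
  deps: AudioAttachmentDeps,
  recording: Uint8Array,
): Promise<string> {
  const file = await deps.createFile();
  await deps.upload(file.directUploadUrl, recording);
  await deps.attach(file.id);
  return file.id;
}
```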

Presence

Dealer provides a specific AV_PRESENCE event to connected clients (via the Pilot WebSocket) so the frontend can render how many users, and which users, are currently in a call for a given room or block. Messenger also exposes a field on the Room entity, populated from Stagehand, for initial client hydration with presence information; the client can then overwrite it with Dealer/Pilot events.
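
On the client, the hydrate-then-overwrite pattern could be kept in a small reducer. This is a sketch under assumptions: the event payload shape (targetId, userIds), the avUserIds field name on the Room, and the idea that each event carries a full snapshot of participants rather than a delta are all hypothetical.

```typescript
// Hypothetical payload of an AV_PRESENCE event from Dealer/Pilot.
interface AvPresenceEvent {
  type: "AV_PRESENCE";
  targetId: string;  // roomId or blockId
  userIds: string[]; // assumed: everyone currently in the call
}

// Presence keyed by room or block ID.
type PresenceByTarget = Record<string, string[]>;

// Initial hydration from the presence field on Room entities.
function hydratePresence(rooms: { id: string; avUserIds: string[] }[]): PresenceByTarget {
  const state: PresenceByTarget = {};
  for (const room of rooms) {
    state[room.id] = [...room.avUserIds];
  }
  return state;
}

// Later events overwrite whatever was hydrated for that target.
function applyAvPresence(state: PresenceByTarget, event: AvPresenceEvent): PresenceByTarget {
  return { ...state, [event.targetId]: [...event.userIds] };
}
```

Returning a new object from applyAvPresence keeps the reducer usable with React state, where updates are detected by reference.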