Asimov: AI Content Generation

Overview

Asimov is responsible for generative AI and other ML capabilities, including text generation (such as summarization and chapters), image generation, and speech transcription. Asimov generally uses third-party services rather than self-hosted models, but that is an implementation detail.

Asimov does not store generated data for synchronous requests (gRPC), only the logs associated with the generation job. The expectation is that if Asimov has a calling service, the generated data is simply returned to that service.

For asynchronous / implicit requests (that is, when Asimov has generated content based on data in another service that Asimov became aware of by consuming NATS messages), Asimov stores the (meta)data it generated, keyed by the id of that other entity. The key use case here is when Blobby’s NATS change_feed alerts Asimov to a new audio or video file. Blobby and Asimov have no direct interaction; Asimov simply chooses, in some cases, to create metadata for such files.
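The two paths above can be sketched as follows. This is a minimal illustration under stated assumptions, not Asimov's actual code: the FileCreated event shape, the AssetStore class, the handler names, and the media-type filter are all hypothetical, and an in-memory dict stands in for real storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FileCreated:
    """Hypothetical shape of a Blobby change_feed event (assumption)."""
    file_id: str
    media_type: str  # e.g. "audio/mpeg" or "video/mp4"


class AssetStore:
    """In-memory stand-in for storage keyed by the external entity's id."""

    def __init__(self) -> None:
        self._by_external_id: Dict[str, str] = {}

    def put(self, external_id: str, metadata: str) -> None:
        self._by_external_id[external_id] = metadata

    def get(self, external_id: str) -> Optional[str]:
        return self._by_external_id.get(external_id)


def handle_sync_request(prompt: str, generate: Callable[[str], str]) -> str:
    # Synchronous path: the result goes straight back to the caller.
    # Nothing is written to the asset store.
    return generate(prompt)


def handle_file_created(
    event: FileCreated,
    store: AssetStore,
    transcribe: Callable[[str], str],
) -> bool:
    # Asynchronous path: Asimov decides on its own whether to act.
    # Filtering on media type is an assumed policy for illustration.
    if not event.media_type.startswith(("audio/", "video/")):
        return False
    # The generated metadata is stored under the id of the other entity.
    store.put(event.file_id, transcribe(event.file_id))
    return True
```

The publisher never learns whether metadata was produced; a consumer that wants it looks it up later by the file's id.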

AI-generated data is centralized in Asimov for encapsulation and adaptability, given how quickly the AI field evolves. This architecture confines AI data handling and processing complexity to Asimov, so AI capabilities can be modified and enhanced quickly and in isolation, independent of other services such as Blobby. It also keeps AI-related dependencies and complexity out of Blobby and similar services, so they maintain their operational focus and stability as AI capabilities change.

Asimov is depended on by Blobby (implicitly) for creating metadata for audio and video files, as well as by Friend (and potentially Blockhead) for block-related AI capabilities. Whether Friend or Blockhead is the dependent service is determined by whether the result of the AI operation is returned to the end user to potentially persist, or provided directly to Blockhead for server-side persistence to a block.

Asimov is responsible for authorization from a rate-limiting and SubscribableFeature standpoint at the userId and organizationId levels. This includes determining whether a user/organization has access to an ‘AI feature’ and rate limiting that access. This creates a gRPC dependency on Wallstreet to determine access and quotas, but the usage data itself lives in Asimov.
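A rough sketch of that split of responsibilities, assuming a simple counter-based quota. Every name here (Entitlement, EntitlementSource, UsageLedger, authorize) is hypothetical; the real Wallstreet gRPC contract and the real rate-limiting algorithm are not described in this document.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


@dataclass(frozen=True)
class Entitlement:
    """Assumed answer from the entitlement service (Wallstreet's role)."""
    has_feature: bool
    user_quota: int          # max operations per user in the current window
    organization_quota: int  # max operations per organization in the window


class EntitlementSource(Protocol):
    """Stand-in for the gRPC client; the method name is an assumption."""

    def lookup(self, user_id: str, organization_id: str, feature: str) -> Entitlement:
        ...


class UsageLedger:
    """Usage counts live on the Asimov side, not in the entitlement service."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, str, str], int] = {}

    def used(self, level: str, subject_id: str, feature: str) -> int:
        return self._counts.get((level, subject_id, feature), 0)

    def record(self, level: str, subject_id: str, feature: str) -> None:
        key = (level, subject_id, feature)
        self._counts[key] = self._counts.get(key, 0) + 1


def authorize(
    source: EntitlementSource,
    ledger: UsageLedger,
    user_id: str,
    organization_id: str,
    feature: str,
) -> bool:
    entitlement = source.lookup(user_id, organization_id, feature)
    # Feature access is decided by the entitlement lookup.
    if not entitlement.has_feature:
        return False
    # Quotas are enforced against locally tracked usage at both levels.
    if ledger.used("user", user_id, feature) >= entitlement.user_quota:
        return False
    if ledger.used("organization", organization_id, feature) >= entitlement.organization_quota:
        return False
    # Only an allowed request consumes quota.
    ledger.record("user", user_id, feature)
    ledger.record("organization", organization_id, feature)
    return True
```

Keeping the ledger separate from the entitlement lookup mirrors the statement above: access and quota values come from Wallstreet, while usage is counted where the operations happen.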

Third-Party Services

OpenAI

Room message thread summarization
Text generation for blocks
DALL·E image generation for blocks
Search result summarization
Embeddings for Turbopuffer (Quest handles embeddings, potentially using OpenAI or another service; Asimov is not directly involved in Quest’s embedding process)

AssemblyAI

Audio (and video audio) transcription
Transcript summarization
Auto-chapters

API

Asimov allows other backend services to get assets for a given ExternalId, but this should not be used by API services, as they don’t know about authorization. For example, Friend should not have a query to retrieve an audio transcript from Asimov for a given file, because Friend does not know if the user has access to that file. Instead, Messenger should retrieve the audio transcript from Asimov at the time it returns the room recording to the client.
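The rule can be illustrated with a small sketch: the service that owns the access decision checks it first, and only then asks Asimov for the asset. The function and parameter names are hypothetical, and the two callables stand in for the real access check and the real Asimov lookup.

```python
from typing import Callable, Dict, Optional


class PermissionDenied(Exception):
    """Raised when the caller may not see the recording."""


def get_recording_with_transcript(
    user_id: str,
    recording_file_id: str,
    can_access: Callable[[str, str], bool],
    fetch_transcript: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    # The owning service (Messenger in the example above) checks access first,
    # because it is the one that knows whether the user may see the file.
    if not can_access(user_id, recording_file_id):
        raise PermissionDenied(recording_file_id)
    # Only then is the asset looked up by the file's external id.
    # The transcript may be None if nothing was generated for this file.
    return {
        "fileId": recording_file_id,
        "transcript": fetch_transcript(recording_file_id),
    }
```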

NATS

Publication

Asimov publishes change feed events each time an entity it owns is created or modified.

Consumption

Asimov consumes Blobby’s audio and video file creation events using Asimov’s own queue-style JetStream stream.

Databases

Asimov uses Amazon Keyspaces (managed Cassandra) for logging operations and tracking durable rate limits, as well as for storing the generated metadata itself.

Asset - Asset is Asimov’s generic name for a thing that it has generated immutable metadata for without being synchronously asked to do so, such as a transcript for a Blobby audio File.
Operation - A log of a user (or Asimov itself, based on NATS consumption) starting (and possibly successfully completing) an operation, such as generating text or an image.
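As a rough illustration of these two entities, the sketch below models them as frozen dataclasses. The field names and the mark_succeeded helper are assumptions made for the example; the actual Keyspaces schema is not given in this document.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Asset:
    """Immutable generated metadata, keyed by another service's entity id."""
    external_id: str  # e.g. the id of a Blobby audio File
    kind: str         # e.g. "transcript" (hypothetical label)
    body: str


@dataclass(frozen=True)
class Operation:
    """Log entry for an operation that was started and possibly succeeded."""
    operation_id: str
    kind: str       # e.g. "text_generation" (hypothetical label)
    initiator: str  # a user id, or "asimov" for NATS-triggered work
    started_at: datetime
    succeeded_at: Optional[datetime] = None  # None until success is recorded


def mark_succeeded(operation: Operation, at: datetime) -> Operation:
    # Records are never mutated in place; success yields a new value.
    return replace(operation, succeeded_at=at)


started = Operation(
    operation_id="op-1",
    kind="transcription",
    initiator="asimov",
    started_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
finished = mark_succeeded(started, datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc))
```

Freezing Asset matches the description of its metadata as immutable; freezing Operation is only a convenience of the sketch.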