Quest: Turbopuffer Query and Indexing Service

Named after the idea of 'finding' something or 'searching' for something on a quest, this service wraps Turbopuffer, our vector search database. Turbopuffer is a managed service known for being serverless and scalable, aligning with our infrastructure principles. We isolate Turbopuffer-specific logic within its own library to allow for potential future replacement with an alternative search service via dependency injection if needed.

When clients need to run search operations, they use a query provided by Friend. Friend simply forwards it to Quest, which is responsible for interacting with Turbopuffer.

The complexity here (the reason this is its own service) is that the data in Turbopuffer needs heavy filtering at request time. Yes, users can run searches on blocks, for example, but only those blocks they have access to.

Quest is responsible for indexing data into Turbopuffer. Instead of multiple services making HTTP requests to the Turbopuffer API, they simply publish to NATS on create/update/delete operations. These messages are then processed by Quest and result in API requests from Quest to Turbopuffer.

This allows us to abstract away the underlying Turbopuffer implementation and also add more specific queuing as well as better retry and error handling. Quest retries failed indexing API calls by leveraging the NATS durable consumer pattern, leading to an easy way to measure and handle failed document index operations (the consumer will back up with un-ack'd messages).

Quest also allows us to centralize schema validation and is responsible for actually creating namespaces in Turbopuffer.

API

Quest provides a single gRPC method, ExecuteQuery, which takes one or more index names, filters, sorts, and a query value. Optionally, this method takes a userId. If one is provided, the data returned is also scoped to that user's authorization, according to the semantics of each queried index's properties. If no userId is provided, all matching results are returned, so it is crucial that Friend always provides a userId.

Behind the scenes, Quest has logic for each index as to which service provides the specific data needed for authorization filters to be applied. For example, when an ExecuteQuery request comes in that includes the messenger_messages index, Quest makes a gRPC request to Messenger to get the list of roomIds the user has a current membership for, as well as a request to Blockhead for the room blocks (with the associated roomId) the user has at least read access to (Messenger does not have authorization information for space rooms). Quest combines the roomId values returned from both systems to form a full filter value. The query Quest executes against Turbopuffer is filtered in whatever ways the ExecuteQuery request defines and is additionally filtered by roomId.
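As a rough sketch of the combination step for messenger_messages, the function below unions the roomIds from both services and adds them to whatever filter the caller supplied. The function name, the nested-list filter shape (modeled loosely on Turbopuffer-style filter expressions), and the authorId attribute are assumptions made for illustration; only the roomId attribute comes from the description above.

```python
from typing import Iterable, List, Optional

# A filter is a nested list such as ["roomId", "In", ["r1", "r2"]].
# This shape is an assumption for the sketch, not a confirmed wire format.
Filter = List[object]


def build_room_filter(
    messenger_room_ids: Iterable[str],
    blockhead_room_ids: Iterable[str],
    request_filter: Optional[Filter] = None,
) -> Filter:
    """Combine roomIds from both services with the caller's own filter."""
    # Union the two sources and sort so the resulting filter is deterministic.
    allowed = sorted(set(messenger_room_ids) | set(blockhead_room_ids))
    auth_filter: Filter = ["roomId", "In", allowed]
    if request_filter is None:
        return auth_filter
    # The authorization filter is always ANDed in; it never replaces or
    # widens what the request asked for.
    return ["And", [request_filter, auth_filter]]


if __name__ == "__main__":
    # "authorId" is a hypothetical attribute used only for this example.
    print(build_room_filter(["r2", "r1"], ["r3", "r1"], ["authorId", "Eq", "u9"]))
```

When the user has access to no rooms, the sketch produces an empty In list. A real implementation might short-circuit at that point and return no results without querying at all.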

Pivot's data authorization rules are fairly complex. It is theoretically possible that Quest will have to split a Turbopuffer API request into multiple queries to handle an array of possible values that is too large. Consider a user who (for whatever reason) has access to 10,000 rooms. Turbopuffer may not be able to handle a filter value of that size. If Quest has to split the request, the response will be slower, but the calling service doesn't need to account for this.
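One way the split could work is to run one query per chunk of roomIds and then merge the per-chunk results into a single top-k list. The sketch below assumes each hit is a dict with an id and a dist where lower is better, that run_query is a hypothetical callback wrapping the actual Turbopuffer call, and that max_filter_values=1000 is an arbitrary placeholder rather than a documented limit.

```python
from typing import Callable, Dict, Iterator, List, Sequence

Hit = Dict[str, object]  # assumed shape: {"id": "m1", "dist": 0.12}


def chunked(values: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` values."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(values), size):
        yield values[start:start + size]


def query_in_chunks(
    room_ids: Sequence[str],
    run_query: Callable[[Sequence[str], int], List[Hit]],
    top_k: int,
    max_filter_values: int = 1000,  # placeholder limit, not a real constant
) -> List[Hit]:
    """Run one query per chunk of room_ids and merge into a global top_k."""
    best: Dict[object, Hit] = {}
    for chunk in chunked(room_ids, max_filter_values):
        # Each chunk is asked for top_k hits, so the global top_k is
        # guaranteed to be contained in the union of the chunk results.
        for hit in run_query(chunk, top_k):
            previous = best.get(hit["id"])
            if previous is None or hit["dist"] < previous["dist"]:
                best[hit["id"]] = hit
    return sorted(best.values(), key=lambda h: h["dist"])[:top_k]
```

The caller sees a single merged result list, which matches the point above that the calling service does not need to know whether a split happened.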

NATS

Publication

N/A

Consumption

Quest consumes all servicename.change_feed.entity.* subjects that should affect the search indexes.
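For readers less familiar with NATS subject wildcards: `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. The small matcher below illustrates those rules in plain Python; it is not part of any NATS client library, and the concrete subjects in the example (such as messenger.change_feed.message.created) are hypothetical expansions of the servicename.change_feed.entity.* pattern.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches a NATS-style `pattern`."""
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for i, token in enumerate(pattern_tokens):
        if token == ">":
            # ">" is only valid as the last token and needs at least one
            # remaining subject token to match.
            return i == len(pattern_tokens) - 1 and len(subject_tokens) > i
        if i >= len(subject_tokens):
            return False
        if token != "*" and token != subject_tokens[i]:
            return False
    # Without ">", the token counts must be equal.
    return len(pattern_tokens) == len(subject_tokens)


if __name__ == "__main__":
    # Hypothetical subject names, used only for illustration.
    pattern = "messenger.change_feed.message.*"
    for subject in (
        "messenger.change_feed.message.created",
        "messenger.change_feed.message",
        "blockhead.change_feed.message.created",
    ):
        print(subject, subject_matches(pattern, subject))
```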

Messages in these subjects must provide the information that should be indexed (e.g., a room message from Messenger should have the room name) and also enough information that Quest will be able to scope searches (e.g., a space message from Blockhead must have an organizationId).
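The requirement above could be enforced with a small per-entity check before a message is indexed. In this sketch the entity keys and the roomId and roomName field names are hypothetical; only organizationId is taken from the text, and the real list of required fields would live wherever Quest centralizes schema validation.

```python
from typing import Dict, List, Mapping

# Hypothetical mapping of entity type to the fields Quest needs for
# indexing and for scoping searches.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "messenger.room_message": ["roomId", "roomName"],
    "blockhead.space_message": ["organizationId"],
}


def missing_fields(entity_type: str, payload: Mapping[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in `payload`."""
    if entity_type not in REQUIRED_FIELDS:
        raise KeyError(f"no schema registered for {entity_type!r}")
    return [
        field
        for field in REQUIRED_FIELDS[entity_type]
        if payload.get(field) in (None, "")
    ]


if __name__ == "__main__":
    print(missing_fields("blockhead.space_message", {"text": "hello"}))
```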

Quest consumes these messages via a NATS durable consumer on the pivot_main stream. This means that, unless fundamental storage limits (GB stored) are reached, NATS retains messages that Quest errored on indefinitely, and this is observable via the consumer backlog.

Databases

Quest writes data to Turbopuffer via its API/SDK.

Quest caches user permissions in Valkey (hosted on AWS ElastiCache Serverless). When it reaches out to Blockhead (for example) for a specific userId, it stores the returned userId and permissions, along with the calculated permissions, with a TTL of 5 minutes. This allows requests to Quest for the same userId and the same entities to resolve without reaching out to another service for authorization information. It also means that Quest can be up to 5 minutes out of date on user permissions.
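The caching behavior can be sketched with an in-memory stand-in for Valkey. PermissionCache, allowed_ids, and the (user_id, scope) key layout are hypothetical; in Valkey the same effect would typically come from writing the value with an expiry, for example SET with the EX option.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Tuple


class PermissionCache:
    """In-memory TTL cache standing in for Valkey (illustrative only)."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # the 5-minute TTL described above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, FrozenSet[str]]] = {}

    def get(self, user_id: str, scope: str) -> Optional[FrozenSet[str]]:
        entry = self._entries.get((user_id, scope))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller refetches.
            del self._entries[(user_id, scope)]
            return None
        return value

    def set(self, user_id: str, scope: str, ids: Iterable[str]) -> None:
        self._entries[(user_id, scope)] = (self._clock() + self._ttl, frozenset(ids))


def allowed_ids(
    cache: PermissionCache,
    user_id: str,
    scope: str,
    fetch: Callable[[str], Iterable[str]],
) -> FrozenSet[str]:
    """Return cached permissions, calling `fetch` only on a miss or expiry."""
    cached = cache.get(user_id, scope)
    if cached is not None:
        return cached
    fresh = frozenset(fetch(user_id))
    cache.set(user_id, scope, fresh)
    return fresh
```

The trade-off is the one stated above: a revoked permission can remain effective until the entry expires, so the TTL bounds how stale search authorization can be.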

Temporal Workflows

N/A

Deployment

Quest is a stateless service that is deployed via ECS Fargate. It requires access to Turbopuffer and Valkey, plus an access token for whatever embedding model is used.

Observability

Security

Quest is only accessible inside the VPC.

If a user_principal_id is provided to a Quest RPC, then the data returned is scoped to the access permissions Quest gets from other services for that user.