# Self-hosted NATS (JetStream) runbook
This runbook covers how to connect to the self-hosted NATS cluster with the NATS CLI from your local machine, plus basic health checks and troubleshooting for node bootstrap and Axiom log delivery.
## Prerequisites
- Cloudflare WARP is installed and connected to the correct environment network. See Cloudflare Tunnels.
- AWS CLI configured with access to the target account.
- NATS CLI installed (a quick sanity check is sketched after this list):

```bash
brew install nats-io/nats-tools/nats
```
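Before continuing, it can save time to confirm each prerequisite from your shell. This is just a sketch; the exact `warp-cli` subcommand may differ between WARP client versions.

```bash
# Confirm the NATS CLI is installed and on PATH.
nats --version

# Confirm the AWS CLI can reach the target account.
aws sts get-caller-identity

# Confirm WARP is connected (subcommand may vary by WARP client version).
warp-cli status
```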
## Connect with admin credentials
- Export the cluster name and region (`SERVICE_NAME` can be any configured service; it is only used to read the shared NATS host list):

```bash
export AWS_REGION="us-east-2"
export CLUSTER_NAME="nats-prod"
export SERVICE_NAME="facebox"
export SSM_PREFIX="/nats-cluster/${CLUSTER_NAME}"
```
- Fetch the connection string and admin creds from SSM:

```bash
NATS_HOST=$(aws ssm get-parameter \
  --region "$AWS_REGION" \
  --name "$SSM_PREFIX/services/$SERVICE_NAME/nats_host" \
  --query 'Parameter.Value' \
  --output text)

ADMIN_JWT=$(aws ssm get-parameter \
  --region "$AWS_REGION" \
  --name "$SSM_PREFIX/admin/nats_admin_jwt" \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text)

ADMIN_NKEY=$(aws ssm get-parameter \
  --region "$AWS_REGION" \
  --name "$SSM_PREFIX/admin/nats_admin_nkey_seed" \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text)
```
- Create a local creds file:

```bash
cat > ./nats-admin.creds <<EOF
-----BEGIN NATS USER JWT-----
$ADMIN_JWT
------END NATS USER JWT------

************************* IMPORTANT *************************
NKEY Seed printed below can be used to sign and prove identity.
NKEYs are sensitive and should be treated as secrets.

-----BEGIN USER NKEY SEED-----
$ADMIN_NKEY
------END USER NKEY SEED------
*************************************************************
EOF
chmod 600 ./nats-admin.creds
```
- Use the NATS CLI (see the context sketch at the end of this section if you want to avoid repeating the flags):

```bash
nats --server "$NATS_HOST" --creds ./nats-admin.creds stream info pivot_main
nats --server "$NATS_HOST" --creds ./nats-admin.creds stream ls
```
- Clean up the creds file when you are done:

```bash
rm -f ./nats-admin.creds
```
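If you connect often, the NATS CLI's named contexts can store the server URL and creds path so you do not have to repeat `--server`/`--creds` on every command. This is an optional convenience sketch; the context name `nats-prod-admin` is arbitrary, and the context still points at the local creds file, so remove both when you are done.

```bash
# Save a named context holding the server URL and creds path
# ("nats-prod-admin" is an arbitrary, illustrative name).
nats context save nats-prod-admin --server "$NATS_HOST" --creds ./nats-admin.creds

# Make it the active context; subsequent commands need no flags.
nats context select nats-prod-admin
nats stream ls

# Remove the context (and the creds file) when finished.
nats context rm nats-prod-admin
```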
## Connect with service credentials
If you need to test service-scoped permissions, replace the admin SSM paths with service-specific ones:
SERVICE_NAME="messenger"
SERVICE_JWT=$(aws ssm get-parameter \
--region "$AWS_REGION" \
--name "$SSM_PREFIX/services/$SERVICE_NAME/nats_admin_jwt" \
--with-decryption \
--query 'Parameter.Value' \
--output text)
SERVICE_NKEY=$(aws ssm get-parameter \
--region "$AWS_REGION" \
--name "$SSM_PREFIX/services/$SERVICE_NAME/nats_admin_nkey_seed" \
--with-decryption \
--query 'Parameter.Value' \
--output text)Build a creds file using the same format as above and then connect with:
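For reference, this is what that file looks like when built from the variables above. It is the same layout as the admin creds file; only the variable names and the output path differ.

```bash
cat > ./nats-service.creds <<EOF
-----BEGIN NATS USER JWT-----
$SERVICE_JWT
------END NATS USER JWT------

************************* IMPORTANT *************************
NKEY Seed printed below can be used to sign and prove identity.
NKEYs are sensitive and should be treated as secrets.

-----BEGIN USER NKEY SEED-----
$SERVICE_NKEY
------END USER NKEY SEED------
*************************************************************
EOF
chmod 600 ./nats-service.creds
```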
nats --server "$NATS_HOST" --creds ./nats-service.creds sub "messenger.>"Health checks
Once connected to the Cloudflare tunnel, you can check the monitoring endpoint:
```bash
curl http://nats-prod-node-1.nats-prod.internal:8222/healthz
```
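Beyond `/healthz`, the same monitoring port serves other read-only endpoints that give a quick view of server and JetStream state. A sketch against the same node (field names can vary slightly across NATS server versions; `jq` is optional):

```bash
# General server info: name, version, uptime, connection count.
curl -s http://nats-prod-node-1.nats-prod.internal:8222/varz | jq '{server_name, version, uptime, connections}'

# JetStream state on this node: streams, consumers, stored messages/bytes.
curl -s http://nats-prod-node-1.nats-prod.internal:8222/jsz | jq '{streams, consumers, messages, bytes}'
```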
## Troubleshooting bootstrap and Axiom logging

If EC2 instances exist but the `nats-logs` dataset is empty, check node bootstrap status first. Most failures happen in cloud-init before NATS is fully configured.
- Verify cloud-init completed on each node:

```bash
sudo cloud-init status --long
sudo journalctl -u cloud-init -u cloud-final -b --no-pager
```

- Check NATS and monitor service state:

```bash
sudo systemctl status nats.service nats-monitor.service --no-pager
sudo journalctl -u nats.service -u nats-monitor.service -b --no-pager
```

- Check Fluent Bit status and delivery errors (a narrower journal filter is sketched after this list):

```bash
sudo systemctl status fluent-bit.service --no-pager || sudo systemctl status td-agent-bit.service --no-pager
sudo journalctl -u fluent-bit.service -b --no-pager || sudo journalctl -u td-agent-bit.service -b --no-pager
```

- Check explicit bootstrap lifecycle events (the `nats-init` tag):

```bash
sudo journalctl -t nats-init -b --no-pager
```

- Validate that the Fluent Bit output config points to the expected Axiom dataset:

```bash
sudo rg "Host|URI|Authorization|Systemd_Filter" /etc/fluent-bit/fluent-bit.conf
```
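If the units look healthy but nothing reaches the dataset, a narrower journal filter can surface delivery errors without reading the full log. A sketch, assuming the `fluent-bit.service` unit name used above:

```bash
# Show the most recent Fluent Bit warnings/errors, e.g. failed HTTP output flushes or retries.
sudo journalctl -u fluent-bit.service -b --no-pager | grep -iE "warn|error|retry|failed" | tail -n 50
```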