NATS Self-Hosted

Self-hosted NATS (JetStream) runbook

This runbook covers connecting to the self-hosted NATS cluster from your local machine with the NATS CLI, checking node health, and troubleshooting node bootstrap and Axiom log shipping.

Prerequisites

  • Cloudflare WARP is installed and connected to the correct environment network. See Cloudflare Tunnels.

  • AWS CLI configured with access to the target account.

  • NATS CLI installed:

    brew install nats-io/nats-tools/nats
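
Before starting, you can optionally confirm that the tools are on your PATH. The sketch below is a hypothetical helper (require_tools is not part of any existing tooling). It only checks that the commands exist. Pair it with aws sts get-caller-identity to confirm that your AWS credentials resolve; that call does not prove you can read the NATS SSM parameters.

```shell
# Hypothetical helper: report any required CLI that is not on PATH.
# Returns non-zero if at least one tool is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   require_tools aws nats curl && aws sts get-caller-identity
```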

Connect with admin credentials

  1. Export the region, cluster name, service name, and SSM prefix. SERVICE_NAME can be any configured service; it is only used to read the NATS host list, which all services share:

    export AWS_REGION="us-east-2"
    export CLUSTER_NAME="nats-prod"
    export SERVICE_NAME="facebox"
    export SSM_PREFIX="/nats-cluster/${CLUSTER_NAME}"
  2. Fetch the connection string and admin creds from SSM:

    NATS_HOST=$(aws ssm get-parameter \
      --region "$AWS_REGION" \
      --name "$SSM_PREFIX/services/$SERVICE_NAME/nats_host" \
      --query 'Parameter.Value' \
      --output text)
     
    ADMIN_JWT=$(aws ssm get-parameter \
      --region "$AWS_REGION" \
      --name "$SSM_PREFIX/admin/nats_admin_jwt" \
      --with-decryption \
      --query 'Parameter.Value' \
      --output text)
     
    ADMIN_NKEY=$(aws ssm get-parameter \
      --region "$AWS_REGION" \
      --name "$SSM_PREFIX/admin/nats_admin_nkey_seed" \
      --with-decryption \
      --query 'Parameter.Value' \
      --output text)
  3. Create a local creds file. Restrict its permissions before writing the secrets into it. Redirecting into an existing file keeps that file's mode, so the JWT and seed are never readable by other users:

    touch ./nats-admin.creds
    chmod 600 ./nats-admin.creds
    cat > ./nats-admin.creds <<EOF
    -----BEGIN NATS USER JWT-----
    $ADMIN_JWT
    ------END NATS USER JWT------
     
    ************************* IMPORTANT *************************
    NKEY Seed printed below can be used to sign and prove identity.
    NKEYs are sensitive and should be treated as secrets.
     
    -----BEGIN USER NKEY SEED-----
    $ADMIN_NKEY
    ------END USER NKEY SEED------
     
    *************************************************************
    EOF
  4. Run NATS CLI commands against the cluster. For example, to inspect the pivot_main stream or list all streams:

    nats --server "$NATS_HOST" --creds ./nats-admin.creds stream info pivot_main
    nats --server "$NATS_HOST" --creds ./nats-admin.creds stream ls
  5. When you are done, delete the creds file and clear the secrets from your shell:

    rm -f ./nats-admin.creds
    unset ADMIN_JWT ADMIN_NKEY
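
Step 2 repeats the same aws ssm get-parameter call three times. The sketch below wraps that call in a hypothetical ssm_get function. The function fails loudly when a lookup returns nothing, which is easier to spot than an empty creds file. It assumes AWS_REGION is set as in step 1. It always passes --with-decryption, which SSM ignores for plain String parameters, so the same call also works for nats_host.

```shell
# Hypothetical helper: print the value of one SSM parameter.
# Assumes AWS_REGION is set (see step 1). --with-decryption is ignored
# for String parameters, so the same call works for every path.
ssm_get() {
  local name="$1" value
  value=$(aws ssm get-parameter \
    --region "$AWS_REGION" \
    --name "$name" \
    --with-decryption \
    --query 'Parameter.Value' \
    --output text) || return 1
  # With --output text, a null value is printed as "None".
  if [ -z "$value" ] || [ "$value" = "None" ]; then
    echo "ssm_get: no value for $name" >&2
    return 1
  fi
  printf '%s\n' "$value"
}

# Example usage, equivalent to step 2:
#   NATS_HOST=$(ssm_get "$SSM_PREFIX/services/$SERVICE_NAME/nats_host")
#   ADMIN_JWT=$(ssm_get "$SSM_PREFIX/admin/nats_admin_jwt")
#   ADMIN_NKEY=$(ssm_get "$SSM_PREFIX/admin/nats_admin_nkey_seed")
```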

Connect with service credentials

To test service-scoped permissions, fetch a service's credentials instead of the admin ones. This reuses AWS_REGION, SSM_PREFIX, and NATS_HOST from the admin steps:

SERVICE_NAME="messenger"
 
SERVICE_JWT=$(aws ssm get-parameter \
  --region "$AWS_REGION" \
  --name "$SSM_PREFIX/services/$SERVICE_NAME/nats_admin_jwt" \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text)
 
SERVICE_NKEY=$(aws ssm get-parameter \
  --region "$AWS_REGION" \
  --name "$SSM_PREFIX/services/$SERVICE_NAME/nats_admin_nkey_seed" \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text)

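Build ./nats-service.creds in the same format as step 3, substituting $SERVICE_JWT and $SERVICE_NKEY. Then connect and subscribe to the service's subjects, and delete the file when you are done: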

nats --server "$NATS_HOST" --creds ./nats-service.creds sub "messenger.>"
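
Admin and service credentials use the same creds layout, so a small function avoids copy-paste mistakes. This is a sketch, not existing tooling, and write_nats_creds is a hypothetical name. It writes the file under umask 077 inside a subshell. As a result, the file is 0600 from the moment it exists, and the umask change does not leak into your shell. The heredoc body and the EOF marker must start at column 0; otherwise, leading spaces end up in the file.

```shell
# Hypothetical helper: write a NATS user creds file (JWT + NKEY seed)
# in the decorated format used in step 3.
# Usage: write_nats_creds <jwt> <nkey-seed> <output-file>
write_nats_creds() {
  # Avoid the name "path": in zsh it is tied to $PATH.
  local jwt="$1" seed="$2" out="$3"
  if [ -z "$jwt" ] || [ -z "$seed" ] || [ -z "$out" ]; then
    echo "usage: write_nats_creds <jwt> <nkey-seed> <output-file>" >&2
    return 1
  fi
  (
    # Subshell: the umask only applies here, so the new file is 0600.
    umask 077
    # Remove any old copy; redirecting into an existing file keeps its old mode.
    rm -f "$out"
    cat > "$out" <<EOF
-----BEGIN NATS USER JWT-----
$jwt
------END NATS USER JWT------

************************* IMPORTANT *************************
NKEY Seed printed below can be used to sign and prove identity.
NKEYs are sensitive and should be treated as secrets.

-----BEGIN USER NKEY SEED-----
$seed
------END USER NKEY SEED------

*************************************************************
EOF
  )
}

# Example usage with the service creds fetched above:
#   write_nats_creds "$SERVICE_JWT" "$SERVICE_NKEY" ./nats-service.creds
#   nats --server "$NATS_HOST" --creds ./nats-service.creds sub "messenger.>"
#   rm -f ./nats-service.creds
```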

Health checks

Once you are connected to the Cloudflare tunnel through WARP, you can query a node's monitoring endpoint on port 8222. A healthy server answers /healthz with HTTP 200:

curl http://nats-prod-node-1.nats-prod.internal:8222/healthz
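
To check every node rather than only node-1, you can use a loop like the sketch below. It assumes that the other nodes follow the same nats-prod-node-N.nats-prod.internal naming and that the cluster has three nodes; adjust both to match your cluster. The check_nats_health name is hypothetical. curl -f turns any HTTP error status from /healthz into a failure.

```shell
# Hypothetical helper: probe /healthz on each given host (monitoring port 8222)
# and print OK/FAIL per host. Returns non-zero if any host fails.
check_nats_health() {
  local failed=0 host
  for host in "$@"; do
    # -f: treat HTTP errors as failures; -sS: quiet, but still print errors.
    if curl -fsS --max-time 5 "http://${host}:8222/healthz" >/dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example usage (a node count of 3 is an assumption):
#   check_nats_health nats-prod-node-{1..3}.nats-prod.internal
```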

Troubleshooting bootstrap and Axiom logging

If the EC2 instances are running but the nats-logs dataset in Axiom is empty, check node bootstrap status first: most failures happen in cloud-init, before NATS is fully configured.

  1. Verify cloud-init completed on each node:

    sudo cloud-init status --long
    sudo journalctl -u cloud-init -u cloud-final -b --no-pager
  2. Check NATS and monitor service state:

    sudo systemctl status nats.service nats-monitor.service --no-pager
    sudo journalctl -u nats.service -u nats-monitor.service -b --no-pager
  3. Check Fluent Bit status and delivery errors. Older packages install the unit as td-agent-bit, so query both names. The unit that is not installed reports as not found:

    sudo systemctl status fluent-bit.service td-agent-bit.service --no-pager
    sudo journalctl -u fluent-bit.service -u td-agent-bit.service -b --no-pager
  4. Check the bootstrap lifecycle events logged under the nats-init syslog identifier:

    sudo journalctl -t nats-init -b --no-pager
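  5. Confirm that the Fluent Bit output config points to the expected Axiom dataset. The matched lines can include the Axiom token from the Authorization header, so do not paste this output anywhere shared:

    sudo grep -nE "Host|URI|Authorization|Systemd_Filter" /etc/fluent-bit/fluent-bit.conf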
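
To check only the dataset name without printing any header values, the sketch below extracts the name from the URI line. It assumes that the output uses Fluent Bit's http plugin with a URI of the form /v1/datasets/<dataset>/ingest, which is the path of Axiom's ingest API. If your config is written differently, it prints nothing. fluentbit_axiom_datasets is a hypothetical name.

```shell
# Hypothetical helper: print the Axiom dataset name(s) from Fluent Bit lines
# of the form "URI /v1/datasets/<dataset>/ingest", without echoing any token.
# Reads the files given as arguments, or stdin if none are given.
fluentbit_axiom_datasets() {
  sed -n 's|^[[:space:]]*URI[[:space:]][[:space:]]*/v1/datasets/\([^/[:space:]]*\)/ingest.*|\1|p' "$@"
}

# Example usage on a node (the config is often readable only by root):
#   sudo cat /etc/fluent-bit/fluent-bit.conf | fluentbit_axiom_datasets
```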