Runbooks
Bad Deploy Rollback

Bad Deploy Rollback

ECS Service Rollback Procedure

This document outlines the step-by-step process for rolling back an ECS service to a previous version using the AWS Console.

Prerequisites

  • AWS Console access with appropriate ECS permissions (only certain engineers have this access)
  • Knowledge of which task definition revision you want to roll back to

Rollback Procedure

Step 1: Create New Task Definition Revision

  1. Navigate to Amazon ECS in the AWS Console
  2. Click on "Task Definitions" in the left sidebar
  3. Find and click on your task definition family
  4. Locate the previous (target) revision you want to roll back to
  5. Select this revision by clicking on its revision number
  6. Click the "Create new revision" button at the top of the page
  7. Important: Do NOT modify any settings on the "Create new revision" page
  8. Click "Create" to create a new ACTIVE revision identical to your target revision

Step 2: Update the Service

  1. Navigate to "Clusters" in the left sidebar
  2. Select your cluster
  3. Click on the "Services" tab
  4. Select your service
  5. Click the "Update" button
  6. In the "Deployment configuration" section:
    • Deployment type: Rolling update
    • Task Definition: Select the new revision you just created
  7. Review other settings but do not modify them unless specifically required
  8. Click "Update" to start the rollback deployment

Step 3: Monitor the Rollback

  1. Stay on the service details page
  2. Monitor the "Deployments" tab:
    • Watch for the new deployment to reach "Primary" status
    • Ensure old tasks are being replaced with new ones
  3. Check the "Events" tab for any deployment issues
  4. Verify application health through your monitoring systems

Troubleshooting

Common Issues

  1. Service Stuck in Deployment

    • Check service events for specific errors
    • Review service logs in Axiom
  2. Tasks Failing to Start

    • Check task definition compatibility
    • Review Axiom for application errors
    • Verify resource availability in the cluster

Rollback Failure Recovery

If the rollback deployment itself fails:

  1. Do not panic - your original tasks will continue running
  2. Check Axiom for error messages
  3. Verify task definition configuration

Important Notes

  • This procedure creates a new task definition revision
  • The previous revisions remain in your task definition history
  • Consider documenting the reason for rollback in your incident management system
  • Update monitoring thresholds if the older version has different performance characteristics

Next Steps

After successful rollback:

  1. Investigate the root cause of the issue that required rollback
  2. Document the incident and resolution
  3. Update deployment procedures if necessary
  4. Plan for re-deployment of fixed version

Frontend Application Rollback Procedure

Rollback button on Cloudflare dashboard.