Bad Deploy Rollback
ECS Service Rollback Procedure
This document outlines the step-by-step process for rolling back an ECS service to a previous version using the AWS Console.
Prerequisites
- AWS Console access with permission to create task definition revisions and update ECS services (only certain engineers have this access)
- The task definition revision you want to roll back to (the last known-good revision)
Rollback Procedure
Step 1: Create New Task Definition Revision
- Navigate to Amazon ECS in the AWS Console
- Click on "Task Definitions" in the left sidebar
- Find and click on your task definition family
- Locate the previous (target) revision you want to roll back to
- Select this revision by clicking on its revision number
- Click the "Create new revision" button at the top of the page
- Important: Do NOT modify any settings on the "Create new revision" page
- Click "Create" to create a new ACTIVE revision identical to your target revision
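For reference, Step 1's "clone the target revision without changing anything" can also be sketched against the ECS API with boto3. This is a minimal sketch, not part of the console procedure. `clone_request` is a hypothetical helper, `my-app:41` is a placeholder, and the set of read-only fields (returned by `describe_task_definition` but not accepted by `register_task_definition`) is an assumption worth checking against the current boto3 documentation.

```python
import copy

# Fields that describe_task_definition returns but register_task_definition
# does not accept (assumed list; verify against the current boto3 docs).
READ_ONLY_FIELDS = {
    "taskDefinitionArn",
    "revision",
    "status",
    "requiresAttributes",
    "compatibilities",
    "registeredAt",
    "registeredBy",
    "deregisteredAt",
}


def clone_request(described):
    """Build register_task_definition kwargs from a describe_task_definition response.

    `described` is the full response dict. Values are deep-copied so the
    original response is never mutated.
    """
    task_def = described["taskDefinition"]
    request = {
        key: copy.deepcopy(value)
        for key, value in task_def.items()
        if key not in READ_ONLY_FIELDS
    }
    # Tags are only present if describe_task_definition was called with include=["TAGS"].
    tags = described.get("tags")
    if tags:
        request["tags"] = copy.deepcopy(tags)
    return request


# Usage (needs boto3 and AWS credentials; "my-app:41" is a placeholder):
#   import boto3
#   ecs = boto3.client("ecs")
#   described = ecs.describe_task_definition(taskDefinition="my-app:41", include=["TAGS"])
#   new_revision = ecs.register_task_definition(**clone_request(described))
#   print(new_revision["taskDefinition"]["taskDefinitionArn"])
```

Stripping the read-only fields, rather than copying individual fields by name, keeps every setting of the target revision intact, which mirrors the "do not modify anything" rule above.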
Step 2: Update the Service
- Navigate to "Clusters" in the left sidebar
- Select your cluster
- Click on the "Services" tab
- Select your service
- Click the "Update" button
- In the "Deployment configuration" section:
  - Deployment type: Rolling update
  - Task Definition: Select the new revision you just created
  - Review the other settings, but do not modify them unless specifically required
- Click "Update" to start the rollback deployment
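At the API level, Step 2 amounts to one `update_service` call that points the service at the new revision. A minimal sketch, assuming the service already uses the rolling update deployment type. `roll_back_service` is a hypothetical helper that takes the ECS client as a parameter, so it can be exercised without AWS credentials. The optional wait uses boto3's built-in `services_stable` waiter, which raises `botocore.exceptions.WaiterError` if the service does not stabilize in time.

```python
def roll_back_service(ecs, cluster, service, task_definition, wait=True):
    """Start a rolling update that points `service` at `task_definition`.

    `ecs` is a boto3 ECS client (or any object with the same methods);
    `task_definition` is a family:revision string or a full ARN.
    Returns the service description from the update_service response.
    """
    response = ecs.update_service(
        cluster=cluster,
        service=service,
        taskDefinition=task_definition,
    )
    if wait:
        # Polls describe_services until the service reaches a steady state;
        # raises botocore.exceptions.WaiterError if it gives up.
        ecs.get_waiter("services_stable").wait(cluster=cluster, services=[service])
    return response["service"]


# Usage (placeholders: cluster "prod", service "web", revision "my-app:42"):
#   import boto3
#   roll_back_service(boto3.client("ecs"), "prod", "web", "my-app:42")
```

Passing `wait=False` returns as soon as the deployment is created, which matches the console flow where monitoring happens separately in Step 3.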
Step 3: Monitor the Rollback
- Stay on the service details page
- Monitor the "Deployments" tab:
  - Confirm the new deployment is listed as "Primary" and wait for its rollout to complete
  - Ensure tasks from the old deployment are being stopped and replaced by tasks running the new revision
- Check the "Events" tab for any deployment issues
- Verify application health through your monitoring systems
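The checks above can be approximated programmatically from `describe_services` output. In that output the newest deployment has status `PRIMARY`, and older deployments stay `ACTIVE` until their tasks are drained. A minimal sketch: `rollback_status` and its return labels are hypothetical, and it assumes the target is passed as a full task definition ARN, since that is the form `describe_services` reports.

```python
def rollback_status(service, target_task_definition):
    """Classify rollback progress from one entry of describe_services()["services"].

    Returns one of: "TARGET_NOT_PRIMARY", "FAILED", "IN_PROGRESS", "COMPLETED".
    """
    deployments = service.get("deployments", [])
    primary = next((d for d in deployments if d.get("status") == "PRIMARY"), None)
    if primary is None or primary.get("taskDefinition") != target_task_definition:
        return "TARGET_NOT_PRIMARY"
    if primary.get("rolloutState") == "FAILED":
        return "FAILED"
    # Old deployments remain ACTIVE while they still have tasks to drain.
    old_still_running = any(d.get("status") == "ACTIVE" for d in deployments)
    if old_still_running or primary.get("runningCount", 0) < primary.get("desiredCount", 0):
        return "IN_PROGRESS"
    return "COMPLETED"


# Usage (placeholders: cluster "prod", service "web"):
#   svc = ecs.describe_services(cluster="prod", services=["web"])["services"][0]
#   print(rollback_status(svc, new_revision_arn))
```

This only reflects what ECS reports; application-level health still has to be confirmed in your monitoring systems.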
Troubleshooting
Common Issues
Service Stuck in Deployment
- Check service events for specific errors
- Review service logs in Axiom
Tasks Failing to Start
- Check that the task definition is compatible with the cluster (e.g. launch type, CPU and memory settings)
- Review Axiom for application errors
- Verify resource availability in the cluster
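When tasks fail to start, ECS records a stopped reason on each stopped task, and container-level reasons (such as image pull errors) are often more specific. A minimal sketch for grouping those reasons, assuming the task dicts come from `describe_tasks`. `summarize_stopped_tasks` is a hypothetical helper, and the cluster and service names in the usage comment are placeholders.

```python
from collections import Counter


def summarize_stopped_tasks(tasks):
    """Count stopped tasks by reason, most frequent first.

    `tasks` is the "tasks" list from describe_tasks. Task-level stoppedReason
    is combined with any container-level reasons so identical failures group together.
    """
    reasons = Counter()
    for task in tasks:
        if task.get("lastStatus") != "STOPPED":
            continue
        reason = task.get("stoppedReason", "unknown")
        container_reasons = [c["reason"] for c in task.get("containers", []) if c.get("reason")]
        if container_reasons:
            reason = f"{reason} | {'; '.join(container_reasons)}"
        reasons[reason] += 1
    return reasons.most_common()


# Usage (placeholders: cluster "prod", service "web"):
#   arns = ecs.list_tasks(cluster="prod", serviceName="web", desiredStatus="STOPPED")["taskArns"]
#   if arns:
#       tasks = ecs.describe_tasks(cluster="prod", tasks=arns)["tasks"]
#       for reason, count in summarize_stopped_tasks(tasks):
#           print(count, reason)
```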
Rollback Failure Recovery
If the rollback deployment itself fails:
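- Do not panic: tasks from the current deployment generally keep running while the new tasks fail to start (how many stay up depends on the service's minimum healthy percent setting)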
- Check Axiom for error messages
- Verify that the new revision's configuration matches the target revision and that nothing was changed in Step 1
Important Notes
- This procedure creates a new task definition revision
- The previous revisions remain in your task definition history
- Consider documenting the reason for rollback in your incident management system
- Update monitoring thresholds if the older version has different performance characteristics
Next Steps
After successful rollback:
- Investigate the root cause of the issue that required rollback
- Document the incident and resolution
- Update deployment procedures if necessary
- Plan the re-deployment of the fixed version
Frontend Application Rollback Procedure
Roll back to the previous deployment using the rollback button in the Cloudflare dashboard.