With AWS Step Functions, you can manage the components of your distributed application as a state machine workflow. Each component becomes a step in the flow, passing its results on to the next step. Each step can catch errors and has built-in retries that can be customized. Over the past year, the list of services that Step Functions integrates with directly has expanded and now includes ECS and Fargate tasks. This means short-lived tasks in your distributed application workflow can be controlled and coordinated by Step Functions. Let's take a closer look at this integration.

Step Function Definition

{
    "StartAt": "customer_data_upload",
    "States": {
        "customer_data_upload": {
            "Type": "Task",
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "arn:aws:ecs:us-east-1:123456789012:cluster/development",
                "TaskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/customer_data_upload:1",
                "NetworkConfiguration": {
                    "AwsvpcConfiguration": {
                        "Subnets": ["subnet-0ed41e45b216c45a41"],
                        "SecurityGroups": ["sg-0121e456c45b21f42"],
                        "AssignPublicIp": "ENABLED"
                    }
                },
                "Overrides": {
                    "ContainerOverrides": [{
                        "Name": "customer-data-upload",
                        "Environment": [{
                            "Name": "CUSTOMER_ID",
                            "Value.$": "$.customer_id"
                        }, {
                            "Name": "REPORT_ID",
                            "Value.$": "$.report_id"
                        }, {
                            "Name": "OUTPUT_LOCATION",
                            "Value.$": "$.output_location"
                        }]
                    }]
                }
            },
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 3,
                "MaxAttempts": 2,
                "BackoffRate": 1.5
            }],
            "End": true
        }
    }
}
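The "Value.$" fields in the definition above use JSONPath to pull values from the execution's input document. As a minimal sketch, here is the kind of input an execution of this state machine could be started with (the customer ID, report ID, and bucket name are hypothetical placeholders):

```python
import json

# Execution input consumed by the state machine above: each "Value.$"
# JSONPath ($.customer_id, $.report_id, $.output_location) selects one
# of these top-level keys.
execution_input = json.dumps({
    "customer_id": "42",
    "report_id": "weekly-7",
    "output_location": "s3://example-bucket/results/weekly-7.json",
})

# With boto3, this string would be passed as the `input` argument to
# start_execution on the Step Functions client.
```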

When initiating an ECS or Fargate task, Step Functions supports a number of parameters for defining the task. Let's take a look at how we can leverage these parameters to pass data from the calling Step Function to the task.

Data In

To call an ECS or Fargate task from a Step Function, a Task state needs to be defined. Task states have a number of fields that can be used to configure each step in the state machine. Among these is the Parameters field, which we can leverage to pass variables to an ECS or Fargate task, specifically through the Overrides parameter.

"Overrides": {
    "ContainerOverrides": [{
        "Name": "customer-data-upload",
        "Environment": [{
            "Name": "CUSTOMER_ID",
            "Value.$": "$.customer_id"
        }, {
            "Name": "REPORT_ID",
            "Value.$": "$.report_id"
        }, {
            "Name": "OUTPUT_LOCATION",
            "Value.$": "$.output_location"
        }]
    }]
}

The Overrides parameter contains a list of ContainerOverrides. Of the available container overrides, we need the Name and Environment fields. The Name field is required and must match the name of the container we want to pass variables to. The Environment field is what we use to pass variables from the Step Function to the container running our code. These variables arrive in the container as ordinary environment variables, which our code can then read.
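Inside the container, picking up these values is plain environment-variable access. A minimal sketch of what the task's code might do on startup (the variable names match the example definition in this post; the simulated values are placeholders for local testing):

```python
import os

def read_task_inputs():
    """Read the inputs passed from the Step Function via ContainerOverrides.

    Each Environment entry in the container override surfaces as an
    ordinary environment variable inside the running container.
    """
    return {
        "customer_id": os.environ["CUSTOMER_ID"],
        "report_id": os.environ["REPORT_ID"],
        "output_location": os.environ["OUTPUT_LOCATION"],
    }

# Simulate the variables ECS would inject (for local testing only).
os.environ["CUSTOMER_ID"] = "42"
os.environ["REPORT_ID"] = "weekly-7"
os.environ["OUTPUT_LOCATION"] = "s3://example-bucket/results/weekly-7.json"

inputs = read_task_inputs()
```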

Data Out

ECS or Fargate tasks are unable to return output values to the calling Step Function. If subsequent steps in the workflow require information produced by the ECS or Fargate task, Amazon S3 can be used to store the task's results. The steps that follow can then read those results from S3 and use them to continue the distributed application workflow. In the example above, we pass an output location to the task. This inverts control of the output location: its definition lives outside the task, in the state machine, so every step has access to the address where the output values are stored.
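Since the OUTPUT_LOCATION arrives as an s3:// URI, the task has to split it into a bucket and key before writing. A small helper like the following (a sketch; the example URI is hypothetical) handles that, and the resulting bucket/key pair could then be handed to something like boto3's put_object:

```python
from urllib.parse import urlparse

def parse_output_location(uri):
    """Split an s3:// URI, as passed in OUTPUT_LOCATION, into (bucket, key).

    The task writes its results to this address, and later steps in the
    state machine read from the same address, since the location is
    defined once in the workflow rather than inside the task.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = parse_output_location("s3://example-bucket/results/report_7.json")
```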

Exception Handling

"Retry": [{
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 3,
    "MaxAttempts": 2,
    "BackoffRate": 1.5
}]

One of the benefits of Step Functions is the ability to handle errors thrown in each step of the workflow. Each step can apply retry logic to thrown exceptions or notify users through Amazon CloudWatch. However, unlike AWS Lambda functions, where an exception can be returned to the calling Step Function, ECS or Fargate tasks initiated from a Step Function cannot return details of the exceptions they encounter at run time. Instead, if an unhandled exception occurs, the calling Step Function receives a States.TaskFailed error. Because of this, any exception that requires custom handling logic should be handled inside the ECS or Fargate task itself. Exceptions that do not need custom logic can be allowed to propagate to the calling Step Function, where they are handled under the blanket States.TaskFailed error.
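This split can be sketched inside the task as follows: exceptions the task knows how to handle are caught and dealt with in-process, while anything else fails the container, which is what ultimately surfaces to the state machine as States.TaskFailed and triggers the Retry block. The error class and functions here are hypothetical illustrations:

```python
class TransientUploadError(Exception):
    """A hypothetical error the task knows how to recover from."""

def run_task(work):
    """Run the task body and return a process exit code.

    A non-zero exit causes ECS to mark the task as failed, which the
    calling Step Function sees as the blanket States.TaskFailed error.
    """
    try:
        work()
        return 0
    except TransientUploadError as err:
        # Custom handling stays inside the task: the Step Function
        # never sees the exception type, only success or failure.
        print(f"handled in-task: {err}")
        return 0
    except Exception:
        # No custom logic needed: fail the container and let the
        # state machine's Retry block react to States.TaskFailed.
        return 1

def flaky():
    raise TransientUploadError("simulated S3 hiccup")

def buggy():
    raise RuntimeError("unexpected bug")

handled = run_task(flaky)    # recovered inside the task
unhandled = run_task(buggy)  # fails the container
```

In a real entry point, the return value of run_task would be passed to sys.exit so the container's exit code reflects the outcome.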

This has been a quick look at how Step Functions integrate with ECS and Fargate tasks. This integration can provide a more robust and customizable solution than what can be done with Lambda functions while still remaining serverless. Have you used AWS Step Functions as part of a distributed application? What do you think of the integrations Amazon offers across its different services?