Many of our clients have existing Spring Boot applications and teams who have built up a lot of expertise using Spring over the years. For a variety of reasons, they’re looking to adopt a serverless architecture using this starting context. If you’re migrating an existing application that is relatively small and isn’t heavily trafficked, you may choose to run your application in a Lambda rather than as a container. AWS provides the AWS Serverless Java Container library and a quickstart guide to making that transition. You simply package your application as a fat jar, exclude the embedded Tomcat container, and provide a handler that loads your application and proxies events to it.

A common argument against using a framework like Spring in a Lambda function is execution time, and more specifically what’s known as the cold start time. Behind the scenes, the first time a Lambda is invoked, AWS creates a container to service the initial and subsequent requests routed to that instance. This includes downloading the deployment package, starting up the JVM for Java Lambdas, and running any initialization done in the Lambda itself (which, for a Spring Boot application, commonly takes several seconds). The container then stays alive for a period of time that AWS doesn’t clearly document; based on others’ reported experience, it can be as long as 45 minutes, depending on the runtime. If your function isn’t invoked at all during that window, the container is destroyed, and the next invocation incurs another cold start while AWS creates a new container.

Knowing this, I was surprised that AWS even offers the AWS Serverless Java Container library, because I would have expected cold start times for a Spring Boot application to be unreasonable for a serverless API. To understand this better, I created two functions that serve the same purpose: one that leverages Spring Boot and follows the quickstart guide, and one that uses neither.

I’ll walk you through the comparison I made because the process itself is generally applicable if you want to:

  • Determine which memory setting provides the lowest cost and/or best performance for your function

  • Decide whether you want to take on a large (in terms of package size) dependency to gain better performance or ease the development process

  • Compare the cost and performance of two different runtimes for solving the same problem if you work on a polyglot team

Using AWS’s guide to deploying a Spring Boot application as a Lambda, I wrote a simple application containing a single controller:

package io.nuvalence.samples.messages;

import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.bind.annotation.RequestMapping;

@RestController
public class MessageController {

   @RequestMapping("/hello")
   public MessageModel sayHello() {
       return MessageModel.forMessage("Hello!");
   }

   @RequestMapping("/goodbye")
   public MessageModel sayGoodbye() {
       return MessageModel.forMessage("Goodbye!");
   }

   public static class MessageModel {
       private String message;

       private static MessageModel forMessage(String message) {
           MessageModel response = new MessageModel();
           response.setMessage(message);
           return response;
       }

       public String getMessage() {
           return message;
       }

       public void setMessage(String message) {
           this.message = message;
       }
   }
}
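Following the quickstart, the Lambda entry point is a small handler that boots the application once, in a static initializer (so the work happens during the cold start), and then proxies each event to Spring. The sketch below is based on the aws-serverless-java-container quickstart and requires that library as a dependency; `Application` stands in for your `@SpringBootApplication` class:

```java
package io.nuvalence.samples.messages;

import com.amazonaws.serverless.exceptions.ContainerInitializationException;
import com.amazonaws.serverless.proxy.model.AwsProxyRequest;
import com.amazonaws.serverless.proxy.model.AwsProxyResponse;
import com.amazonaws.serverless.proxy.spring.SpringBootLambdaContainerHandler;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamLambdaHandler implements RequestStreamHandler {
    private static SpringBootLambdaContainerHandler<AwsProxyRequest, AwsProxyResponse> handler;

    static {
        try {
            // Boot Spring once per container; this is the bulk of the cold start.
            handler = SpringBootLambdaContainerHandler.getAwsProxyHandler(Application.class);
        } catch (ContainerInitializationException e) {
            throw new RuntimeException("Could not initialize Spring Boot application", e);
        }
    }

    @Override
    public void handleRequest(InputStream input, OutputStream output, Context context) throws IOException {
        // Translate the API Gateway event into a request Spring can route.
        handler.proxyStream(input, output, context);
    }
}
```

Because the static initializer runs during container creation, warm invocations skip straight to `proxyStream` and avoid paying the Spring startup cost again.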

For the sake of comparison, I also wrote a Lambda handler that does the same thing without using Spring:

package io.nuvalence.samples.hello;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayV2ProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayV2ProxyResponseEvent;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class HelloLambdaHandler implements RequestHandler<APIGatewayV2ProxyRequestEvent, APIGatewayV2ProxyResponseEvent> {

   private final ObjectMapper mapper = new ObjectMapper();

   @Override
   public APIGatewayV2ProxyResponseEvent handleRequest(APIGatewayV2ProxyRequestEvent input, Context context) {
       APIGatewayV2ProxyResponseEvent response = new APIGatewayV2ProxyResponseEvent();

       MessageModel messageBody;
       if (input.getPath().equals("/hello")) {
           messageBody = MessageModel.forMessage("Hello!");
       } else if (input.getPath().equals("/goodbye")) {
           messageBody = MessageModel.forMessage("Goodbye!");
       } else {
           response.setStatusCode(404);
           return response;
       }

       try {
           response.setBody(mapper.writeValueAsString(messageBody));
           response.setStatusCode(200);
       } catch (JsonProcessingException e) {
           // Log through the Lambda context so the message lands in CloudWatch Logs
           context.getLogger().log("could not serialize message: " + e.getMessage());
           response.setStatusCode(500);
       }
       return response;
   }

   public static class MessageModel {
       private String message;

       private static MessageModel forMessage(String message) {
           MessageModel response = new MessageModel();
           response.setMessage(message);
           return response;
       }

       public String getMessage() {
           return message;
       }

       public void setMessage(String message) {
           this.message = message;
       }
   }
}

To compare the performance of the two approaches, I set up a cron job that used the AWS CLI to invoke each function with a test event twice (serially) every hour. I chose that schedule because I wanted to capture both cold start and warm invocation times.
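The hourly invocations looked roughly like this (the function names and the event file are placeholders for illustration):

```shell
# Invoke each function back-to-back with the same test event;
# the first call after an idle period hits a cold start.
aws lambda invoke --function-name spring-boot-messages \
    --payload file://test-event.json spring-response.json
aws lambda invoke --function-name plain-messages \
    --payload file://test-event.json plain-response.json
```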

The first metric I collected was the duration of the lambda invocation from the perspective of the client. It includes, for a cold start, the amount of time it takes for AWS Lambda to set up the container to service the request.

[Figure: Client-Duration – invocation duration as observed by the client]
[Figure: Lambda-Duration – invocation duration as reported by Lambda]

The left side of this graph aligns with my original expectation: the response time a user experiences on a cold start of a Lambda function using Spring is nearly eight seconds, quadruple that of the comparable function that does not leverage Spring.

To understand the implications of this from a cost perspective, it’s important to understand pricing and scaling for Lambda functions. Lambda is priced by the number of requests and total compute measured in GB-seconds (memory × compute time), where compute time is rounded up in buckets of 100 ms.

I collected similar metrics on the duration of the Lambda function itself:

[Figure: Lambda-reported function duration]
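To make the pricing model concrete, here’s a small sketch of the billing arithmetic. The per-GB-second rate is the published on-demand rate at the time of writing and may change; the 100 ms rounding matches the behavior described above.

```java
public class LambdaCostEstimate {
    // Assumed on-demand rate; check current AWS Lambda pricing.
    public static final double PRICE_PER_GB_SECOND = 0.0000166667;

    // Billed GB-seconds for one invocation: memory (GB) * duration rounded up to 100 ms.
    public static double billedGbSeconds(int memoryMb, long durationMs) {
        long billedMs = ((durationMs + 99) / 100) * 100; // round up to the next 100 ms bucket
        return (memoryMb / 1024.0) * (billedMs / 1000.0);
    }

    public static void main(String[] args) {
        // A 512 MB function running 230 ms bills as 300 ms -> 0.15 GB-seconds
        double gbSeconds = billedGbSeconds(512, 230);
        System.out.println(gbSeconds + " GB-seconds, ~$" + gbSeconds * PRICE_PER_GB_SECOND);
    }
}
```

The rounding is why a cold start hurts the Spring function’s bill: a multi-second startup spans many more 100 ms buckets than a sub-second one.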

From a billing perspective, a cold start for the function that uses Spring Boot will cost about 40% more than a cold start for the function without it; after that, warm invocations cost the same. Furthermore, because the cost of each individual request is so small, the cost difference becomes negligible as long as the ratio of cold starts to total invocations stays low. There are two common reasons you might have a high ratio of cold starts to total invocations.

  1. Predictable, evenly distributed invocations – If your function is invoked exactly once an hour, like the one I showed above, you’re guaranteed to hit a cold start every time. You can solve this by setting up a CloudWatch Events rule to trigger the Lambda at a more frequent interval. If your function has a side effect or regularly takes longer than 100 ms, you may want to update it to short-circuit on a keep-alive event for this sole purpose. Depending on the duration of a cold start and how frequently your function is invoked, you may find that while keeping it alive results in more requests, it’s less expensive overall and more responsive.

  2. Unpredictable spikes in invocations – If your application has relatively low traffic but sometimes receives many concurrent requests, AWS will scale up the number of containers to service all of these requests. In this case, if you know your response time is significantly lower than your cold start time, you may find that limiting the concurrency on your function results in both faster response times and fewer cold starts.
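Both mitigations can be set up from the CLI; the rule name, function name, and ARN below are placeholders:

```shell
# 1. Keep-warm: trigger the function every 15 minutes via a CloudWatch Events rule.
aws events put-rule --name keep-warm --schedule-expression "rate(15 minutes)"
aws events put-targets --rule keep-warm \
    --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:spring-boot-messages"
aws lambda add-permission --function-name spring-boot-messages \
    --statement-id keep-warm --action lambda:InvokeFunction \
    --principal events.amazonaws.com

# 2. Cap concurrent executions to limit the number of containers
#    (and therefore cold starts) that a spike can create.
aws lambda put-function-concurrency --function-name spring-boot-messages \
    --reserved-concurrent-executions 10
```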

If you find yourself in one (or both) of these scenarios, then depending on your team and application, the best path forward may be to apply these mitigations and move forward with migrating your application to run in a Lambda function. Spring provides a lot of conveniences to you as a developer, visible even in the small code snippets shown here, and developers don’t take losing such conveniences lightly, especially when there’s no substantial cost or performance benefit.

However, as your application grows to contain many controllers and services and brings on additional dependencies, the cold start time will creep up. You’ll inevitably hit a threshold where a cold start results in an unacceptably slow user experience, or timeouts for tasks that already take a few seconds on their own. The application I tested had only two simple endpoints, but your application may already be too large to deploy as a Lambda function. If you’re already at that point, you may want to check out Erik’s post about decomposing a monolith, and stay tuned here for future posts on designing a cloud-native serverless API.

In the meantime, we’re interested in knowing what comparisons you’ve done with Lambda in the name of deciding the best configuration for cost and performance. What comparisons are you still looking to do?