Running serverless code in AWS Lambda is incredibly easy: you can have sample code up and running in a couple of minutes without having to provision any servers. But doing so in an effective and efficient way can be very challenging. This article presents what I think are a few of the more important architecture, development and operations best practices to help meet this challenge.
The code accompanying this article can be found in the following GitHub repo: lambda-trigger-destinations-sample. The code is in Java, but the principles we cover apply to Lambda functions written in any language.
Select the right use case for AWS Lambda
Don’t just use AWS Lambda because it’s easier or the more modern approach for running code in the cloud. Make sure your use case is appropriate for operating as a serverless function.
With AWS Lambda you don’t have to provision your own compute capacity. The Lambda service automatically handles capacity provisioning, instance health monitoring, instance updates, as well as deploying, monitoring, scaling and high availability for your code. This dramatically eases the burden of running code in the cloud, but it also eliminates much of your ability to control and customize the environment.
For use cases where you need greater ability to control and customize the environment, you have additional options to consider such as AWS EC2, Elastic Container Service (ECS) and Elastic Beanstalk. AWS EC2 allows you to provision your own compute instances where you have full control over the operating system, the entire software stack, as well as network and security settings. AWS ECS is a container orchestration system similar to Kubernetes that allows you to deploy your Docker containers to easily create a distributed and scalable application. You can choose to run your ECS application using AWS Fargate, a serverless option for ECS that eases the management burden further.
You should select the appropriate option for your use case by considering the flexibility and control needed and the level of administration burden you are willing to deal with.
It’s also important to be aware of AWS Lambda limits and how they may impact your use case. Function timeout and memory allocation limits are now large enough for most uses, but be aware of payload size, deployment package size and temporary storage limits, among others, that may hinder your ability to use Lambda in specific cases.
A few example cases that may not be appropriate for AWS Lambda are 1) a financial trading application with very low minimum latency requirements, 2) a machine learning application which performs complex logic over large data sets and 3) a data processing application which needs precise control over concurrency.
Select the right programming language for you, your organization and use case
If you or your organization have a great deal of expertise in a specific language then that’s a major plus for using that language for your Lambda function. For example, if you’re a consultant doing work for a client and they are a .NET shop, then it would make sense to develop the Lambda function in C# so the development team can easily maintain it.
If your organization has an existing, well-written code base in a specific language then that’s also a major plus for selecting that programming language. Make sure that the code base is written and organized in a way that makes it reusable from within your Lambda function. For example, if your organization has proprietary Java libraries with very complex logic for performing industry-specific financial transactions, then it makes sense to use Java to develop the associated Lambda functions.
While performance must be considered, I don’t think it is the most important factor in selecting a programming language for Lambda. Lambda functions executing in Java and C# runtimes have much slower cold start times than Node.js or Python, but execute much faster during subsequent calls (warm invocations). Also, much can be done by the developer to reduce cold start times for Java and C#. I will cover details of how this can be done in an upcoming article and one option for mitigating this in the next best practice. Overall, there’s no clear winning language for performance in AWS Lambda and I don’t believe it should be used as the deciding factor.
Think about scaling and concurrency before you code
One of the greatest features of AWS Lambda is its ability to auto scale your Lambda function. You can start off with zero instances and depending on the request traffic and your AWS account limits your Lambda function can be scaled to thousands of instances. However, this doesn’t mean that you don’t have to think about the scalability of your service. If anything, it makes it much more important.
The auto scaling of your Lambda function can make the lack of scalability in other parts of your service more evident. It’s often the case that a system will work smoothly when traffic is low, then start to show signs of strain as traffic reaches a critical point, with problems worsening as traffic continues to increase.
A common bottleneck is the database. Below we see an example architecture which has a client making a request at an endpoint managed by API Gateway. The request results in an invocation of a Lambda function which makes a call to the RDS database. With low traffic, we have no issues as the database can satisfy the low number of requests it receives.
However, calls to the database will fail if the number of concurrent requests exhausts the available database connections. The diagram below shows the same system with 500 instances of the Lambda function, with some of the requests failing because the database’s available connections are exhausted. We will cover some possible options for making this architecture more scalable in a later best practice.
Regardless of which architecture is used, it’s important to set alarms that are triggered when your Lambda function’s concurrency exceeds planned levels. This will allow you to be aware before a potential service availability issue occurs.
Setting the reserved concurrency for a Lambda function reserves that concurrency for your function from the overall available concurrency for your AWS account. This ensures that your Lambda function can always scale to this specific concurrency level. It also sets the maximum available concurrency for that Lambda function. A Lambda function with reserved concurrency set is still subject to cold starts, as its instances are also started on an on-demand basis.
This feature is optional; use it with care, as setting the value too low may result in throttling of your Lambda function, while setting it too high may not leave adequate concurrency for the other Lambda functions in your AWS account.
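As a sketch of how this looks in a SAM template (the function name, handler and value here are hypothetical), reserved concurrency is a single property on the function resource:

```yaml
Resources:
  OrdersFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.OrdersHandler::handleRequest
      Runtime: java11
      CodeUri: target/orders-function.jar
      # Guarantees this function can always scale to 100 concurrent
      # instances, and also caps it at 100; the value is drawn from
      # the account-level concurrency pool.
      ReservedConcurrentExecutions: 100
```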
Provisioned concurrency is a new feature for AWS Lambda that enables the developer to keep Lambda function instances fully initialized and ready to respond. This eliminates the cold start problem previously mentioned.
The automatic scaling built into AWS Lambda works by starting instances of your function on an as-needed basis. When your Lambda function is invoked and there are no live instances of it, the Lambda service starts one, resulting in a cold start delay. On the next invocation, if there is at least one live instance that’s not in use, it is reused, which avoids the cold start delay. If instead there are no live instances, or all live instances are currently in use, a new instance is started, again resulting in a cold start delay. This process of starting, using and reusing your function’s instances continues as your function’s concurrency builds up. As can be expected, this results in many cold start delays as your function scales up in concurrency.
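This reuse of live instances is why expensive initialization belongs outside the per-invocation handler method: it runs once per cold start and is then shared by all warm invocations of that instance. A minimal sketch in plain Java (the class, field and method names are hypothetical, and the real handler interface from aws-lambda-java-core is omitted to keep the example self-contained):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler sketch: expensive setup runs once per Lambda
// instance (during the cold start) and is reused on warm invocations.
class OrderHandler {
    // Initialized once when the instance starts; a stand-in for slow
    // work such as opening database connections or loading config.
    private static final Map<String, String> CONFIG = loadConfig();
    private static final AtomicInteger invocations = new AtomicInteger();

    private static Map<String, String> loadConfig() {
        return Map.of("table", "orders");
    }

    // Called once per invocation; warm invocations reuse CONFIG
    // without paying the initialization cost again.
    public String handleRequest(String orderId) {
        invocations.incrementAndGet();
        return "stored " + orderId + " in " + CONFIG.get("table");
    }

    public static int invocationCount() {
        return invocations.get();
    }
}
```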
Developers can now set the desired concurrency for a Lambda function to a static value or based on a schedule or metrics using auto scaling, without requiring any change in code. This benefits all runtimes but is most beneficial for those which have much slower initialization times such as Java and C#. With Provisioned Concurrency turned on, users of these runtimes benefit from both the consistent low latency of the function’s start-up, and the runtime’s performance during execution.
There is a cost to consider with Provisioned Concurrency for AWS Lambda. You pay based on the amount of memory allocated to your function, the number of instances you have provisioned and the length of time they are provisioned. Since this can be a costly option, it’s best to use it only for the use cases that benefit most from it, one example being latency-sensitive applications running on the Java or C# runtimes. It also makes sense to analyze traffic patterns for your function and, based on that, use auto scaling to turn on, adjust and turn off provisioned capacity where it provides the most ROI. Provisioned concurrency costs more than standard concurrency at low utilization rates and costs less at high utilization rates.
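As a sketch, provisioned concurrency can be configured in a SAM template; it applies to a published version or alias, which SAM can manage through an auto-published alias (the function name, handler and values below are hypothetical):

```yaml
Resources:
  CheckoutFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.CheckoutHandler::handleRequest
      Runtime: java11
      CodeUri: target/checkout-function.jar
      # Provisioned concurrency targets a version or alias, so SAM
      # publishes a new version and points the "live" alias at it.
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        # Keeps five fully initialized instances warm and ready,
        # eliminating cold starts up to that concurrency level.
        ProvisionedConcurrentExecutions: 5
```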
Use AWS SAM / CloudFormation to create and manage your Lambda functions
It takes just a minute or two to set up your Lambda function in the AWS console, but let’s be clear: that’s just a simple way to get familiar with the process of deploying serverless code. For deploying a Lambda function to production and on an ongoing basis, the best practice is to use AWS SAM / CloudFormation.
AWS CloudFormation is a service that allows you to model and set up your AWS resources as code. You create a template that describes the AWS resources you need and their relationships to each other, and CloudFormation will set up and configure the resources for you. One major benefit is that your resources are modeled as code which can be tracked and managed in source control. Also, your deployments become automated and repeatable. This saves lots of time, reduces the potential for user error and allows you to replicate your environments for purposes of development and testing.
CloudFormation and its template format were designed mainly for managing infrastructure. Since serverless is a newer concept, the Serverless Application Model (SAM) was introduced as an extension to CloudFormation with the intent of making it easier and more effective to model and manage serverless applications using CloudFormation.
Learn about SAM the squirrel and the framework at https://aws.amazon.com/serverless/sam/.
The SAM framework also includes a command line interface (CLI) tool that makes it easier to test, develop and deploy serverless applications. The CLI uses Docker images to provide a local environment where you can run and test your Lambda functions. API Gateway-style testing is also supported by the CLI. The sample SAM template below shows how simple it is to create an AWS S3 bucket which is set up as an asynchronous event source for an AWS Lambda function. The onSuccess condition for the Lambda function is configured to route to an AWS SNS topic, while the Dead Letter Queue for the Lambda function is configured to route to a new AWS SQS queue.
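A template along those lines might look like the following sketch (resource names, handler and paths are hypothetical; the SAM transform wires up the event notification and the required permissions):

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  UploadsBucket:
    Type: AWS::S3::Bucket
  SuccessTopic:
    Type: AWS::SNS::Topic
  FailedEventsQueue:
    Type: AWS::SQS::Queue
  ProcessUploadFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.ProcessUploadHandler::handleRequest
      Runtime: java11
      CodeUri: target/process-upload.jar
      # S3 invokes the function asynchronously on object creation.
      Events:
        NewObject:
          Type: S3
          Properties:
            Bucket: !Ref UploadsBucket
            Events: s3:ObjectCreated:*
      # Successful asynchronous invocations route their result to SNS.
      EventInvokeConfig:
        DestinationConfig:
          OnSuccess:
            Type: SNS
            Destination: !Ref SuccessTopic
      # Events that still fail after all retries land in the DLQ.
      DeadLetterQueue:
        Type: SQS
        TargetArn: !GetAtt FailedEventsQueue.Arn
```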
The use of SAM / CloudFormation is a best practice, but it’s just a start in working with Lambda and other AWS services. It’s highly recommended to also have a full Continuous Integration / Continuous Delivery (CI/CD) process in place. CI/CD is a complex topic that needs its own write-up, so we’ll mention just a few high-level ideas here. A well-designed CI/CD pipeline enables you to build and release software in a reliable, repeatable and automated fashion, which ensures quality. Ideally you would have multiple environments to run your AWS resources as you progress through development, testing and release to production. Your CI/CD pipeline would enable you to deploy your resources to each of those environments. For security purposes it’s best to have each of these environments in a separate AWS account.
Minimize the amount of code you write
Take advantage of Lambda integrations with other AWS services and partner services to avoid writing unnecessary code. This will simplify and speed up your development and testing as well as reduce the opportunity for issues brought about by programmer error.
Triggers allow other AWS services to invoke your Lambda function based on events, external requests or on a schedule. Just a few of the options include API Gateway, DynamoDB, Kinesis, S3, SNS and SQS. This feature saves you from having to write code to make your Lambda function interact with the other AWS service to retrieve or receive the new events.
Lambda triggers can easily be set up in the AWS console, the AWS CLI or in a CloudFormation/SAM template. In the console, you can set up a trigger for your Lambda function by clicking the “Add trigger” button in the Lambda designer as shown below.
For example, you can choose to have your Lambda function triggered for every object-create event in a specific S3 bucket.
Lambda Destinations is a new feature that allows you to route the Lambda function request and response to other AWS services, eliminating the need for code to communicate between services. This feature works with asynchronous event sources for AWS Lambda, including those shown in the diagram below (AWS SQS is not listed because it serves as a synchronous event source for Lambda). For each Lambda function, a destination can be added for the onSuccess and the onFailure conditions. The destination can be another Lambda function, an SQS queue, an SNS topic or EventBridge.
This allows you to use more sophisticated architectural patterns such as decoupling services to increase resilience without requiring additional code. An example can be seen below where AWS S3 is setup as an asynchronous event source for a Lambda function and the onSuccess condition is configured to route to another AWS Lambda function while the onFailure condition is configured to route to an SQS queue.
Lambda destinations can easily be set up in the AWS console, the AWS CLI or in a CloudFormation/SAM template. In the console, you can add a destination for your Lambda function for the on-failure and/or on-success conditions by clicking the “Add destination” button in the Lambda designer as shown below.
For example, you can set the on failure condition to send the execution results to an SQS dead letter queue as shown below.
Dead Letter Queue
AWS also allows you to specify an SQS queue or SNS topic as a Dead Letter Queue for a Lambda function. If you specify a Dead Letter Queue for your Lambda function, the original message for any failed asynchronous Lambda invocation will be sent to this queue/topic. You can use this option instead of the onFailure condition of Lambda Destinations if you only need the original message and not the Lambda response for that message.
The asynchronous invocation settings for the Lambda function allow you to set the number of retries made on a failure to 0, 1 or 2. The Lambda service will wait one minute after the first failure and two minutes after the second failure before the next attempt. After a failure when all retries have been exhausted, the Lambda service will send the message to the Dead Letter Queue if one has been set for the function.
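These retry and Dead Letter Queue settings can also be expressed in a SAM template; a minimal sketch, with hypothetical resource names, handler and path:

```yaml
Resources:
  IngestFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.IngestHandler::handleRequest
      Runtime: java11
      CodeUri: target/ingest.jar
      # Asynchronous invocation settings: retry a failed invocation
      # at most once (Lambda waits one minute before the retry).
      EventInvokeConfig:
        MaximumRetryAttempts: 1
      # After the retry is exhausted, the original message is sent
      # to this SQS dead letter queue.
      DeadLetterQueue:
        Type: SQS
        TargetArn: !GetAtt IngestDeadLetterQueue.Arn
  IngestDeadLetterQueue:
    Type: AWS::SQS::Queue
```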
With the Lambda trigger, destinations and dead letter queue features, sophisticated decoupling patterns can be achieved with no additional code and minimal complexity. For example, all the integrations represented with orange connectors in the sample architecture below are achieved without writing any code.
It’s considered an anti-pattern to code complex orchestration logic into a Lambda function. When a process consisting of several tasks, each having success and error cases, must be coordinated, it’s probably best not to do this in one Lambda function. You will have to write a significant amount of complex code to get the job done, and this type of work is typically susceptible to programming errors. Additionally, waiting for some tasks to finish before starting others will cause your Lambda function to take longer to execute, which results in greater charges and possible timeouts.
Instead you can decompose the complex logic into several Lambda functions and decouple them by using SQS queues for communication. This reduces the complexity of the Lambda function and increases resilience of the process. However, you can reduce the complexity and code required further by employing AWS Step Functions.
Step Functions allows you to easily create a resilient workflow by coordinating several AWS services, including your Lambda functions. You create a workflow by stitching together a series of steps, with the output of each step serving as input to the next. Your workflow is viewable as a state machine diagram, which makes it easy to design, to update and to understand. The Step Functions service automatically triggers each step, retrying when errors occur. You are able to effectively track the progress of each workflow instance in the AWS console, easily spotting any errors that occur.
The benefits of using Step Functions include quicker and easier development of applications/services, a significant reduction in the amount of code to write and increased resilience. The example below shows how some simple state machine definition code can become a resilient workflow.
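As a sketch of what such a definition looks like, the hypothetical Amazon States Language workflow below runs one Lambda task with automatic retries and routes any remaining error to a failure-handling task (the function ARNs are placeholders):

```json
{
  "Comment": "Hypothetical order-processing workflow",
  "StartAt": "ProcessOrder",
  "States": {
    "ProcessOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:111111111111:function:ProcessOrder",
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "NotifyFailure"
        }
      ],
      "Next": "OrderComplete"
    },
    "NotifyFailure": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:111111111111:function:NotifyFailure",
      "End": true
    },
    "OrderComplete": {
      "Type": "Succeed"
    }
  }
}
```

The Retry and Catch blocks are where the resilience comes from: transient task failures are retried with exponential backoff, and anything else is routed to a dedicated failure path instead of being hand-coded inside a Lambda function.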
There is a significant operating cost to consider when using Step Functions, but for a use case that requires significant orchestration of tasks, it may be worth it as the use of Step Functions eliminates the need to employ multiple queues, each of which would need to be polled.
In this article I covered a few best practices for running code in AWS Lambda. You can find the sample code for this article in the following GitHub repo: lambda-trigger-destinations-sample. I’ll be publishing more best practices soon in part 2 of this topic.