Best Practices for Running Serverless Code in AWS Lambda - Part 2

In Blog by Rich Veliz

In part 1 of this article series we discussed that running serverless code in AWS Lambda can be incredibly easy, but that the challenge is doing so effectively and efficiently. This article builds on the first and presents a few more of the most important architecture, development and operations best practices to help meet this challenge.

The code accompanying this article can be found in the following GitHub repo lambda-sqs-dynamodb-sample.

Extract and organize common functionality to libraries that can be shared among multiple Lambda functions

Just as you rely on dependencies created by others for your Lambda functions, you should also organize your common code into dependencies that you can then reference from your Lambda functions. This may seem like an obvious step but it is often skipped when creating Lambda functions since they tend to be smaller and quicker to develop than traditional applications and services.

Unless you organize your common code this way, you’ll likely end up including the same or similar pieces of code into your various Lambda functions. This approach is an antipattern that will result in your organization accumulating lots of tech debt. Any changes to the common functionality will require changes to the code in all of your Lambda functions.

In Java you can organize your common code into one or more jars that you then publish to an artifact repository such as Maven Central or JFrog Artifactory. Once published to a repository, you can add your common libraries to your Lambda function as dependencies just like any third party dependency. A similar approach can be used with other programming languages.

Dependencies can also be deployed as a Lambda Layer. You can specify up to 5 layers for a Lambda function, so you would need to organize common dependencies into a small number of layers to operate within this limitation. For Java functions, you would specify the dependency in your build file in the provided scope, which is intended for dependencies that the runtime will supply. This way your code will build locally, but the dependencies deployed through Lambda Layers will not be included in the package used for your Lambda function.

Pack only the dependencies you really need

Don’t plan like that vacationer that brings three full size pieces of luggage for a 5 day trip. Think more like the person that brings just one carry on bag, packing only the essentials. Understand each dependency that you’re including in your project and its specific purpose. Often there is a lighter weight option to accomplish what you need to do. For example, don’t include the entire AWS SDK as a dependency; instead include only the specific components you need. Adding the entire AWS SDK for Java as shown in the build file snippet below will increase your package size by a couple hundred MB.
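A sketch of what that heavyweight build file entry looks like, assuming a Gradle build (the version number is illustrative; aws-sdk-java is the monolithic artifact that pulls in clients for every AWS service):

```gradle
dependencies {
    // Anti-pattern: the monolithic artifact depends on every service client
    implementation platform('software.amazon.awssdk:bom:2.17.100')
    implementation 'software.amazon.awssdk:aws-sdk-java'
}
```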

A better approach, if your serverless function will need to access DynamoDB and Parameter Store, would be to include the following in your build file:
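For example, again assuming a Gradle build with an illustrative version number, you would depend only on the DynamoDB and SSM modules (Parameter Store is part of the SSM client):

```gradle
dependencies {
    // Only the service clients this function actually uses
    implementation platform('software.amazon.awssdk:bom:2.17.100')
    implementation 'software.amazon.awssdk:dynamodb'
    implementation 'software.amazon.awssdk:ssm'
}
```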

It’s also important to analyze each dependency choice and select the best options. For example, when creating a Lambda function with Java you should select the AWS SDK for Java 2 as it is much improved over its predecessor. In addition to a modularized API that allows you to use only the specific parts of the SDK you need, the AWS SDK for Java 2 offers additional benefits like greater ease of use, nonblocking I/O and improved performance.

Use lighter frameworks that enable you to write well organized and clean code

Use frameworks that were designed to be lightweight, with serverless in mind. Avoid using Spring for your serverless function for now. Spring has evolved nicely through the years and it will most surely evolve again to better support the creation of serverless functions. Spring is a reasonable option for monoliths and for use in container services deployed to Kubernetes and AWS ECS. But for now, it’s better to stick to lighter weight frameworks like Dagger, Micronaut and Quarkus for creating serverless functions in Java.

These newer frameworks are much leaner and have a much lower memory footprint than Spring Boot. Furthermore, they rely on compile time dependency injection and aspect oriented programming rather than runtime reflection, resulting in much quicker startup times and much greater runtime performance. This can help with Lambda cold starts and warm invocations as well. We will write a separate article about optimizing Java Lambda functions where we’ll cover this in detail.

If you absolutely must use Spring Boot, you can expect larger cold start delays and higher costs due to your function’s increased memory needs. The large cold start delay of launching a heavy Spring Boot application can be avoided by leveraging AWS Lambda’s provisioned concurrency feature. This allows you to pre-load instances of your AWS Lambda function so they are started and ready to handle requests in advance. However, you’ll have to pay for this benefit as the provisioned concurrency pricing includes a charge for the number and size of instances that you choose to pre-load.

Select the optimum memory size and set an appropriate timeout value

Memory size for a Lambda function can be set between 128 MB and 3,008 MB in increments of 64 MB. The Lambda service allocates CPU proportionally to the memory size configured. So, the more memory you configure, the greater the share of CPU your Lambda function will have access to. A function configured with 1,792 MB has the equivalent of one full vCPU.

Setting the optimum memory size is critical as it has a direct impact on how long your Lambda function will take to execute and how much you’ll be charged per 100 ms interval that it runs for. Set the memory size too low and your Lambda function will take too long to run, resulting in higher costs, a poor client experience and possible timeouts. Set the memory size too high and you will be wasting capacity: your functions will execute quickly, but you will be charged more than necessary for the execution time.
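To make the trade-off concrete, here is a small sketch that computes the duration charge for a single invocation under the 100 ms billing model described above. The per-GB-second rate shown is illustrative only; check current Lambda pricing for your region.

```java
public class LambdaCostEstimator {

    // Illustrative rate at time of writing; confirm against current Lambda pricing
    private static final double PRICE_PER_GB_SECOND = 0.0000166667;

    /**
     * Duration charge for one invocation, with the duration rounded up
     * to the next 100 ms billing interval.
     */
    public static double invocationCost(int memoryMb, double durationMs) {
        double billedMs = Math.ceil(durationMs / 100.0) * 100.0;
        double gbSeconds = (memoryMb / 1024.0) * (billedMs / 1000.0);
        return gbSeconds * PRICE_PER_GB_SECOND;
    }
}
```

For example, a 1,024 MB function that runs for 250 ms is billed for 300 ms, i.e. 0.3 GB-seconds of usage.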

There are some basic steps you can take to help you make a more informed decision on memory size. First, check your function’s CloudWatch logs for the Duration, Memory Size and Max Memory Used values. A basic analysis of this data can help you spot some cases of major over or under provisioning of memory for your Lambda function. This can also guide you in setting an appropriate timeout value for your function, which is critical to avoid unintended timeouts for your clients when your function takes longer to execute.

If the Max Memory Used amount is always much lower than the Memory Size, you can consider lowering the configured Memory Size for your Lambda. If the Max Memory Used amount sometimes approaches or equals the Memory Size, you can consider raising the configured Memory Size for your Lambda. This simple approach is helpful when your process is memory bound but isn’t effective if your process is CPU or I/O bound.
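One way to run this analysis at scale is to parse the REPORT line that Lambda writes to CloudWatch Logs at the end of each invocation. A rough sketch, assuming the REPORT line format Lambda emitted at time of writing:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReportLineParser {

    // Matches the memory fields of a Lambda REPORT log line
    private static final Pattern REPORT = Pattern.compile(
            "Memory Size: (\\d+) MB\\s+Max Memory Used: (\\d+) MB");

    /**
     * Returns Max Memory Used as a fraction of the configured Memory Size,
     * or -1 when the line is not a REPORT line.
     */
    public static double memoryUtilization(String logLine) {
        Matcher m = REPORT.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        double size = Double.parseDouble(m.group(1));
        double used = Double.parseDouble(m.group(2));
        return used / size;
    }
}
```

Feeding your function’s log lines through a helper like this makes chronically low utilization (a candidate for a smaller memory size) easy to spot.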

A more comprehensive approach to memory sizing is to run performance and load tests on your Lambda using various memory sizes and checking the results in CloudWatch metrics. This approach is effective but is time consuming and tedious. There is an open source tool available that can help with this. The AWS Lambda Power Tuning tool automates the tedious parts of testing your Lambda function using various memory sizes, allowing you to select the optimal memory size. This tool is fully configurable and saves you a lot of time. It runs on AWS Step Functions so there is some setup and cost involved, although the cost is minimal.

Handle expensive resources with care

Great care must be taken to initialize expensive resources, such as SDK clients and database connections, in an optimal way and to reuse them across invocations when possible. These types of resources should be initialized outside of the Lambda handler, which results in them being initialized before the handler code runs. The first benefit this provides is that the resource is initialized during the pre-invocation period, when the Lambda function has access to more CPU than is officially allocated to it. This results in quicker initialization of the resource than if it is done in the handler function. The second benefit is that the resource can be reused across invocations of the same Lambda function instance without having to be instantiated again.
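The pattern can be sketched in plain Java. Here ExpensiveClient is a stand-in for something like an SDK client or connection pool, and the handler is simplified; a real Lambda handler would implement the RequestHandler interface:

```java
public class Handler {

    // Visible for illustration: counts how many times the resource is built
    static int initCount = 0;

    /** Stand-in for an expensive resource such as an SDK client or DB connection. */
    static class ExpensiveClient {
        ExpensiveClient() {
            initCount++; // simulates a costly one-time setup
        }
        String fetch(String key) {
            return "value-for-" + key;
        }
    }

    // Initialized once per container, during the init phase, outside the handler.
    private static final ExpensiveClient CLIENT = new ExpensiveClient();

    // Each warm invocation reuses the already-initialized client.
    public String handleRequest(String input) {
        return CLIENT.fetch(input);
    }
}
```

Because CLIENT is a static field, it is created when the class loads (Lambda's init phase) and every warm invocation of the same instance reuses it.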

Database connections are a special case because they are a very limited resource. We saw this example of throttled connections to an RDS database in part 1 of this article series:

If you are using a non-serverless RDS database, you must consider the potential for exhausting its database connection capacity. You have a few options to deal with this. The first is to set usage limits for your API clients at the API Gateway. Make sure that your clients are using exponential backoff to help reduce the load when your database begins to be overloaded. If your clients don’t need a synchronous response, you can also put an SQS queue between the API Gateway and the Lambda function to serve as a buffer and help manage peak activity.

Another option is to use a serverless database such as Amazon DynamoDB or Amazon Aurora Serverless. You still have to consider the scaling of your Lambda function, especially for cost management, but it should handle a much larger number of clients than a non-serverless database.

If you’re not able to switch to a serverless database, you can opt to use the new RDS Proxy, which makes RDS based applications more secure, scalable, and resilient to failure. RDS Proxy is a fully managed AWS service that allows applications to share database connections through connection pools. RDS Proxy helps to reduce the inherent impedance mismatch between serverless Lambda functions and non-serverless RDS databases. There is a cost for the RDS Proxy service, but in a serverless environment it should be more than offset by the benefits of increased efficiency of your RDS database and increased security and scalability of your application.

Use environment variables to pass operational parameters to your function

For any real world application it’s critical to use environment variables to pass operational parameters to your code. This allows you to modify your function’s behavior without updating the code.

AWS Lambda offers an environment variables feature that allows you to do just that. The Lambda service makes these variables available to your function code at runtime. A set of environment variable values is tied to each version of your Lambda function and can’t be changed once a version has been published. This may be sufficient for simple applications but limits the feature’s usefulness in real world applications.
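A minimal sketch of reading an operational parameter from an environment variable inside a Java function, with a fallback default (the variable name in the usage note is hypothetical):

```java
public class Config {

    /** Reads an environment variable, falling back to a default when it is unset or empty. */
    public static String getOrDefault(String name, String defaultValue) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? defaultValue : value;
    }
}
```

Inside your handler you could then write something like `int batchSize = Integer.parseInt(Config.getOrDefault("BATCH_SIZE", "25"));`, letting operations tune the value without a code change.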

For more complex cases you are likely better off storing your environment variables separately from your Lambda function. AWS Systems Manager Parameter Store supports this approach perfectly, providing secure storage for configuration data and secrets. Note that additional work is required to set up your configuration settings in Parameter Store and additional code is required to read the settings. Separating your configuration settings from your code offers many advantages, including the ability to update your configuration, adjusting the behavior of your function, without redeploying code. You can also share sets of variables with any number of Lambda functions. Integration with AWS KMS allows you to store secrets securely in Parameter Store as well.

The use of AWS Secrets Manager is recommended for cases where the lifecycle of a secret must be managed, for example when a database password must be rotated on a set schedule.

Apply the principle of least privilege

A required step for creating a Lambda function in AWS is to create or assign a role which your function will assume to perform actions in your AWS account. In creating a role, you will set the trusted entity type as Lambda and grant the necessary permissions. In granting the permissions you have the option of adding policies that already exist or of creating your own inline policies.

AWS provides a wide selection of managed policies that range from providing minimal access to providing full access to AWS services. AWSLambdaBasicExecutionRole can be a good starting point, as it grants your Lambda function the ability to create a log group and log stream and to write log events to CloudWatch Logs. Below are the actions included in this managed policy:

  "Action": [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents"
  ]

This is sufficient to create a basic Hello World type Lambda function but won’t let you interact with any other AWS services. To get data from DynamoDB for an API, the easiest option would be to add the AWS managed policy AmazonDynamoDBFullAccess. However, this managed policy provides a very high level of access to the DynamoDB service, including all DynamoDB resources in your AWS account. This is potentially disastrous, as an intruder who gains access to your Lambda function would gain access to all of your DynamoDB resources. It’s critical to limit the blast radius by granting your Lambda function only the minimal permissions it needs to do its intended work. Depending on the intended task at hand, the following actions and resources may be sufficient for a read only API:

  {
    "Action": [
      "dynamodb:BatchGetItem",
      "dynamodb:GetItem",
      "dynamodb:Query"
    ],
    "Effect": "Allow",
    "Resource": "*"
  }

You should tighten the role down further by specifying the resources the dynamodb actions should apply to. The following limits your Lambda function to performing the above granted actions on your “account-detail” DynamoDB table.

  "Resource": "arn:aws:dynamodb:*:*:table/account-detail"

This process of building a custom policy that grants a Lambda function only the permissions it needs can be time consuming, but SAM policy templates can make it easier. For the above example, instead of the custom policy you could use the DynamoDBReadPolicy template, which gives read only access to a DynamoDB table.

There are policy templates available for many common uses. If you don’t see a use case you need listed you can submit a pull request at the GitHub repo to request that it be added. Below is a SAM template example that shows how you can use policy templates in your serverless applications.
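A sketch of what that could look like, assuming a function that reads from the account-detail table (the resource names are illustrative; DynamoDBReadPolicy is a SAM policy template that grants read only access to the named table):

```yaml
Resources:
  GetAccountFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.GetAccountHandler::handleRequest
      Runtime: java11
      Policies:
        - DynamoDBReadPolicy:
            TableName: !Ref AccountDetailTable
```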

Conclusion

In this article I covered a few more best practices for running code in AWS Lambda, following up on part 1 of this article series. You can find the sample code for this article in the following GitHub repo lambda-sqs-dynamodb-sample.

I can be reached on LinkedIn or Twitter for any questions about this article or to discuss your company’s AWS architecture and development needs.

Rich Veliz

Rich Veliz is the founder and principal architect / engineer at Scalable Tech, Inc. He is a results driven leader with extensive experience building top performing software engineering teams. His most recent engagements have been leading distributed engineering teams in the development of data processing and API solutions at AT&T, Cognizant and World Fuel Services.
Rich holds a Bachelor of Science in Electrical Engineering from the University of Miami and a Master of Business Administration from Florida International University. He also is an AWS Certified Developer, AWS Certified Solutions Architect Professional and AWS Certified DevOps Engineer Professional.