AWS Lambda Overview

Introduction

In this post, we’ll delve into AWS Lambda, AWS's serverless computing platform. My aim is to give a broad overview of it, shed light on its nuances, and create an easier path for other developers.

Understanding AWS Lambda

AWS Lambda is an event-based computing service. Events can be triggered by an API request, by a scheduler like EventBridge, through AWS's API using a library like boto3, through its integration with RDS, and through many other services within AWS's ecosystem.

It can perform mid-sized asynchronous jobs, spanning several minutes and using sizeable memory, as well as fast synchronous jobs, like transforming a CloudFront request or response.
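To make the idea of "one function, many event sources" concrete, here is a minimal sketch of a handler that guesses where an event came from by looking at its shape. The field names checked (`source`, `httpMethod`, `requestContext`) are assumptions based on commonly seen EventBridge and API Gateway payloads; real payloads vary by integration and version, so check them against the documentation for the trigger you use.

```python
def classify_event(event: dict) -> str:
    """Guess the event source from the payload shape (illustrative only)."""
    # Assumption: EventBridge events carry source == "aws.events".
    if event.get("source") == "aws.events":
        return "schedule"
    # Assumption: API Gateway proxy events carry one of these keys.
    if "httpMethod" in event or "requestContext" in event:
        return "http"
    # Anything else is treated as a direct invocation, e.g. through boto3.
    return "direct"


def lambda_handler(event: dict, context: object) -> dict:
    kind = classify_event(event)
    return {"kind": kind}
```

In practice a function usually serves a single trigger, so a dispatcher like this is mostly useful for experimenting with different event sources.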

Such flexibility also comes with some constraints and it usually requires a different approach to application development — one that is specifically tailored for its environment.

Namely, a developer expecting to deploy on Lambda should consider:

Cost
Running environment
Lifecycle
Time constraints
Network constraints
Resource access permissions

We'll approach each of these topics below.

Cost

The cost for AWS Lambda is primarily based on:

the number of requests you serve (US$0.20 per 1M requests)*
the amount of memory reserved times the number of seconds the function runs (~ US$0.000015 per GB-second)*

* Prices for US East as of Jan '24

For that, you are allotted 1 vCPU per 1,769 MB of memory, or a fractional share (throttled on CPU time) for non-integer ratios of that.

This billing model is different from traditional availability-based pricing and requires a strategic approach to efficiency and execution time.
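As a rough illustration of this billing model, the sketch below estimates a bill from the two prices quoted above. The constants are the approximate figures from this post, not authoritative pricing, and the function name and parameters are made up for the example. It ignores the free tier, ephemeral storage charges, and data transfer.

```python
# Approximate prices quoted in this post (US East, Jan '24); treat as assumptions.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # US$ per request
PRICE_PER_GB_SECOND = 0.000015         # US$ per GB-second


def estimate_cost(requests: int, memory_mb: int, avg_duration_s: float) -> float:
    """Estimate the cost in US$ for a number of invocations."""
    # Billed compute is reserved memory (in GB) times run time (in seconds).
    gb_seconds = requests * (memory_mb / 1024) * avg_duration_s
    return requests * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    # One million one-second invocations with 1 GB of memory.
    print(f"US${estimate_cost(1_000_000, 1024, 1.0):.2f}")
```

With these figures the compute term dominates the request term, which is why shaving execution time or reserved memory tends to matter more than reducing the request count.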

Do you expect your function to hang, waiting for some network answer while doing no work? Maybe Lambda isn't the best platform for it.

On the other hand, the same problem can be solved in different ways, and taking into account the environment where the code runs can guide you towards the best one.

Take this task for example: download and unpack the Linux kernel.

If you're tight on memory, tight on CPU and loose on storage, this could be one way to do it:

import os
import tarfile
from tempfile import TemporaryDirectory
from urllib.request import urlopen

url = "https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.7.tar.gz"

def lambda_handler(event: dict, context: object) -> None:
    with TemporaryDirectory() as temp_dir:
        local_file_path = os.path.join(temp_dir, "linux-6.7.tar.gz")

        # Download the file and save it to disk
        with urlopen(url) as response:
            with open(local_file_path, "wb") as f:
                while (chunk := response.read(1024)):
                    f.write(chunk)

        # Extract the file
        with tarfile.open(name=local_file_path, mode="r:gz") as tar:
            tar.extractall(path=temp_dir)

You open the file for download, read small chunks and write small chunks until the download is complete; then you unpack the file you just saved.

But if you're loose on memory, tight on CPU and tight on storage, a different way to achieve the goal could be:

import tarfile
from io import BytesIO
from tempfile import TemporaryDirectory
from urllib.request import urlopen

url = "https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.7.tar.gz"

def lambda_handler(event: dict, context: object) -> None:
    with TemporaryDirectory() as temp_dir:
        # Download the file into memory
        with urlopen(url) as response:
            file_content = BytesIO(response.read())

        # Extract the file
        with tarfile.open(fileobj=file_content, mode="r:gz") as tar:
            tar.extractall(path=temp_dir)

This way, you first download the whole file into memory, then you unpack it from memory onto disk.

But if you're trying to minimize the time*memory it takes for the function to run, given the allocated resources, this could be a better solution:

import tarfile
from tempfile import TemporaryDirectory
from urllib.request import urlopen

url = "https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.7.tar.gz"

def lambda_handler(event: dict, context: object) -> None:
    with TemporaryDirectory() as temp_dir:
        # Start file download
        with urlopen(url) as response:
            # Extract it while you download it
            with tarfile.open(fileobj=response, mode="r:gz") as tar:
                tar.extractall(path=temp_dir)

In this case, while you wait for the network buffer to fill as the bytes come in, you use your allotted CPU time to unpack the data that has already arrived.
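A possible variation on this last version: Python's tarfile module documents a stream mode, "r|gz", meant for file objects that can only be read forward, as a network response is. The sketch below is not what was measured for this post. It demonstrates the mode on a tiny archive built in memory so that it runs without network access, and `ForwardOnlyReader`, `make_sample_archive` and `extract_stream` are hypothetical helpers introduced for the example.

```python
import io
import os
import tarfile
from tempfile import TemporaryDirectory


class ForwardOnlyReader:
    """Stand-in for a network response: offers read() only, no seek() or tell()."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


def make_sample_archive() -> bytes:
    """Build a tiny .tar.gz in memory to stand in for the downloaded file."""
    payload = b"hello from the archive\n"
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="sample/hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return raw.getvalue()


def extract_stream(stream, destination: str) -> list:
    """Unpack a gzip-compressed tar stream member by member, in arrival order."""
    names = []
    # "r|gz" reads the archive strictly sequentially, without seeking.
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            tar.extract(member, path=destination)
            names.append(member.name)
    return names


if __name__ == "__main__":
    with TemporaryDirectory() as temp_dir:
        print(extract_stream(ForwardOnlyReader(make_sample_archive()), temp_dir))
        print(os.listdir(temp_dir))
```

In a handler, the response object returned by urlopen would take the place of `ForwardOnlyReader`.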

For reference, these are the empirical execution times and costs of the three versions on my laptop and with different amounts of memory on Lambda:

Method A: Writing to Disk in Small Chunks

Method B: Downloading into Memory then Unpacking

Method C: Unpacking while Downloading

Notice how the performance of each method is influenced not only by the available memory but also by the proportionate CPU power. For example, at 1,769 MB, where a full vCPU is available, the methods show improved performance compared to lower memory allocations with fractional vCPUs.

Given the pricing model of AWS Lambda, where both memory allocation and execution time contribute to the cost, selecting a method that efficiently uses both CPU and memory can lead to substantial cost savings, especially for applications that are scaled up to handle a large number of requests.

Understanding the relationship between memory allocation and CPU throttling is crucial in AWS Lambda. It impacts not only the feasibility of different approaches under various resource constraints but also the overall efficiency in terms of performance and cost.

In this small example, methods A and C are less sensitive to file size, allowing the function to run even in the most constrained configuration. Meanwhile, methods B and C showed about the same time and cost profile. Method C's robustness regarding file size and memory requirements, combined with its cost performance, makes it potentially the best fit for this environment.

Running Environment

Lambda supports multiple programming languages through the use of runtimes.

The easiest runtime to use is the one provided by AWS. The underlying execution environment is an Amazon Linux distribution with a programming language interpreter selected by the user and some additional libraries made available by AWS. At the time of writing, this option is available for Node.js, Python, Java, .NET and Ruby. It can be used either by editing the code file directly in the AWS console or by uploading a ZIP file with the code, potentially including third-party libraries.

If more control is needed, a runtime can be custom-made by the user through container images. The containers can be built in three ways (ordered from least to most complex, which is also from least to most freedom):

based on base images provided by AWS;
using an AWS-supplied Lambda runtime interface client;
implementing the Lambda Runtime API yourself.

The image, containing both the environment and the application, can then be pushed to the Elastic Container Registry and selected as the code to be run.

Lambda functions operate within a stateless, isolated environment, where each function execution is distinct and has restricted filesystem access.

The execution environment is constrained by resource limits, including a maximum of 10GB of RAM and 10GB of ephemeral storage (to be configured by the user and billed accordingly), and a 15-minute execution time cap, after which AWS will forcibly terminate the function.
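One way to cope with the execution time cap is to check how much time is left and stop cleanly before AWS terminates the function. In the Python runtime, the context object passed to the handler offers `get_remaining_time_in_millis()`. The sketch below is an illustration under assumptions: `FakeContext` is a local stand-in for testing, and the doubling of each item is placeholder work.

```python
class FakeContext:
    """Local stand-in for the Lambda context object, for testing only."""

    def __init__(self, readings) -> None:
        self._readings = iter(readings)

    def get_remaining_time_in_millis(self) -> int:
        return next(self._readings)


def process_with_deadline(items: list, context, safety_margin_ms: int = 5_000) -> dict:
    """Process items until the remaining time drops below the safety margin."""
    processed = []
    for item in items:
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break  # leave the rest for a later invocation
        processed.append(item * 2)  # placeholder for the real work
    return {"processed": processed, "pending": items[len(processed):]}
```

The pending items could then be handed to a follow-up invocation instead of being lost when the function is cut off.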

The filesystem is essentially read-only, with temporary files stored in the /tmp directory. Lambda may re-use the execution environment from a previous invocation if one is available, or it can create a new execution environment, meaning that while data and state may persist during a function’s reactivation from hibernation, it could also be completely erased during a cold start.
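Because a reused environment may still hold files from an earlier invocation, temporary storage can serve as an opportunistic cache, as long as the code also works when the cache is empty. A minimal sketch, where `expensive_lookup` and the cache file name are hypothetical, and the temporary directory is resolved with `tempfile.gettempdir()` so the example also runs outside Lambda:

```python
import os
import tempfile

# The system temporary directory; assumed to be the writable /tmp on Lambda.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "lambda_overview_cache.txt")


def expensive_lookup() -> str:
    """Hypothetical stand-in for slow work such as a download or a query."""
    return "computed-value"


def lambda_handler(event: dict, context: object) -> dict:
    # Warm environment: the file may still be there from an earlier invocation.
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return {"value": f.read(), "from_cache": True}
    # Cold environment (or first call): compute the value and store it.
    value = expensive_lookup()
    with open(CACHE_PATH, "w") as f:
        f.write(value)
    return {"value": value, "from_cache": False}
```

The cache is only a shortcut: nothing guarantees that the next invocation lands in the same environment.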

This environment, designed for short-lived, independent operations, mandates an approach centered on statelessness and autonomy.

Lifecycle

Due to the serverless pay-per-use pricing structure, your function isn’t always running on AWS servers. Instead, it's deployed only when needed. Because of this, there may be a longer response time when the function is called: on the first invocation, the code is transferred, an execution environment is established and the function is called. After this, it maintains the execution environment for a certain duration. If you invoke your function again in this period, AWS reuses the existing environment, skipping the initialization phase. It should be noted, though, that you're only billed when the code is actually running (either on initialization or invocation); when the environment is dormant, waiting for a new invocation, no costs are incurred.

Lambda execution environment lifecycle

If another invocation happens while the already established environment is busy, AWS needs to spawn a new environment for it. Unlike a server that runs continuously and may serve multiple requests in parallel, each execution environment serves only one request at a time. While this may sound like a disadvantage, it is actually a reason to use Lambda: this model allows AWS to run as many instances as you need, creating them on demand.

In the past, the latency involved in these cold starts was a significant hurdle, limiting Lambda’s applicability in time-sensitive scenarios. However, advancements in AWS Lambda technology have greatly reduced these start-up times, even in cold start situations, usually bringing them down to just a few seconds.

This improvement expanded the range of viable applications for Lambda, as long as the application architecture and function design take the different startup behaviors into account and plan accordingly.
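The lifecycle described above has a practical consequence for how code is laid out: anything at module level runs once per execution environment, during initialization, while the handler body runs on every invocation. A minimal sketch, where `SETTINGS` is a hypothetical stand-in for expensive setup such as creating clients or loading configuration:

```python
# Module-level code runs once per execution environment (initialization).
SETTINGS = {"greeting": "hello"}  # hypothetical stand-in for expensive setup
_invocations = 0


def lambda_handler(event: dict, context: object) -> dict:
    # The handler body runs on every invocation and can reuse module state.
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,
        "invocation": _invocations,
        "greeting": SETTINGS["greeting"],
    }
```

Since each environment serves one request at a time, the counter here needs no locking, but it is local to one environment and says nothing about invocations served by other environments.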

Network Constraints

Lambda functions can run inside or outside a VPC. By default, a function is not attached to a VPC: it can reach the public internet, but not resources that live in your private subnets. To access those resources, the function must be attached to subnets of your VPC, where it gets no public IP address. Because of that, a VPC-attached function can’t directly access the internet unless it runs in a private subnet whose traffic is routed through a NAT gateway (or an equivalent EC2 instance). This additional setup is vital for functions that need to reach both VPC resources and the Internet, and its absence is a common source of unexpected errors for someone just starting with the platform.
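A missing route to the internet usually shows up as a request that hangs until the function times out. A small connectivity probe with a short timeout can turn that into a quick, readable failure. This is a sketch only: `can_reach` is a hypothetical helper, and the host and port in the usage example are arbitrary choices.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False


if __name__ == "__main__":
    print(can_reach("cdn.kernel.org", 443))
```

Logging the result at the start of a handler is a cheap way to tell a networking problem apart from a slow upstream service.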

Resource Access Permissions

Every Lambda function is associated with an IAM (Identity and Access Management) role, known as an execution role. Based on the role's associated policies, access is granted or denied to other AWS resources on your account. Your function should have, at least, the ability to access Amazon CloudWatch Logs for the purpose of streaming logs. Also, if you need your function to run in a VPC and access network resources, the corresponding permissions must also be included in the role’s policies.

It's a good idea to start with an appropriate AWS managed policy for the desired use case and include other permissions as needed. For basic execution, the managed policy is AWSLambdaBasicExecutionRole, which allows the function to publish logs to CloudWatch. If the function needs to run in a VPC, then the starting point should be AWSLambdaVPCAccessExecutionRole. Other managed policies can be found in Lambda's official docs.
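To show what an execution role is made of, the sketch below builds the two pieces as plain data: a trust policy that lets the Lambda service assume the role, and the ARN of the managed policy to attach. It makes no AWS calls. `managed_policy_arn` is a hypothetical helper, and the resulting values would still have to be passed to whatever tool you use to create the role.

```python
import json

# Trust policy: allows the Lambda service to assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

MANAGED_POLICY_PREFIX = "arn:aws:iam::aws:policy/service-role/"


def managed_policy_arn(needs_vpc: bool) -> str:
    """Pick the managed policy named in the text as a starting point."""
    if needs_vpc:
        return MANAGED_POLICY_PREFIX + "AWSLambdaVPCAccessExecutionRole"
    return MANAGED_POLICY_PREFIX + "AWSLambdaBasicExecutionRole"


if __name__ == "__main__":
    print(json.dumps(TRUST_POLICY, indent=2))
    print(managed_policy_arn(needs_vpc=False))
```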

Development Approaches

In developing for Lambda, the simpler, the better. While this is generic advice that applies to any project, it is particularly true for this platform. For a long-running server, you can often get away with pushing a lot of complexity into initialization time. After all, what are a few seconds of initialization for a server that's meant to run for days, weeks or even months without a restart? And even if you use CI/CD and your software does get redeployed often, a number of strategies can be used so that users never notice the period where the new server is starting up.

On Lambda, on the other hand, you do need to count on your code starting up during a user's request. It's obviously not what we want, and likely not what's going to happen on every request; but if your user clicks on something and it takes five or ten seconds for the function to initialize, that will be noticed, and can even give the impression that the application has hung. Because of that, Lambda functions are especially sensitive to unnecessary complexity and dependencies. If you load a heavy library but use only a tiny bit of it, you'll still be paying the cost of loading the library on at least some requests. So be mindful of that when adding a new dependency.

On the other hand, there's often a trade-off between computing complexity and development complexity, especially in interpreted languages. For example: you're implementing a REST API endpoint and the user sends some parameter via a query argument; the simplest possible approach in terms of computing complexity would be to pick up that argument directly from the event passed to the Lambda function and validate it with custom code. But if this parameter is, for example, a credit card number, now you need to implement your own credit card validator. And if you need to handle more arguments of more types, development complexity might grow uncontrollably, so the natural path would be to use some library that handles that for you. But depending on how complete the library you chose is, and how much you'd like to outsource to libraries, you might end up loading a lot of unused code and repeatedly paying for the memory, latency and CPU associated with it.

Because of that, it's important to understand that there's no one-size-fits-all solution when working with Lambda functions. Every task has its own set of constraints and you should be mindful of them, sometimes optimizing for development speed, sometimes for execution overhead, and sometimes for a balance between them.
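One common way to soften this cost is to import a heavy dependency only on the code path that needs it, so that requests that never touch it don't pay for loading it. In the sketch below, the standard library's `statistics` module stands in for a heavy third-party library, and the `action` field is a made-up convention for this example.

```python
def lambda_handler(event: dict, context: object) -> dict:
    action = event.get("action", "ping")

    if action == "stats":
        # Deferred import: loaded only when this branch runs. Python caches
        # the module, so later calls in the same environment reuse it.
        import statistics

        return {"mean": statistics.mean(event["values"])}

    # The cheap path never triggers the import at all.
    return {"status": "ok"}
```

The trade-off is that the first request hitting the heavy path pays the load time during the invocation rather than during initialization, so this fits best when that path is rarely used.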
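To give a sense of what the hand-rolled end of this trade-off looks like, here is a minimal sketch that reads a query argument straight from the event and checks it with the Luhn checksum, with no dependencies. The parameter name `card_number`, the response shape, and the accepted length range are assumptions for the example, and a Luhn check alone is far from complete card validation.

```python
def luhn_valid(number: str) -> bool:
    """Check a card number with the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit() and 12 <= len(cleaned) <= 19):
        return False
    total = 0
    # Walk the digits from right to left, doubling every second one.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def lambda_handler(event: dict, context: object) -> dict:
    # Assumption: query arguments arrive under "queryStringParameters".
    params = event.get("queryStringParameters") or {}
    if not luhn_valid(params.get("card_number", "")):
        return {"statusCode": 400, "body": "invalid card number"}
    return {"statusCode": 200, "body": "ok"}
```

This is cheap to load and run, but every new parameter type would need its own function like `luhn_valid`, which is exactly where a validation library starts to look attractive.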

Conclusion

As we’ve explored in this introduction to AWS Lambda, developers embarking on the journey of serverless architecture are presented with a plethora of considerations that can significantly impact the cost-effectiveness, performance, and scalability of their applications.

With its fine-grained pricing model, it encourages innovation and experimentation, allowing you to tailor your applications with precision to your use cases and to optimize for cost. Yet with flexibility comes the responsibility to understand the constraints and behaviors of this environment. It demands a thoughtful approach to managing resources, permissions, and application architecture. The serverless paradigm calls for applications designed for statelessness, event responsiveness, and autonomy, challenging traditional development models and encouraging inventive solutions.

In summary, AWS Lambda offers a powerful and versatile platform for serverless computing, and it rewards careful application design. Understanding its operational characteristics and limitations is key to leveraging its full potential. Whether for simple or complex tasks, AWS Lambda can be a scalable and efficient solution, provided we keep its intricacies in mind.
