AWS Lambda And The Flying Dutchman - Part 1

Victor Ehikioya Raeva
13 min read · Feb 3, 2025


Intro — Cursed Relic or Treasure Trove?

Pirates of the Caribbean is a legendary movie, and in many ways, so is AWS Lambda — both feared and revered. If Davy Jones’s locker is an endless well of immense potential and peril, Lambda is its technological counterpart: it offers boundless computing resources but demands a strong hand at the helm. For engineers who understand and master it, Lambda provides a frictionless serverless experience. For those who do not, it becomes an unpredictable tool that often leads to runaway costs, cold starts, and operational pitfalls.

Much like Davy Jones’s locker, AWS Lambda is not inherently good or bad — it is what developers make of it: what is the intent or purpose? Why use it? As with any technology, it is a tool, and if wielded wisely, Lambda allows cloud architects and engineers to deploy highly scalable workloads with minimal management. If the intent is not clear, Lambda can quickly become unpredictable, leading to inefficiencies and unintended complexity.

The diagram below is a high-level overview of the AWS Lambda execution flow:

Fig 1. AWS Lambda high-level overview

If you are a fan of the Flying Dutchman — an immortal ship governed by an undying crew that operates tirelessly beneath the waves — you will notice that AWS Lambda follows the same principles: high availability, auto-scaling, and self-sustaining computing power. Just as the Dutchman’s crew — Koleniko, Jimmy Legs, Hadras, and the rest — maintain the ship’s operations without rest, Lambda performs continuous infrastructure management and event-driven execution without developer intervention.

Some similarities between The Flying Dutchman and AWS Lambda:

Fig 2. Similarities between The Flying Dutchman and AWS Lambda

The beauty of all this is the emergence of serverless computing, which removes much of the burden of provisioning, configuring, updating, and scaling servers. For developers who prefer building logic-centric applications, AWS Lambda makes this a reality.

Mechanics — Event-Driven, Ephemeral and Auto-Scaling

Ephemerality

Lambda functions are stateless and short-lived. This means that they run code on demand (much like pressing an elevator button) and terminate after processing. Each invocation runs in an isolated execution environment that starts, processes the request received from an event, and shuts down. Unlike EC2 instances or Kubernetes pods, Lambda does not maintain a persistent runtime between invocations. Every time a function runs, it begins from a fresh execution context, and any data or state stored within the environment during a previous invocation is lost. When an event triggers a function, Lambda allocates compute resources, initializes the function, executes the code, and then discards the environment. If another request arrives, Lambda either reuses a pre-initialized execution environment instance (if one is available) or creates a new one to process the request.

For example, you build a function that processes images uploaded to an S3 bucket. When a user uploads a new image, S3 triggers your Lambda function. Lambda spins up a temporary execution environment, runs your image-processing code, and then destroys the environment after completion. If you upload an image moments later, Lambda repeats the process, and the new execution will not have access to any temporary files or memory from the previous invocation. This ephemeral nature makes Lambda ideal for short-lived stateless workloads. But it also introduces some challenges.

If a function needs to persist data, it must do so externally in an S3 bucket, DynamoDB, or a different data store. Also, code that requires a long execution time (e.g., batch processing) may hit the 15-minute execution limit, requiring an orchestration service like Step Functions to manage long-running workflows.

Fig 3. AWS Lambda's Ephemeral Nature
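The reuse of a warm execution environment versus a fresh one can be sketched without any AWS dependencies. The snippet below is a simplified simulation, not the real Lambda runtime API: the `handler` function and package-level counter are illustrative stand-ins for a handler and state created during the init phase.

```go
package main

import "fmt"

// invocations simulates state created in the execution environment.
// It survives across warm invocations within the SAME environment,
// but is lost whenever Lambda discards the environment.
var invocations int

// handler is an illustrative stand-in for a Lambda handler. It must not
// assume invocations carries over, because a fresh execution
// environment resets it to zero.
func handler(event string) string {
	invocations++
	return fmt.Sprintf("processed %q (invocation %d in this environment)", event, invocations)
}

func main() {
	// Two requests routed to the same warm environment share state...
	fmt.Println(handler("image-1.png"))
	fmt.Println(handler("image-2.png"))
	// ...but a new environment starts again from invocations == 0,
	// which is why durable state belongs in S3, DynamoDB, or similar.
}
```

This is why anything the function must remember between invocations has to go to an external store rather than package-level variables.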

Event-driven Execution

AWS Lambda executes only in response to an event — this means that Lambda functions react to changes to data, API requests (say, from an API Gateway), messages, or system notifications. This event-driven nature is useful because, unlike traditional applications that continuously poll for new data, Lambda remains idle until an event triggers it — eliminating the need for constant resource allocation and thus reducing costs.

AWS services that often trigger Lambda functions:

  • API Gateway: When an HTTP request is made to an application, API Gateway invokes a Lambda function to process the request.
  • S3: When a file is uploaded, modified, or deleted in an S3 bucket, an event notification can trigger a Lambda function.
  • DynamoDB Streams: When items are inserted, updated, or deleted, a stream of events can trigger a Lambda function for real-time processing.
  • EventBridge: Lambda can also react to system events, scheduled tasks, or third-party service events via EventBridge.
  • Step Functions: Lambda can also be orchestrated as part of a complex workflow, enabling long-running multi-step processes.

For instance, let us say Company X runs an e-commerce application and has a requirement to process orders automatically when a new record is added to the orders table in DynamoDB.

This is a snapshot of what the event-driven workflow looks like:

  • A customer places an order through the website, and the order details are stored in DynamoDB.
  • The DynamoDB stream captures the change and invokes a Lambda function.
  • The Lambda function validates the order, calculates the total price, and updates the order status.
  • The system then publishes an event to EventBridge, which triggers another Lambda function to send an order confirmation email.
Fig 4. Sample event-driven workflow
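The validate-and-total step of this workflow can be sketched in Go. The `Order` struct and `processOrder` function below are illustrative stand-ins for the record a DynamoDB Stream event would carry and the handler logic; a real handler would use the aws-lambda-go SDK's event types instead.

```go
package main

import (
	"errors"
	"fmt"
)

// OrderItem and Order are simplified stand-ins for the record carried
// by a DynamoDB Stream event; the field names are illustrative.
type OrderItem struct {
	Name  string
	Price float64
	Qty   int
}

type Order struct {
	ID    string
	Items []OrderItem
}

// processOrder mirrors the validate-and-total step of the workflow:
// it rejects empty orders and returns the computed total.
func processOrder(o Order) (float64, error) {
	if len(o.Items) == 0 {
		return 0, errors.New("order has no items")
	}
	var total float64
	for _, it := range o.Items {
		total += it.Price * float64(it.Qty)
	}
	return total, nil
}

func main() {
	order := Order{ID: "ord-123", Items: []OrderItem{
		{Name: "keyboard", Price: 49.99, Qty: 1},
		{Name: "mouse", Price: 19.99, Qty: 2},
	}}
	total, err := processOrder(order)
	if err != nil {
		fmt.Println("validation failed:", err)
		return
	}
	fmt.Printf("order %s total: %.2f\n", order.ID, total)
}
```

After computing the total, the real function would update the order status in DynamoDB and publish the confirmation event to EventBridge.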

Auto-Scaling Model

Lambda dynamically spins up execution environments based on demand, scaling automatically without manual intervention. When a function is triggered, Lambda creates an execution environment to process the request. If multiple requests arrive simultaneously, Lambda creates additional instances to handle them concurrently. This scaling process ensures that applications remain responsive even under sudden spikes in traffic.

Imagine a news website where users upload videos and a Lambda function transcodes each video uploaded to an S3 bucket. Let’s assume there are around 50 uploads per hour on a normal day, but during a major event, uploads might spike to 3,000 per hour. Lambda automatically scales up to handle the increased workload, and when traffic decreases, it scales back down to zero.

However, Lambda has concurrency limits that define how many function instances can run in parallel. By default, the limit is 1,000 concurrent executions per AWS region. Additionally, if an event source produces data faster than Lambda can process it, the function may throttle excess requests; that is, some requests will be temporarily rejected or delayed. To mitigate this, a service like SQS (Simple Queue Service) can buffer events before passing them to Lambda; the function can then process events from the queue at a more manageable rate, preventing overload and ensuring that all events are eventually processed.

Fig 5. Simple Lambda workflow
  1. Users interact with the website to upload videos.
  2. A video upload triggers the first Lambda function.
  3. The first function receives the upload event and places a message on an SQS queue; the message contains information about the uploaded video.
  4. The queue acts as a buffer, decoupling the upload process from the actual transcoding and preventing potential throttling if many uploads occur at once.
  5. The second Lambda function is triggered by events from the SQS queue, reads the upload info, transcodes the video, and stores it in an S3 bucket.
  6. The bucket stores the transcoded videos for later use (e.g., streaming, playback).
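Step 5 above can be sketched as a batch consumer. The `sqsMessage` and `uploadInfo` types below are trimmed-down, illustrative stand-ins for the real SQS event shape (a production handler would receive `events.SQSEvent` from the aws-lambda-go SDK), and the transcoding step is a placeholder print.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sqsMessage is a trimmed-down stand-in for one SQS record.
type sqsMessage struct {
	Body string
}

// uploadInfo is the illustrative payload the first function enqueues.
type uploadInfo struct {
	Bucket string `json:"bucket"`
	Key    string `json:"key"`
}

// handleBatch drains one batch from the queue, "transcoding" each video
// reference. Failures are counted instead of aborting the batch, so one
// malformed message does not block the rest.
func handleBatch(msgs []sqsMessage) (processed, failed int) {
	for _, m := range msgs {
		var u uploadInfo
		if err := json.Unmarshal([]byte(m.Body), &u); err != nil {
			failed++
			continue
		}
		// Placeholder for the real transcoding + S3 upload step.
		fmt.Printf("transcoding s3://%s/%s\n", u.Bucket, u.Key)
		processed++
	}
	return processed, failed
}

func main() {
	batch := []sqsMessage{
		{Body: `{"bucket":"uploads","key":"clip-1.mp4"}`},
		{Body: `not-json`},
	}
	ok, bad := handleBatch(batch)
	fmt.Printf("processed=%d failed=%d\n", ok, bad)
}
```

Because the queue absorbs bursts, the batch size and the function's concurrency, not the raw upload rate, determine how fast messages are drained.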

Scale — Architecting Serverless Systems With AWS Lambda

Large-Scale Serverless Systems

Using the analogy of the Flying Dutchman, every part of the ship and its crew work together to achieve a common goal. Similarly, AWS Lambda does not operate in isolation; it facilitates a decoupled, event-driven approach, ensuring high availability.

In large-scale systems, Lambda functions are commonly utilized to:

  • Process async and event-driven workloads: Lambda handles real-time event streams, file uploads, database triggers, and message queues.
  • Build API-driven serverless applications: Lambda can act as a backend for RESTful APIs, and also as an event-based microservice (this means that Lambda can run code in response to services like API Gateway, S3, or SQS, enabling microservices that scale automatically without managing servers).
  • Orchestrate distributed workflows: Lambda can be used here to coordinate multiple services using Step Functions, ensuring reliability and stateful execution.
  • Enable real-time data processing: Lambda can consume AWS Kinesis data stream events, DynamoDB Streams, or SQS for near real-time analysis.

Let’s take a look at some key architecture patterns and examine how Lambda operates within large-scale systems:

Event-driven Microservices

Lambda can act as the compute layer in an event-driven architecture, triggered by data events, API requests, or message queues:

  • Case Study: S3 Event Processing
  • Scenario: A file upload triggers an AWS Lambda function to process the data and store metadata in a DynamoDB table.
  • Workflow: A user uploads an image to an S3 bucket; the S3 event triggers a Lambda function that: extracts metadata (e.g., file type, size, timestamp); and then stores the metadata in DynamoDB. The Lambda function publishes a notification to SNS or EventBridge to notify downstream services.
Fig 6. Event-driven microservice
  • Optimization Consideration: This workflow can be optimized by enabling concurrency controls to prevent Lambda from being overwhelmed by high-volume uploads. Where possible, process events in batches to optimize performance and reduce execution costs.

API-Driven Serverless Backend

You can use Lambda to power serverless APIs by integrating with API Gateway:

  • Case Study: RESTful API Backend
  • Scenario: A REST API for an e-commerce application requires product catalog retrieval via Lambda.
  • Workflow: API Gateway receives a GET request (/products); API Gateway invokes an AWS Lambda function which: fetches product data from DynamoDB; marshals the response in JSON and returns it to API Gateway. The API Gateway responds to the client with the retrieved product data.
Fig 7. Simple API-Driven Backend
  • Optimization: CloudFront (CDN) can be implemented alongside API Gateway caching to reduce Lambda function invocations. Another option is to introduce fine-grained IAM policies to restrict unauthorized API access — allowing only trusted clients to make requests to the Lambda function.
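The fetch-marshal-respond step of this workflow can be sketched in Go. The `apiResponse` type below is a minimal stand-in for the proxy-integration response API Gateway expects (status code plus JSON body), and the catalog is hard-coded in place of a DynamoDB query; both are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// product mirrors an illustrative item in the catalog table.
type product struct {
	ID    string  `json:"id"`
	Name  string  `json:"name"`
	Price float64 `json:"price"`
}

// apiResponse is a minimal stand-in for the proxy-integration
// response shape API Gateway expects.
type apiResponse struct {
	StatusCode int    `json:"statusCode"`
	Body       string `json:"body"`
}

// getProducts plays the role of the Lambda handler for GET /products:
// fetch the catalog (hard-coded here instead of a DynamoDB query),
// marshal it to JSON, and hand the response back to API Gateway.
func getProducts() (apiResponse, error) {
	catalog := []product{
		{ID: "p-1", Name: "keyboard", Price: 49.99},
		{ID: "p-2", Name: "mouse", Price: 19.99},
	}
	body, err := json.Marshal(catalog)
	if err != nil {
		return apiResponse{StatusCode: 500, Body: `{"error":"marshal failed"}`}, err
	}
	return apiResponse{StatusCode: 200, Body: string(body)}, nil
}

func main() {
	resp, _ := getProducts()
	fmt.Println(resp.StatusCode, resp.Body)
}
```

With CloudFront or API Gateway caching in front, repeated GETs for the same catalog never reach this handler at all, which is where the invocation savings come from.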

Performance Tuning for AWS Lambda

Lambda’s serverless nature offers scalability and cost-effectiveness, but performance tuning is crucial for optimal responsiveness. Cold starts are a significant factor to consider.

Cold Start Optimization

Cold starts occur when a Lambda function is invoked for the first time or after a period of inactivity. A new environment must be initialized, which involves downloading the function code and its dependencies and setting up the runtime. This process adds latency, impacting response times, especially for latency-sensitive applications like APIs or real-time data processing.

Some Strategies to Mitigate Cold Starts

  1. Provisioned Concurrency: Unlike reserved concurrency, provisioned concurrency lets you specify the number of instances to be kept warm and ready to execute. When a request comes in, it is immediately handled by a warm instance, bypassing the cold start process. This is ideal for applications with predictable traffic patterns and strict latency requirements. However, provisioned concurrency incurs additional cost, so it is smart to balance performance needs with cost considerations. Developers can also use auto-scaling to adjust provisioned concurrency based on demand.
  2. Keeping Functions Warm: This approach uses scheduled invocations to keep instances active. You can use EventBridge to trigger the Lambda function at regular intervals, preventing it from going idle. Pay close attention to the invocation frequency to avoid incurring unnecessary costs.
  3. Optimized Function Packaging: Reducing the size of your deployment package is essential for managing cold start duration. A smaller package leads to faster download and initialization times. To achieve this, include only the necessary libraries and dependencies and choose a runtime that aligns with your function’s requirements. Compiled languages like Go and Rust generally have quicker cold starts than interpreted languages like Node.js and Python.
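A related, code-level mitigation is to move expensive setup out of the handler and into package scope, so it runs once per execution environment during the cold start instead of on every invocation. The sketch below simulates this with a `time.Sleep`; the `expensiveInit` function and config values are illustrative stand-ins for work like loading configuration or creating SDK clients.

```go
package main

import (
	"fmt"
	"time"
)

// config is initialized at package scope, so expensiveInit runs once
// per execution environment (during the cold start), not per request.
var config = expensiveInit()

// expensiveInit stands in for one-time setup such as loading config,
// compiling regexes, or constructing SDK clients.
func expensiveInit() map[string]string {
	time.Sleep(10 * time.Millisecond) // simulated setup cost
	return map[string]string{"region": "eu-west-1"} // illustrative value
}

// handler reuses the already-initialized config on warm invocations,
// keeping per-request latency low.
func handler(name string) string {
	return fmt.Sprintf("hello %s from %s", name, config["region"])
}

func main() {
	// Only the first invocation in a fresh environment pays the
	// initialization cost; both calls below reuse config.
	fmt.Println(handler("alice"))
	fmt.Println(handler("bob"))
}
```

Combined with a small deployment package, this keeps both the init phase and the per-invocation path short.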

Automation: Deploying Lambda With Terraform and AWS CDK

IaC tools like Terraform and AWS CDK can ensure repeatable, scalable, and automated deployment.

Defining Lambda configuration with Terraform (AWS):

# Define the IAM role that the Lambda function will assume.
resource "aws_iam_role" "lambda_role" {
  name = "lambda-test-role"

  # The trust policy below allows the Lambda service to assume this role.
  # Lambda needs this role to execute the function on your behalf.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["sts:AssumeRole"]
        Principal = {
          Service = ["lambda.amazonaws.com"]
        }
      }
    ]
  })
}

# Lambda function definition.
resource "aws_lambda_function" "lambda_function" {
  function_name = "lambda-test-function"

  # Using a custom runtime (Amazon Linux 2).
  # For Golang, Lambda requires a custom "bootstrap"
  # executable in the deployment package.
  runtime = "provided.al2"

  # The handler for the Golang custom runtime is "bootstrap",
  # which is responsible for starting the application code.
  handler = "bootstrap"

  # Set a default timeout of 15 seconds for the Lambda function.
  # It might need to be longer for certain workloads.
  timeout = 15

  # Memory allocated to the Lambda function.
  # Memory allocation also affects CPU power proportionally.
  # 1024 MB is a moderate amount, suitable for data transformation,
  # API processing, and moderate compute workloads.
  memory_size = 1024

  # The role is tied to the Lambda function, granting
  # the function the permissions defined in the IAM role.
  role = aws_iam_role.lambda_role.arn

  # Filename can be defined with a relative or absolute path;
  # the archive must include the bootstrap executable for the custom runtime.
  filename = "../backend/lambda/lambda_payload.zip"

  # Environment variables can also be added to the Lambda function.
  # This is useful for storing configuration values (prefer Secrets
  # Manager or SSM Parameter Store for actual secrets).
  # If there is no need for environment variables, remove this block.
  environment {
    variables = {
      DEFAULT_REGION = var.default_region
    }
  }

  # Tracing with AWS X-Ray is optional; it can be
  # enabled here with an "Active" or "PassThrough" mode.
  tracing_config {
    mode = "Active" # Or "PassThrough"
  }
}

This Terraform configuration sets up an AWS Lambda function using a custom Amazon Linux 2 runtime (provided.al2), which requires a Go bootstrap executable.

It creates an IAM role lambda_role that allows Lambda to assume it via sts:AssumeRole. The Lambda function has a 15-second timeout and 1024 MB memory allocation, affecting CPU performance, along with an execution role for necessary permissions.

The deployment package lambda_payload.zip is sourced from a specified directory, and environment variables are configured for dynamic settings. AWS X-Ray tracing is enabled in Active mode for debugging and performance monitoring.

Defining a Lambda function with AWS CDK (using Golang):

package main

import (
	"os" // used by main() below to read deployment environment variables

	"github.com/aws/aws-cdk-go/awscdk/v2"
	"github.com/aws/aws-cdk-go/awscdk/v2/awslambda"
	"github.com/aws/constructs-go/constructs/v10"
	"github.com/aws/jsii-runtime-go"
)

// stackProps defines the properties for the CDK stack.
// It embeds awscdk.StackProps, allowing you to
// customize stack-level settings.
type stackProps struct {
	awscdk.StackProps
}

// newStack creates a new CDK stack.
func newStack(scope constructs.Construct, id string, props *stackProps) awscdk.Stack {
	// Create a new CDK stack with the given
	// scope, ID, and properties.
	stack := awscdk.NewStack(scope, &id, &props.StackProps)

	// The Lambda function is defined within the stack here.
	awslambda.NewFunction(stack, jsii.String("TestGoLambda"), &awslambda.FunctionProps{
		// Specify the runtime environment for the Lambda function.
		// This tells AWS to use the go1.x runtime.
		Runtime: awslambda.Runtime_GO_1_X(),

		// The handler for Go binaries is typically "bootstrap".
		// This refers to the bootstrap executable that Lambda uses
		// to start the Go application.
		Handler: jsii.String("bootstrap"),

		// Specify the location of the Lambda function's code.
		// This tells the CDK to package the code from the
		// "lambda-handler" directory and deploy it
		// with the Lambda function.
		Code: awslambda.Code_FromAsset(jsii.String("lambda-handler"), nil),

		// Optionally set the name of the Lambda function.
		// If the name is not provided, the CDK
		// will generate a unique name.
		FunctionName: jsii.String("TestGoLambdaCDK"),
	})

	return stack
}

This AWS CDK (Go) code defines a CloudFormation stack that provisions an AWS Lambda function using the Go 1.x runtime (GO_1_X). The newStack function initializes a CDK stack and defines a Lambda construct with the ID TestGoLambda and the function name TestGoLambdaCDK. The function uses “bootstrap” as the handler (required for Go Lambda functions) and packages its code from the “lambda-handler” directory. This setup enables the deployment and execution of a Go-based Lambda function on AWS.

And calling the newStack function:

func main() {
	// Create a new CDK application.
	app := awscdk.NewApp(nil)

	// Create a new stack within the application.
	// "TestGoLambdaStack" is the ID of the stack.
	// The stackProps are used to configure the stack,
	// including the AWS account and region.
	newStack(app, "TestGoLambdaStack", &stackProps{
		StackProps: awscdk.StackProps{
			// The CDK needs these environment values to deploy
			// your stack to the correct AWS environment.
			Env: &awscdk.Environment{
				Account: jsii.String(os.Getenv("AWS_ACC_ID")),
				Region:  jsii.String(os.Getenv("AWS_REGION")),
			},
		},
	})

	// Synthesize the CloudFormation template.
	// This generates the CloudFormation template that
	// defines the infrastructure for your stack.
	// You then use the CDK CLI to deploy this template to AWS.
	app.Synth(nil)
}

This Go CDK application initializes an AWS CDK app and creates a CloudFormation stack named TestGoLambdaStack. It retrieves the AWS account ID and region from environment variables (AWS_ACC_ID, AWS_REGION) to ensure deployment to the correct AWS environment. The stack is configured with these settings and then added to the application. Finally, app.Synth(nil) generates the CloudFormation template, which can be deployed using the AWS CDK CLI.

Conclusion

Although AWS Lambda offers immense potential for serverless computing, it also presents a double-edged sword. Its event-driven, ephemeral, and auto-scaling nature allows for the creation of highly scalable and cost-effective applications, mirroring the tireless operation of the Flying Dutchman. However, mastering Lambda requires careful consideration of its mechanics, including cold starts, concurrency limits, and the need for external data persistence. By understanding these nuances and implementing best practices like optimized function packaging, provisioned concurrency, and infrastructure-as-code with tools like Terraform and AWS CDK, developers can harness Lambda’s power to build robust, performant, and maintainable serverless systems, avoiding the pitfalls of runaway costs and operational complexity.

Part 2 of this series will start with building Lambda services with AWS CDK and Terraform.
