AWS unit testing with LocalStack, Docker and Java

Traditionally unit testing is performed with a class being injected with mocks for its dependencies, so testing is focused just on the behaviours of the class under consideration.

Figure 1: Unit Test class responsibilities

While this is effective for “simple” dependent APIs that may only have a few behaviours, for complex resources such as databases or web service APIs such as DynamoDB it can make sense to unit test against a “fast” implementation of the real resource. By “fast” we mean quick to set up and tear down, so that we can concentrate our effort on the behaviours of the class being tested.

Modern development is tied closely to cloud native APIs such as AWS. We can use a “fast” stub of AWS services with LocalStack deployed on Docker. This gives us in-memory, localhost-based AWS services for most of the available APIs.

Figure 2: LocalStack Unit Test for AWS resources

How to do this using Java and Maven

Using our oss-maven-standards build system we have enabled optional Docker style unit testing using Surefire under Maven. An example layout with Docker Compose configuration, etc. can be seen in the jar-lambda-poc module.

Use any of our parent POMs as your Maven archetype.

Set your pom.xml’s parent to one of our archetypes to get docker support. For example, when building a java lambda:

<parent>
    <groupId>com.limemojito.oss.standards</groupId>
    <artifactId>jar-lambda-development</artifactId>
    <version>15.2.7</version>
    <relativePath/>
</parent>

Enable docker for unit test mode

This is done in the properties section of the pom.xml

<properties>
   ...
   <!-- Test docker unit test... -->
   <docker.unit.test>true</docker.unit.test>
   ...
</properties>

For Spring Boot testing, set active profile to “integration-test”

We are using our S3Support test utilities to build a set of S3 resources around our unit test. These automatically configure LocalStack when the configuration is imported as below.

@ActiveProfiles("integration-test")
@SpringBootTest(classes = S3SupportConfig.class)
public class S3DockerUnitTest {

    @Autowired
    private S3Support s3;

Write your unit test

Now we can write a unit test that is backed by LocalStack’s S3 implementation in docker when the test runs:

    @Test
    public void shouldDoThingsWithS3AsAUnitTest() {
        s3.putData(s3Uri, "text/plain", "hello world".getBytes(UTF_8));
        assertThat(s3.keyExists(s3Uri)).withFailMessage("Key %s is missing", s3Uri)
                                       .isTrue();
    }

Full Source Example

https://github.com/LimeMojito/oss-maven-standards/tree/master/development-test/jar-lambda-poc

April 26, 2025 (updated May 22, 2025)

Surprise: AWS SnapStart needs a new image

When using AWS SnapStart to optimise our Java Lambdas, we’ve noticed an interesting caveat:

If the lambda is not invoked for a long period of time (say a week) then the snapshot image is discarded. The next invocation will generate a new image.

While this is not an issue for a lambda endpoint with some volume, for low volume lambdas, such as a site where New User Onboarding is rare, the user experience may be poor as there could be a two minute delay on the invocation! In our situation we had a five second timeout on the call, so the call failed immediately.

How do we work around this?

Keep the lambda version hot with pre-provisioning of 1 (see Pre-provisioning concurrency). This has an AWS cost based on your lambda memory settings.
“Nudge” the lambda by invoking it once a day on a timer. This has an AWS cost but only one invocation.
“Tie” the lambda image to another behaviour with higher volume by using lambda based routing. As the higher volume invokes the image more often, snapshot staleness doesn’t occur.
Replace the Java lambda with JavaScript, Python, etc. that has a lower cold start time.

Keeping a lambda SnapStart image hot with pre-provisioning

Adjust your deployment to set pre-provisioned concurrency to at least 1. Be aware that you will be charged for the lambda execution as if the lambda was running for the provisioned time.

Consider an ARM (cheaper) 1GB lambda provisioned for one day in us-west-2 (Oregon)

$0.0000033334 for every GB-second x 60 x 60 x 24
= 0.0000033334 x 86400
= USD $0.288 per day
= USD $105.12 per year

Plus execution time costings for actual invocation.

Keeping a lambda SnapStart image hot with a timer

Adjust your deployment by creating a CloudWatch event to invoke your lambda once a day. This tutorial, while focusing on Javascript, is applicable for the CloudWatch setup to invoke the Java lambda.

Note the response can be ignored; we are simply invoking so that the image remains hot. An example AWS cost for an ARM (cheaper) 1GB lambda in us-west-2 (Oregon) with a 250ms execution time, invoked once a day:

$0.0000000133 for every GB-millisecond x 250
= USD $0.0000033250 per day
= USD $0.00121 per year

This may also be within the “free tier” for lambda invocations depending on your site traffic.
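To compare the two keep-warm options side by side, the arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, the prices are the ones quoted in this article for an ARM 1GB lambda in us-west-2, and only the duration charges used in the figures above are included. Check current AWS pricing before relying on the output.

```java
/** Hypothetical helper that reproduces the keep-warm cost arithmetic from this article. */
final class KeepWarmCost {
    // Prices as quoted in the article (ARM, us-west-2). Assumed values, not fetched from AWS.
    static final double PROVISIONED_USD_PER_GB_SECOND = 0.0000033334;
    static final double INVOKE_USD_PER_GB_MILLISECOND = 0.0000000133;

    /** Cost of keeping one instance provisioned for a whole day. */
    static double provisionedPerDay(double memoryGb) {
        final int secondsPerDay = 60 * 60 * 24;
        return PROVISIONED_USD_PER_GB_SECOND * memoryGb * secondsPerDay;
    }

    /** Cost of a single timer invocation per day with the given execution time. */
    static double timerPerDay(double memoryGb, int durationMs) {
        return INVOKE_USD_PER_GB_MILLISECOND * memoryGb * durationMs;
    }

    /** Scales a daily cost to a 365 day year. */
    static double perYear(double perDay) {
        return perDay * 365;
    }

    public static void main(String[] args) {
        double provisioned = provisionedPerDay(1.0);
        double timer = timerPerDay(1.0, 250);
        System.out.printf("Provisioned: $%.3f per day, $%.2f per year%n",
                          provisioned, perYear(provisioned));
        System.out.printf("Timer:       $%.7f per day, $%.5f per year%n",
                          timer, perYear(timer));
    }
}
```

The gap between the two yearly figures is several orders of magnitude, which is why the timer is attractive for low volume lambdas.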

For an example using Java CDK: See our OSS example here.

“Tying” Lambda images together

In our scenario, we have a User Group Calculation lambda that is called at the session start for all logged in users that has a similar library and construction to the New User Onboarding. Given the volume of the User Group Calculation the image never becomes stale.

We adjust our deployment configuration so that the entry points for the User Group Calculation and the New User Onboarding point to the same Lambda image. That lambda implementation switches between the two functions based on the request event structure.

At the cost of moving routing into the implementation, we have tied the high volume and low volume calls so that the shared image never becomes stale.
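As a rough illustration of routing on the request event structure, a shared entry point might look something like the sketch below. The event field names (newUserEmail, sessionUserId) and the two handler methods are hypothetical stand-ins, not our production event shapes.

```java
import java.util.Map;

/** Hypothetical shared entry point: one lambda image serving two behaviours. */
final class SharedLambdaRouter {

    /** Picks the behaviour from the shape of the incoming event. */
    static String handle(Map<String, Object> event) {
        if (event.containsKey("newUserEmail")) {
            // Low volume call: New User Onboarding.
            return onboardNewUser((String) event.get("newUserEmail"));
        }
        if (event.containsKey("sessionUserId")) {
            // High volume call: User Group Calculation keeps the image fresh.
            return calculateUserGroups((String) event.get("sessionUserId"));
        }
        throw new IllegalArgumentException("Unrecognised event shape: " + event.keySet());
    }

    // Placeholder implementations so the sketch runs on its own.
    private static String onboardNewUser(String email) {
        return "onboarded:" + email;
    }

    private static String calculateUserGroups(String userId) {
        return "groups:" + userId;
    }

    public static void main(String[] args) {
        System.out.println(handle(Map.of("sessionUserId", "u-1")));
        System.out.println(handle(Map.of("newUserEmail", "new@example.com")));
    }
}
```

Failing fast on an unrecognised event keeps routing mistakes visible rather than silently running the wrong behaviour.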

Replacing Java SnapStart lambda implementation

Another option is to replace the low volume Java lambda with interpreted code that will not suffer from the SnapStart image staleness. A JavaScript lambda would be lighter weight, and if the lambda code is not too complex it could be crafted without middleware to speed up the lambda.

However this introduces more of a polyglot language approach, which we wanted to avoid as we have a lot of in house libraries that speed our Java development.

Conclusion

For our start-up software we decided to use the day timer as the AWS cost was trivial and we could apply a standard approach into our CDK module for lambda deployment.

Beware low volume Java lambdas and SnapStart.

November 9, 2024

Optimising AWS SnapStart and Spring Boot Java Lambdas

This article looks at optimising a Java Spring Boot application (Cloud Function style) with AWS SnapStart, and covers advanced optimisation with lifecycle management of the application image before the snapshot and after the restore by AWS SnapStart. We cover optimising a lambda for persistent network connection style conversational resources, such as an RDBMS, SQL, legacy messaging framework, etc.

How Snap Start Works

To improve start-up times for a cold start, SnapStart snapshots a virtual machine and restores that snapshot rather than paying the whole JVM + library startup time. For Java applications built on frameworks such as Spring Boot, this provides order of magnitude time reductions on cold start time. For a comparison with raw, SnapStart and Graal Native performance see our article here.

What frameworks do we use with Spring Boot?

For our Java Lambdas we use Spring Cloud Function with the AWS Lambda Adapter. For an example of how we set this up, and links to our development frameworks and code, see our article AWS SnapStart for Faster Java Lambdas.

Default SnapStart: Simple Optimisation of the Lambda INIT phase

When the lambda version is published SnapStart will run up the Java application to the point that the lambda is initialised. For a spring cloud function application, this will complete the Spring Boot lifecycle to the Container Started phase. In short, all your beans will be constructed, injected and started from a Spring Container perspective.

SnapStart will then snapshot the virtual machine with all the loaded information. When the image is restored, the exact memory layout of all classes and data in the JVM is restored. Thus any data loaded in this phase as part of a Spring Bean constructor, @PostConstruct annotated methods and ContextRefreshedEvent handlers will be back in memory as part of the restore.

Issues with persistent network connections

Where this breaks down is if you wish to use a “persistent” network connection style resource, such as an RDBMS connection. In this example, usually in a Spring Boot application a Data Source is configured and the network connections are initialised before the container starts. This can cause significant slowdowns when restoring an image, perhaps weeks after its creation, as all the network connections will be broken.

For a self healing data source, when a connection is requested the pool will check the connection, wait for it to time out, and then have to reconnect it (and potentially start a new transaction). This can repeat for each of the configured connections in the pool. Even if you smartly set the pool size to one, given the single threaded lambda execution model, that connection timeout and reconnect may take significant time depending on network and database settings.

Advanced Java SnapStart: CRaC Lifecycle Management

Project CRaC (Coordinated Restore at Checkpoint) is a JVM project that lets an application respond to a checkpoint notification before a snapshot is taken, and to the signal that a restore has occurred. The AWS Java Runtime supports integration with CRaC so that you can optimise your cold starts even under SnapStart.

At the time of our integration, we used the CRaC library to create a base class for support classes that handle “manual” tailoring of preSnapshot and postRestore events. Newer versions of Spring Boot are integrating CRaC support – see here for details.

We have created a base class, SnapStartOptimizer, that can be used to create a spring bean that can respond to preSnapshot and postRestore events. This gives us two hooks into the lifecycle:

Load more data into memory before the snapshot occurs.
Restore data and connections after we are running again.

Optimising pre snapshot

In this example we have a simple Spring Component that we use to exercise some functionality (HTTP based) to load any lazy classes, data, etc. We also exercise the lookup of our Spring Cloud Function definition bean.

@Component
@RequiredArgsConstructor
public class SnapStartOptimisation extends SnapStartOptimizer {

    private final UserManager userManager;
    private final TradingAccountManager accountManager;
    private final TransactionManager transactionManager;

    @Override
    protected void performBeforeCheckpoint() {
        swallowError(() -> userManager.fetchUser("thisisnotatoken"));
        swallowError(() -> accountManager.accountsFor(new TradingUser("bob", "sub")));
        final int previous = 30;
        final int pageSize = 10;
        swallowError(() -> transactionManager.query("435345345",
                                                    Instant.now().minusSeconds(previous),
                                                    Instant.now(),
                                                    PaginatedRequest.of(pageSize)));
        checkSpringCloudFunctionDefinitionBean();
    }
}

Optimising post restore – LambdaSqlConnection class.

In this example we highlight our LambdaSqlConnection class, which is already optimised for SnapStart. This class exercises a delegated java.sql.Connection instance preSnapshot to confirm connectivity, but replaces the connection on postRestore. This class is used to implement a bean of type java.sql.Connection, allowing you to write raw JDBC in lambdas using a single RDBMS connection for the lambda instance.

Note: Do not use default Spring Boot JDBC templates, JPA, Hibernate, etc in lambdas. The overhead of the default multi connection pools, etc is inappropriate for lambda use. For heavy batch processing a “Run Task” ECS image is more appropriate, and does not have 15 minute timeout constraints.

So how does it work?

Instances and interfaces managed by LambdaSqlConnection

The LambdaSqlConnection class manages the Connection bean instance.
When preSnapshot occurs, LambdaSqlConnection closes the Connection instance.
When postRestore occurs, LambdaSqlConnection reconnects the Connection instance.

Because LambdaSqlConnection creates a dynamic proxy as the Connection instance, it can manage the delegated connection “behind” the proxy without your injected Connection instance changing.
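The dynamic proxy idea can be shown with the JDK alone. The sketch below is not the LambdaSqlConnection source: SwappableDelegate is a hypothetical class, and an IntSupplier stands in for java.sql.Connection so that the example runs without a database. It only illustrates how callers keep one stable reference while the delegate behind it is replaced after a restore.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: hands out one stable proxy and swaps the delegate behind it. */
final class SwappableDelegate<T> {
    private final Supplier<T> factory;
    private final AtomicReference<T> delegate = new AtomicReference<>();
    private final T proxy;

    SwappableDelegate(Class<T> type, Supplier<T> factory) {
        this.factory = factory;
        this.delegate.set(factory.get());
        // Every call on the proxy is forwarded to whichever delegate is current.
        InvocationHandler forward = (self, method, args) -> {
            try {
                return method.invoke(delegate.get(), args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the delegate's own exception
            }
        };
        this.proxy = type.cast(Proxy.newProxyInstance(SwappableDelegate.class.getClassLoader(),
                                                      new Class<?>[]{type},
                                                      forward));
    }

    /** The instance to inject: it never changes for the life of the application. */
    T proxy() {
        return proxy;
    }

    /** Called after a restore: builds a fresh delegate, for example a new connection. */
    void afterRestore() {
        delegate.set(factory.get());
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        // Stand-in for opening a connection: each new delegate reports its own id.
        Supplier<IntSupplier> open = () -> {
            int id = opened.incrementAndGet();
            return () -> id;
        };
        SwappableDelegate<IntSupplier> managed = new SwappableDelegate<>(IntSupplier.class, open);
        IntSupplier injected = managed.proxy();
        System.out.println(injected.getAsInt()); // answered by the first delegate
        managed.afterRestore();
        System.out.println(injected.getAsInt()); // same reference, new delegate
    }
}
```

The injected reference is created once, so beans that were wired before the snapshot do not need to be rebuilt after the restore.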

Using Our SQL Connection replacement in Spring Boot

See the code at https://github.com/LimeMojito/oss-maven-standards/tree/master/utilities/aws-utilities/lambda-sql.

Maven dependency:

<dependency>
   <groupId>com.limemojito.oss.standards.aws</groupId>
   <artifactId>lambda-sql</artifactId>
   <version>15.0.2</version>
</dependency>

Importing our java.sql.Connection interceptor

@Import(LambdaSqlConnection.class)
@SpringBootApplication
public class MySpringBootApplication {

You can now remove any code that is creating a java.sql.Connection and simply use a standard java.sql.Connection instance injected as a dependency in your code. This configuration creates a java.sql.Connection compatible bean that is optimised with SnapStart and delegates to a real SQL connection.

Configuring your (real) DB connection

Example with Postgres driver.

lime:
  jdbc:
    driver:
      classname: org.postgresql.Driver
    url: 'jdbc:postgresql://localhost:5432/postgres'
    username: postgres
    password: postgres

Example spring bean using SQL

@Service
@RequiredArgsConstructor
public class MyService {
    private final Connection connection;

    @SneakyThrows
    public int fetchCount() {
        // try-with-resources closes the statement and result set, but not the injected connection
        try (Statement statement = connection.createStatement();
             ResultSet results = statement.executeQuery("select count(1) from some_table")) {
            results.next();
            return results.getInt(1);
        }
    }
}

References

AWS SnapStart: https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html
Cold Start Timings: https://limemojito.com/native-java-aws-lambda-with-graal-vm/
Using AWS Snapstart for Faster Java Lambdas: https://limemojito.com/aws-snap-start-for-faster-java-lambda/
Project CRaC: https://github.com/CRaC/docs
Spring Framework CRaC integration: https://docs.spring.io/spring-framework/reference/integration/checkpoint-restore.html

October 14, 2024

Deploying Java Lambda with Localstack

We deploy and debug our Java Lambda on development machines using Localstack to emulate an Amazon Web Services (AWS) account. This article walks through the architecture, deployment to Localstack using our open source Java framework, and enabling a debug mode for remote debugging using any Java integrated development environment (IDE).

These capabilities live in our test-utilities module, LambdaSupport.java.

Localstack development architecture

Our build framework uses Docker to deploy a Localstack image, then we use AWS API calls to deploy a zip of our lambda Java classes to the Localstack lambda engine. Due to the size of the zip files, we need to deploy the lambda using an S3 URL. We use Localstack’s S3 implementation to emulate the process.

When the lambda is deployed, the Localstack Lambda engine will pull the AWS Lambda Runtime image from public ECR and then perform the deployment steps. Using the Localstack endpoint for lambda we now have a full environment where we can perform a lambda.invoke to test the deployed function.

Figure 1: Development architecture using Localstack for lambda deployment

Viewing lambda logs

With the appropriate Localstack configuration we can view lambda logs for both startup and run of the lambda. Note these logs appear in the docker logs for the AWS Lambda Runtime Container. This container spins up when the lambda is deployed.

The easiest method we use to see the logs is to:

Run the JUnit test in debug, with a breakpoint after the lambda invoke.
When the breakpoint is hit, use docker ps and docker logs to see the output of the Lambda Runtime.
In IntelliJ Ultimate, you can see the containers deployed via the Services pane after connecting to your docker daemon.

Using the architecture in debug mode

We can use this architecture to remote debug the deployed lambda. Our LambdaSupport class includes configuration on deploy to enable debug mode as per the Localstack documentation https://docs.localstack.cloud/user-guide/lambda-tools/debugging/. With our support class you simply switch from java() to javaDebug() and the deploy will configure the runtime for debug mode (port 5050 by default).

In your docker-compose.yml, set the environment variable LAMBDA_DOCKER_FLAGS=-p 127.0.0.1:5050:5050 -e LS_LOG=debug.

This enables port passthrough for the java debugger from localhost to port 5050 of the container (assuming that is where the JVM debugging is configured for).

Do not commit this code as it will BLOCK test threads until a debugger is connected (port 5050 by default).

Figure 2: Localstack Java Lambda debug architecture

References:

Lambda ECR images: https://gallery.ecr.aws/lambda/provided
Lambda ECR image source code: https://github.com/aws/aws-lambda-base-images
Localstack docker image: https://hub.docker.com/r/localstack/localstack
Localstack documentation: https://docs.localstack.cloud/overview/
Localstack Remote Lambda Debugging: https://docs.localstack.cloud/user-guide/lambda-tools/debugging/

Code examples

See https://github.com/LimeMojito/oss-maven-standards/blob/master/development-test/jar-lambda-poc/src/test/java/ApplicationIT.java for a full example.

Adding test-utilities to your maven project

These are included by default if you use our jar-lambda-development parent POM.

See our post about using our build system for maven.

Otherwise you can manually add the support as below (version omitted),

<dependency>
    <groupId>com.limemojito.oss.test</groupId>
    <artifactId>test-utilities</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <!-- Access for LambdaSupport -->
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>lambda</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <!-- Access for LambdaSupport -->
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <scope>test</scope>
</dependency>

Loading the lambda as a static variable in a unit test.

We recommend a static variable initialised once in a JUnit setup function, due to the time it takes to deploy the lambda.

The LambdaSupport.java method performs deployment of the supplied module zip to Localstack S3, then invokes the AWS Lambda API to confirm that the lambda has started cleanly (state == Active).

private static Lambda LAMBDA;
...
// environment variables for the lambda configuration
final Map<String, String> environment = Map.of(
                    "SPRING_PROFILES_ACTIVE", "integration-test",
                    "SPRING_CLOUD_FUNCTION_DEFINITION", "get"
            );
// using the lambda zip that was built in module ../jar-lambda-poc
LAMBDA = lambdaSupport.java("../jar-lambda-poc",
                            LimeAwsLambdaConfiguration.LAMBDA_HANDLER,
                            environment);

Invoking the lambda for black box testing

This example uses a static variable for the Lambda, JUnit 5 and AssertJ. An AWS API Gateway event JSON is loaded and sent to the deployed lambda, and the result is asserted.

The full example is in our oss-maven-standards repository as an integration test (IT, run by Failsafe).

@Test
public void shouldCallTransactionPostOkApiGatewayEvent() {
    final APIGatewayV2HTTPEvent event = json.loadLambdaEvent("/events/postApiEvent.json",
                                                             APIGatewayV2HTTPEvent.class);

    final APIGatewayV2HTTPResponse response = lambdaSupport.invokeLambdaEvent(LAMBDA,
                                                                              event,
                                                                              APIGatewayV2HTTPResponse.class);

    assertThat(response.getStatusCode()).isEqualTo(200);
    String output = json.parse(response.getBody(), String.class);
    assertThat(output).isEqualTo("world");
}

Localstack lambda deployment debug example

We alter the setup to use the deprecated javaDebug function. Do not commit this code as it will BLOCK test threads until a debugger is connected (port 5050 by default).

For a clean setup in IntelliJ that waits for the lambda to start in debug mode, see the excellent article on Localstack https://docs.localstack.cloud/user-guide/lambda-tools/debugging/ “Configuring IntelliJ IDEA for remote JVM debugging”.

// using the lambda zip that was built in module ../jar-lambda-poc
LAMBDA = lambdaSupport.javaDebug("../jar-lambda-poc",
                                 LimeAwsLambdaConfiguration.LAMBDA_HANDLER,
                                 environment);

February 17, 2024 (updated August 10, 2024)

Maintainable builds – with Maven!

Maven is known to be a verbose, opinionated framework for building applications, primarily for a Java stack. In this article we discuss Lime Mojito’s view on Maven, and how we use it to produce maintainable, repeatable builds using modern features such as automated testing, AWS stubbing (LocalStack) and deployment. We have OSS standards you can use in your own Maven builds at https://bitbucket.org/limemojito/oss-maven-standards/src/master/ and POMs on Maven Central.

Before we look at our standards, we set the context of what drives our build design by looking at our technology choices. In this post we cover why our developer builds are set up this way, but not how our Agile Continuous Integration works.

Lime Mojito’s Technology Choices

Lime Mojito uses a Java based technology stack with Spring, provisioned on AWS. We use AWS CDK (Java) for provisioning and our lone exception is for web based user interfaces (UI), where we use Typescript and React with Material UI and AWS Amplify.

Our build system is developer machine first focused, using Maven as the main build system for all components other than the UI.

Build Charter

The build enforces our development standards to reduce the code review load.
The build must have a simple developer interface – mvn clean install.
If the clean install passes – we can move to source Pull Request (PR).
- PR is important, as when a PR is merged we may automatically deploy to production.
Creating a new project or module must not require a lot of configuration (“xml hell”).
A module must not depend on another running Lime Mojito module for testing.
Any stub resources for testing must be a docker image.
- Postgres, Localstack, etc
Stubs will be managed by the build process for integration test phase.
- i.e. if you’re using stubs, you’re building a Failsafe Maven plugin integration test (IT) case.
The build will handle style and code metric checks (CheckStyle, Maven Enforcer, etc) so that we do not waste time in PR reviews.
For open source, we will post to Maven Central on a Release Build.

Open Source Standards For Our Maven Builds

Our very “top” level of build standards is open source and available for others to use or be inspired by:

Bitbucket: https://bitbucket.org/limemojito/oss-maven-standards/src/master/

The base POM files are also available on the Maven Central Repository if you want to use our approach in your own builds.

https://repo.maven.apache.org/maven2/com/limemojito/oss/standards/

Maven Example pom.xml for building a JAR library

This example will do all the below with only 6 lines of extra XML in your maven pom.xml file:

Enforce that your dependencies are for a single Java version
Resolve dependencies via the Bill of Materials Library that we use to smooth out our Spring + Spring Boot + Spring Cloud + Spring Function + AWS SDK(s) dependency web.
Enable Lombok for easier java development with less boilerplate
Configure code signing
Configure maven repository deployment locations (I suggest overriding these for your own deployments!)
Configure CheckStyle for code style checking against our standards at http://standards.limemojito.com/oss-checkstyle.xml
Configure optional support for docker images loading before integration-test phase
Configure Project Lombok for Java Development with less boilerplate at compile time.
Configure logging support with SLF4J
Build a jar with completed MANIFEST.MF information including version numbers.
Build javadoc and source jars on a release build

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>my.dns.reversed.project</groupId>
    <artifactId>my-library</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <parent>
        <groupId>com.limemojito.oss.standards</groupId>
        <artifactId>jar-development</artifactId>
        <version>13.0.4</version>
        <relativePath/>
    </parent>
</project>

When you add dependencies, common ones that are in or resolved via our library pom.xml do not need version numbers as they are managed by our modern Bill of Materials (BOM) style dependency setup.

Example using the AWS SNS sdk as part of the jar:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>my.dns.reversed.project</groupId>
    <artifactId>my-library</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <parent>
        <groupId>com.limemojito.oss.standards</groupId>
        <artifactId>jar-development</artifactId>
        <version>13.0.4</version>
        <relativePath/>
    </parent>

    <dependencies>
        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>sns</artifactId>
        </dependency>
    </dependencies>
</project>

Our Open Source Standards library supports the following module types (archetypes) out of the box:

Type	Description
java-development	Base POM used to configure deployment locations, checkstyle, enforcer, docker, plugin versions, profiles, etc. Designed to be extended for different archetypes (JAR, WAR, etc.).
jar-development	Build a jar file with test and docker support
jar-lambda-development	Build a Spring Boot Cloud Function jar suitable for lambda use (Java 17 Runtime) with AWS dependencies added by default. Jar is shaded for simple upload.
spring-boot-development	Spring boot jar constructed with the base spring-boot-starter and lime mojito aws-utilities for local stack support.

Available Module Development Types

We hope that you might find these standards interesting to try out.

October 11, 2023

CPU Throttling – Scale by restricting work

We have a web service responding to web requests. The service has a thread pool where each web request uses one operating system thread. The requests are then managed by a multi-core CPU that time-slices between the various threads using the operating system scheduler.

This example is very similar to how Tomcat (Spring Boot MVC) works out of the box when servicing requests with servlets in the Java web server space. The Java VM (v17) matches a Java Thread to an operating system thread that is then scheduled for execution by a core.

So what happens when we have a lot of requests?

Many threads here are sliced between the 4 cores. This slicing of threads where a core works on one for a while, then context switches to another thread, can scale to any level. However, there is an expense in CPU time to switch between one thread to another. This context switch is expensive as it involves both memory and CPU manipulation.

Given enough threads, the CPU cores can quickly spend a significant amount of time context switching when compared to the actual amount of time processing the request.

How do we reduce context switching?

We can trade off context switching for latency by blocking a request thread until a vCPU is available to do the work. Provided the work is largely CPU bound, this may reduce the overall time to process requests if context switching has become a major use of the available vCPU resources.

For our Java Spring Boot based application we introduce one of the standard Executors to provide a blocking task service. We use a WorkStealingPool, an executor that defaults the number of worker threads to the number of CPUs available, with an unlimited queue depth.

We now move the CPU heavy process into a task that can be scheduled onto the executor by a given thread. The thread will then block on the Future returned from submitting the task – this blocking occurs until a worker thread has completed the task’s job and returned a result.
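A minimal sketch of this pattern using only the JDK is below. CpuBoundGate and its method names are hypothetical, and the loop in main is just a stand-in for a CPU heavy request. The request thread submits the task and then blocks on the Future until a worker thread returns the result.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical gate that limits CPU heavy work to one worker thread per available CPU. */
final class CpuBoundGate {
    // Parallelism defaults to the number of available processors.
    private final ExecutorService workers = Executors.newWorkStealingPool();

    /** Called on the request thread: queues the task, then blocks until a worker finishes it. */
    <T> T run(Callable<T> cpuHeavyTask) {
        Future<T> pending = workers.submit(cpuHeavyTask);
        try {
            return pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a worker", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("CPU task failed", e.getCause());
        }
    }

    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        CpuBoundGate gate = new CpuBoundGate();
        // Stand-in for the CPU intensive part of a request.
        Long sum = gate.run(() -> {
            long total = 0;
            for (int i = 1; i <= 1_000_000; i++) {
                total += (long) i * i;
            }
            return total;
        });
        System.out.println(sum);
        gate.shutdown();
    }
}
```

In a web application the gate would be a shared singleton, so that all request threads queue onto the same small set of workers.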

On our application, this returned a 5X improvement to average throughput times for the same work being submitted to a single microservice performing the request processing. This goes to show that in our situation the majority of CPU was being spent on context switching between requests rather than servicing the CPU intensive task for each request.

In our case this translated to 5X less CPU required and a similar reduction in our AWS EC2 costs for this service, as we needed fewer instances provisioned to support the same load.

September 2, 2023

AWS Snap Start for faster Java Lambda

After finding Native Java Lambda to be too fragile for runtimes we investigated AWS Snap Start to speed up our cold starts for Java Lambda. While not as fast as native, Snap Start is a supported AWS Runtime mode for Lambda and it is far easier to build and deploy compared to the requirements for native lambda.

How does Snap Start Work?

Snap Start runs up your Java lambda in the initialisation phase, then takes a VM snapshot. That snapshot becomes the starting point for a cold start when the lambda initialises, rather than the startup time of your java application.

With Spring Boot this shows a large decrease in cold start time as the JVM initialisation, reflection and general image setup is happening before the first request is sent.

Snap Start is configured by saving a Version of your lambda. This version phase takes the VM snapshot and loads that instead of the standard Java runtime initialisation phase. The runtime required is the official Amazon Lambda Runtime and no custom images are required.

What are the trade offs for Snap Start?

Version Publishing needs to be added to the lambda deployment. The deployment time is longer as that image needs to be taken when the version is published.

VM shared resources may behave differently to development as they are re-hydrated before use in the cold start case. For example DB connection pools will need to fail and reconnect as they begin at request time in a disconnected state. However see AWS RDS Proxy for this serverless use case.

As at 26th August 2023 SnapStart is limited to the x86 Architecture for Lambda runtimes.

What are the speed differences?

After warm up there was no difference between a hot JVM and the native compiled hello world program. Cold start however showed a marked difference from memory settings of 512MB and higher due to the proportional allocation of more vCPU.

Times below are in milliseconds.

Architecture	256 MB	512 MB	1024 MB
Java	5066	4054	3514
SnapStart	4689.22	2345.2	1713.82
Native	1002	773	670

Comparison of Architecture v Lambda Memory Configuration

At 1GB we have approximately 1 vCPU for the lambda runtime, which makes a significant difference to the cold start times. Memory settings above 1GB (more than 1 vCPU) had little effect.

While native is over twice as fast as SnapStart, the fragility of deployment for lambda and the massive increase in build times and agent CPU requirements due to compilation were unproductive for our use cases.
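The ratios behind this comparison can be worked out directly from the table above. The sketch below does only that arithmetic. The class and method names are illustrative.

```java
import java.util.Locale;

final class ColdStartRatios {
    // Cold start times in milliseconds, copied from the table above (256, 512, 1024 MB).
    static final int[] MEMORY_MB = {256, 512, 1024};
    static final double[] JAVA = {5066, 4054, 3514};
    static final double[] SNAP_START = {4689.22, 2345.2, 1713.82};
    static final double[] NATIVE = {1002, 773, 670};

    /** How many times faster the fast measurement is than the slow one. */
    static double speedup(double slowMillis, double fastMillis) {
        return slowMillis / fastMillis;
    }

    public static void main(String[] args) {
        for (int i = 0; i < MEMORY_MB.length; i++) {
            System.out.printf(Locale.ROOT,
                              "%4d MB: SnapStart vs Java %.2fx, native vs SnapStart %.2fx%n",
                              MEMORY_MB[i],
                              speedup(JAVA[i], SNAP_START[i]),
                              speedup(SNAP_START[i], NATIVE[i]));
        }
    }
}
```

At 1024 MB this gives roughly 2.05x for SnapStart over plain Java and roughly 2.56x for native over SnapStart.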

Snap start adds around 3 minutes to deployments to take the version snapshot (on AWS resources) which we consider acceptable compared to the build agent increase that we needed to do for native (6vCPU and 8GB). As we are back to Java and scripting our agents are back down to 2vCPU and 2GB with build times less than 10 minutes.

How do you integrate Snap Start with AWS CDK?

This is a little tricky as there are no specific CDK Function props to enable SnapStart (as at 26th August 2023). With CDK we have to fall back to a CloudFormation primitive to enable Snap Start and then take a version.

Code example from our Open Source Spring Boot framework below.

final IFunction function = new Function(this,
                                       LAMBDA_FUNCTION_ID,
                                       FunctionProps.builder()
                                                    .functionName(LAMBDA_FUNCTION_ID)
                                                    .description("Lambda example with Java 17")
                                                    .role(role)
                                                    .timeout(Duration.seconds(timeoutSeconds))
                                                    .memorySize(memorySize)
                                                    .environment(Map.of())
                                                    .code(assetCode)
                                                    .runtime(JAVA_17)
                                                    .handler(LAMBDA_HANDLER)
                                                    .logRetention(RetentionDays.ONE_DAY)
                                                    .architecture(X86_64)
                                                    .build());
CfnFunction cfnFunction = (CfnFunction) function.getNode().getDefaultChild();
cfnFunction.setSnapStart(CfnFunction.SnapStartProperty.builder()
                                                      .applyOn("PublishedVersions")
                                                      .build());
IFunction snapstartVersion = new Version(this,
                                         LAMBDA_FUNCTION_ID + "-snap",
                                         VersionProps.builder()
                                                     .lambda(function)
                                                     .description("Snapstart Version")
                                                     .build());

In CDK because Version and Function both implement IFunction, you can pass a Version to route constructs as below.

String apiId = LAMBDA_FUNCTION_ID + "-api";
HttpApi api = new HttpApi(this, apiId, HttpApiProps.builder()
                                                   .apiName(apiId)
                                                   .description("Public API for %s".formatted(LAMBDA_FUNCTION_ID))
                                                   .build());
HttpLambdaIntegration integration = new HttpLambdaIntegration(LAMBDA_FUNCTION_ID + "-integration",
                                                              snapstartVersion,
                                                              HttpLambdaIntegrationProps.builder()
                                                                                        .payloadFormatVersion(
                                                                                                VERSION_2_0)
                                                                                        .build());
HttpRoute build = HttpRoute.Builder.create(this, LAMBDA_FUNCTION_ID + "-route")
                                   .routeKey(HttpRouteKey.with("/" + LAMBDA_FUNCTION_ID, HttpMethod.GET))
                                   .httpApi(api)
                                   .integration(integration)
                                   .build();

Note in the HttpLambdaIntegration that we pass a Version rather than the Function object. This produces the CloudFormation that links the API Gateway integration to your published Snap Start version of the Java Lambda.

June 4, 2023, updated June 5, 2023

Integrate AWS Cognito and Spring Security

How to integrate AWS Cognito and Spring Security using JSON Web Tokens (JWT), Cognito groups and mapping to Spring Security Roles. Annotations are used to secure Java methods.

Figure: Authorisation flow for a web request, showing the various software components.

AWS Cognito Configuration

Configure a user pool.
Add an app client.
Create a user with a group.

The user pool can be created from the AWS web console. The User Pool represents a collection of users with attributes. For more information see the Amazon documentation.

An app client should be created that can generate JWT tokens on authentication. An example client configuration is below, and can be created from the pool settings in the Amazon web console. This client uses a simple username/password flow to generate id, access and refresh tokens on a successful auth.

Note this form of client authentication flow is not recommended for production use.

We can now add a group so that we can bind new users to a group membership. This is added from the group tab on the user pool console.

Creating a user

We can easily create a user using the AWS command line.

aws cognito-idp admin-create-user --user-pool-id us-west-2_XXXXXXXX --username hello
aws cognito-idp admin-set-user-password --user-pool-id us-west-2_XXXXXXXX --username hello --password testtestTest1! --permanent
aws cognito-idp admin-add-user-to-group --user-pool-id us-west-2_XXXXXXXX --username hello --group-name Admin

Fetching a JWT token

The AWS CLI example below will generate tokens for our hello test user. Note that you will need to run it against the region your user pool is in, and adjust the client id as required. The client ID can be retrieved from the App Client Information page in the AWS Cognito web console.

aws cognito-idp initiate-auth --auth-flow USER_PASSWORD_AUTH --client-id NOT_A_REAL_ID --auth-parameters USERNAME=hello,PASSWORD=testtestTest1!

Example access token

eyJraWQiOiJLeUhCMkYzNmRyc0QrNXdNT0x4NTJlQVNUNG5ZSmJTczB4NjJWT1pJNE9FPSIsImFsZyI6IlJTMjU2In0.eyJzdWIiOiJlODAwNGYyNy1lMGVjLTQ0YTMtOGRlZC0yYmE1M2UzOWZkZDMiLCJjb2duaXRvOmdyb3VwcyI6WyJBZG1pbiJdLCJpc3MiOiJodHRwczpcL1wvY29nbml0by1pZHAudXMtd2VzdC0yLmFtYXpvbmF3cy5jb21cL3VzLXdlc3QtMl82bkpHeGZKdkQiLCJjbGllbnRfaWQiOiJzZzRraTkyNDByNnBsMTlhdjRwYjA4N3JlIiwiZXZlbnRfaWQiOiIyZDM2MGQ1NS0yYjNiLTRlZjYtODM1ZC0xODZhYjE4ODAzZTMiLCJ0b2tlbl91c2UiOiJhY2Nlc3MiLCJzY29wZSI6ImF3cy5jb2duaXRvLnNpZ25pbi51c2VyLmFkbWluIiwiYXV0aF90aW1lIjoxNjg1ODc1MjY0LCJleHAiOjE2ODU4Nzg4NjQsImlhdCI6MTY4NTg3NTI2NCwianRpIjoiMWZhNTkyNzgtMGVhOC00N2E5LTg4OGYtMGJjNTQ4OWQwYzk4IiwidXNlcm5hbWUiOiJ0ZXN0In0.BZIH55ud1zCduw3WiMBbSlfEuVC4XPT6ND5CmhpbAqOI4_NghX-Y8ghW9FdIDch1bO0vDREChSEEKfPoWIe7MScsM3Gb6uhMjiE3cJBdquolY5T6JnFMS4JduREnGvlNXUx9H19DLV3zxauwciag6gSajGedGb8418T6X_qSiPgTOQqKS7J_WdodBtZ6k1_XCiTekFIc9WIkiRQdL6mo3yowSQJB4YJ7bCOrWquDkfCnoPvllbqCov7RGr8RUbGVmtZR14dm82RU_tu-AAdMDFshmVvYpfS5ZQProH97y05LlxDjJQ9t0TZwRcrfaMCAxfehfhBUViVNpr5DBgfcuA

If you decode the access token, you will see we have the claim cognito:groups set to an array containing the group Admin. See https://jwt.io
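If you prefer not to paste a token into a website, the payload can also be inspected locally. The sketch below uses only the JDK. JwtPayloadPeek is an illustrative name, and the code deliberately does not verify the signature, so it is suitable for inspection only and never for authorisation decisions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class JwtPayloadPeek {
    /** Returns the JSON payload (the second dot-separated part) of a JWT without verifying it. */
    static String payloadJson(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected header.payload.signature");
        }
        // JWT parts are base64url encoded without padding; the JDK decoder accepts that.
        byte[] decoded = Base64.getUrlDecoder().decode(parts[1]);
        return new String(decoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Pass the access token as the first program argument.
        System.out.println(payloadJson(args[0]));
    }
}
```

Running this against the access token should print a JSON document that includes the cognito:groups claim.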

Spring Configuration

Our example uses Spring Boot 2.7.x and the following Maven dependencies:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-oauth2-resource-server</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-security</artifactId>
</dependency>

We start by configuring a Spring Security OAuth 2.0 Resource server. This resource server represents our service and will be guarded by the AWS Cognito access token. This JWT contains the cognito claims as configured in the Cognito User Pool.

This configuration is simply to point the issuer URL (JWT iss claim) to the Cognito Issuer URL for your User Pool.

spring:
  security:
    oauth2:
      resourceserver:
        jwt:
          issuer-uri: https://cognito-idp.us-west-2.amazonaws.com/us-west-2_xxxxxxxxx

The following security configuration enables Spring Security method level authorisation using annotations, and configures the Resource Server to split the Cognito Groups claim into a set of roles that can be mapped by the Spring Security Framework.

This Spring Security configuration maps a default role, “USER”, to all valid tokens, plus each of the group names in the JWT claim cognito:groups is mapped to a Spring role of the same name. As per Spring naming conventions, each role has the name prefixed with “ROLE_”. We also allow Spring Boot Actuator in this example to function without any authentication, which gives us a health endpoint, etc. In production you will want to bar access to these URLs.

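import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.convert.converter.Converter;
import org.springframework.security.authentication.AbstractAuthenticationToken;
import org.springframework.security.config.annotation.method.configuration.EnableGlobalMethodSecurity;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configurers.oauth2.server.resource.OAuth2ResourceServerConfigurer;
import org.springframework.security.core.GrantedAuthority;
import org.springframework.security.core.authority.SimpleGrantedAuthority;
import org.springframework.security.oauth2.jwt.Jwt;
import org.springframework.security.oauth2.server.resource.authentication.JwtAuthenticationConverter;
import org.springframework.security.web.SecurityFilterChain;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

@Configuration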
@EnableWebSecurity
@EnableGlobalMethodSecurity(prePostEnabled = true, securedEnabled = true, jsr250Enabled = true)
@Slf4j
public class SecurityConfig {

    public static final String ROLE_USER = "ROLE_USER";
    public static final String CLAIM_COGNITO_GROUPS = "cognito:groups";

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        return http
                // actuator permit all
                .authorizeRequests((authz) -> authz.antMatchers("/actuator/**")
                                                   .permitAll())
                // configuration access is secured.
                .authorizeRequests((authz) -> authz.anyRequest().authenticated())
                // oauth authority conversion
                .oauth2ResourceServer(this::oAuthRoleConversion)
                .build();
    }

    private void oAuthRoleConversion(OAuth2ResourceServerConfigurer<HttpSecurity> oauth2) {
        oauth2.jwt(this::jwtToGrantedAuthExtractor);
    }

    private void jwtToGrantedAuthExtractor(OAuth2ResourceServerConfigurer<HttpSecurity>.JwtConfigurer jwtConfigurer) {
        jwtConfigurer.jwtAuthenticationConverter(grantedAuthoritiesExtractor());
    }

    private Converter<Jwt, ? extends AbstractAuthenticationToken> grantedAuthoritiesExtractor() {
        JwtAuthenticationConverter converter = new JwtAuthenticationConverter();
        converter.setJwtGrantedAuthoritiesConverter(this::userAuthoritiesMapper);
        return converter;
    }

    @SuppressWarnings("unchecked")
    private Collection<GrantedAuthority> userAuthoritiesMapper(Jwt jwt) {
        return mapCognitoAuthorities((List<String>) jwt.getClaims().getOrDefault(CLAIM_COGNITO_GROUPS, Collections.<String>emptyList()));
    }

    private List<GrantedAuthority> mapCognitoAuthorities(List<String> groups) {
        log.debug("Found cognito groups {}", groups);
        List<GrantedAuthority> mapped = new ArrayList<>();
        mapped.add(new SimpleGrantedAuthority(ROLE_USER));
        groups.stream().map(role -> new SimpleGrantedAuthority("ROLE_" + role)).forEach(mapped::add);
        log.debug("Roles: {}", mapped);
        return mapped;
    }
}

And now a code example of the annotations used to secure a method. The method below, annotated by PreAuthorize, requires a group of Admin to be linked to the user calling the method. Note that the role “Admin” maps to the Spring Security role “ROLE_Admin”, which will be sourced from the Cognito group membership of “Admin” as previously configured in our Cognito setup above.

@PreAuthorize("hasRole('Admin')")
@PostMapping
public Mono<JobInfo<TickDataLoadRequest>> create(@RequestBody TickDataLoadRequest tickDataLoadRequest) {
   return client.getTickDataLoadClient().create(tickDataLoadRequest);
}

That’s it! You now have a working example for configuring Cognito and Spring Security to work together. As this is based on the Authorization header with a bearer token, it will work with minimal configuration of API Gateway, Lambda, etc.

April 2, 2023

Reading Dukascopy bi5 Tick History with the TradingData Stream Library for Java

This Java library reads the publicly available binary format bi5 Dukascopy Bank tick history files and converts them to a Java InputStream to be used with your applications.

https://bitbucket.org/limemojito/trading-data-stream

TradingDataStream FX Data model library

This library supports:

High level search APIs for Tick and Bar streams, backed by cached dukascopy files.
on demand fetch from Dukascopy
local filesystem caching
Amazon Web Service S3 caching
Bar aggregation from the tick data
Bar search queries by barCount or date time range (UTC).
stream -> CSV file conversion.
stream -> JSON file conversion.
“Standalone” configuration for quick scripts.
Spring bean configurations and customisation for use in large applications.

Provided under the Apache 2.0 License, please refer to LICENSE.txt and DATA_DISCLAIMER.txt in our software code repository. This software is supplied as-is, use at your own risk and information from using this software does NOT constitute financial advice.

Please note we are not affiliated with Dukascopy in any way. This project was a clean room engineering effort to read the dukascopy files. This library was inspired by the C++ binding at https://github.com/ninety47/dukascopy.

Fetching Tick data using Dukascopy bi5 publicly available history data

Using TradingDataStream with a maven project

Add the following to the dependencies section of your pom.xml

<dependency>
  <groupId>com.limemojito.oss.trading.trading-data-stream</groupId>
  <artifactId>model</artifactId>
  <version>2.1.2</version>
</dependency>

TradingDataStream: Using the high level TradingSearch API for Tick data

This high level API allows you to use a query by time to retrieve ticks. An appropriate number of bi5 files are retrieved from Dukascopy to answer the query, and the results are adjusted to fit within the query parameters.

The standalone setup here uses local file caching in your user’s home directory under .dukascopy-cache to cache the bi5 files retrieved to increase the speed of repeated searches.

TradingSearch search = TradingDataStreamConfiguration.standaloneSetup();
// Assumption: the query method on TradingSearch is named search; check the library documentation.
try (TradingInputStream<Tick> ticks = search.search("EURUSD", "2020-01-02T00:00:00Z", "2020-01-02T00:59:59Z")) {
    ticks.stream()
         .forEach(t -> log.info("{} {} bid: {}", t.getMillisecondsUtc(), t.getSymbol(), t.getBid()));
}

TradingDataStream: Reading an existing Dukascopy bi5 FX Tick History file with Java

We recommend using the TradingSearch APIs as these work with configured caches to reduce the load on the Dukascopy servers. Our low level APIs can read individual file data streams as below.

The separation of “path” and the file data is due to the naming convention of the data in the Dukascopy repository.

Symbol/Year/Month (0 indexed)/DayOfMonth/{24hourOfDay}h_ticks.bi5

String path = "EURUSD/2018/06/05/05h_ticks.bi5";
try (FileInputStream fileStream = new FileInputStream(path);
     TradingInputStream<Tick> ticks = new DukascopyTickInputStream(VALIDATOR, path, fileStream)) {
    ticks.stream().forEach(t -> log.info("{} {} bid: {}", t.getMillisecondsUtc(), t.getSymbol(), t.getBid()));
}
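The path naming convention above can be sketched with java.time. DukascopyPaths is an illustrative helper and not part of the TradingDataStream API. The two digit zero padding is an assumption taken from the example path used in this article.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Locale;

final class DukascopyPaths {
    /**
     * Builds Symbol/Year/Month (0 indexed)/DayOfMonth/{24hourOfDay}h_ticks.bi5
     * for the hour (UTC) that contains the supplied instant.
     */
    static String tickPath(String symbol, Instant instant) {
        ZonedDateTime utc = instant.atZone(ZoneOffset.UTC);
        return String.format(Locale.ROOT,
                             "%s/%04d/%02d/%02d/%02dh_ticks.bi5",
                             symbol,
                             utc.getYear(),
                             utc.getMonthValue() - 1, // java.time months are 1 to 12, the path is 0 indexed
                             utc.getDayOfMonth(),
                             utc.getHour());
    }
}
```

For example, tickPath("EURUSD", Instant.parse("2018-07-05T05:30:00Z")) returns EURUSD/2018/06/05/05h_ticks.bi5, so the path used in the example above refers to 5 July 2018, not 5 June.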

Tick Dukascopy File Format

Note that Dukascopy data uses a UTC+0 offset, so no time adjustment is necessary.

The files I downloaded are named something like ’00h_ticks.bi5′. These ‘bi5’ files are LZMA compressed binary data files. The binary data files are formatted into 20-byte rows.

32-bit integer: milliseconds since epoch
32-bit float: Ask price
32-bit float: Bid price
32-bit float: Ask volume
32-bit float: Bid volume

The ask and bid prices need to be multiplied by the point value for the symbol/currency pair. The epoch is extracted from the URL (and the folder structure I’ve used to store the files on disk). It represents the point in time that the file starts from, e.g. 2013/01/14/00h_ticks.bi5 has the epoch of midnight on 14 January 2013. An example using C++ to work with the file format, including the format and computation of “epoch time”:

LZMA compression/decompression can be done with Apache Commons Compress:

https://commons.apache.org/proper/commons-compress/

The format below is the one found to be “valid” after experimentation.

[   TIME  ] [   ASKP  ] [   BIDP  ] [   ASKV  ] [   BIDV  ]
[0000 0800] [0002 2f51] [0002 2f47] [4096 6666] [4013 3333]

TIME is a 32-bit big-endian integer representing the number of milliseconds that have passed since the beginning of this hour.
ASKP is a 32-bit big-endian integer representing the asking price of the pair, multiplied by 100,000.
BIDP is a 32-bit big-endian integer representing the bidding price of the pair, multiplied by 100,000.
ASKV is a 32-bit big-endian floating point number representing the asking volume, divided by 1,000,000.
BIDV is a 32-bit big-endian floating point number representing the bidding volume, divided by 1,000,000.
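The row layout above can be read with java.nio.ByteBuffer once the file has been LZMA decompressed. This is a minimal sketch and not the library's own reader. Bi5Record and its field names are illustrative, and the 100,000 price scale is the one stated above (other symbols may use a different point value).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Bi5Record {
    static final int RECORD_BYTES = 20;

    final int millisFromHourStart;
    final int askPoints;
    final int bidPoints;
    final float askVolumeMillions;
    final float bidVolumeMillions;

    private Bi5Record(int millis, int askPoints, int bidPoints, float askVolume, float bidVolume) {
        this.millisFromHourStart = millis;
        this.askPoints = askPoints;
        this.bidPoints = bidPoints;
        this.askVolumeMillions = askVolume;
        this.bidVolumeMillions = bidVolume;
    }

    /** Reads one 20-byte row from the current position of already decompressed data. */
    static Bi5Record read(ByteBuffer decompressed) {
        ByteBuffer row = decompressed.order(ByteOrder.BIG_ENDIAN);
        // Java evaluates arguments left to right, matching TIME, ASKP, BIDP, ASKV, BIDV.
        return new Bi5Record(row.getInt(), row.getInt(), row.getInt(), row.getFloat(), row.getFloat());
    }

    /** Absolute epoch milliseconds, given the epoch milliseconds of the hour the file starts at. */
    long epochMillis(long hourStartEpochMillis) {
        return hourStartEpochMillis + millisFromHourStart;
    }

    /** Ask price as a decimal, assuming the price was multiplied by 100,000. */
    double ask() {
        return askPoints / 100_000.0;
    }

    /** Bid price as a decimal, assuming the price was multiplied by 100,000. */
    double bid() {
        return bidPoints / 100_000.0;
    }
}
```

For the sample row shown above this yields a time offset of 2048 ms, an ask of 1.43185, a bid of 1.43175, and volumes of 4.7 and 2.3 (in millions).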

Tick Data JSON Format

Note that epoch milliseconds is relative to the UTC timezone. The source field is either live or historical.

{
   "epochMilliseconds": 94875945798,
   "symbol": "EURUSD",
   "bid": 134567,
   "ask": 134520,
   "source": "live",
   "streamId": "00000000-0000-0000-0000-000000000000"
}