When using AWS SnapStart to optimise our Java Lambdas, we’ve noticed an interesting caveat:
If the lambda is not invoked for a long period of time (say a week) then the snapshot image is discarded. The next invocation will generate a new image.
While this is not an issue for a lambda endpoint with steady traffic, it is for low-volume lambdas, such as a site where New User Onboarding is rare. There the user experience may be poor, because the invocation could be delayed by up to two minutes! In our situation the call had a five-second timeout, so it failed immediately.
How do we work around this?
- Keep the lambda version hot with a provisioned concurrency of 1 (see Provisioned concurrency). This has an AWS cost based on your lambda memory setting.
- “Nudge” the lambda by invoking it once a day on a timer. This has an AWS cost, but only for one invocation a day.
- “Tie” the lambda image to a higher-volume behaviour by routing both through the same lambda. Because the higher-volume traffic invokes the shared image often, the snapshot never goes stale.
- Replace the Java lambda with a runtime that has a lower cold start time, such as JavaScript or Python.
Keeping a lambda SnapStart image hot with provisioned concurrency
Adjust your deployment to set provisioned concurrency to at least 1. Be aware that you are charged for each provisioned environment for the whole time it is configured, as if the lambda were running continuously.
Consider a 1 GB ARM (cheaper) lambda with one environment provisioned for one day in us-west-2 (Oregon):
USD $0.0000033334 per GB-second × 1 GB × 86,400 seconds (60 × 60 × 24)
= USD $0.288 per day
= USD $105.12 per year (× 365)
plus the execution time charges for actual invocations.
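The same arithmetic as a small Java sketch, handy for plugging in a different memory size or number of provisioned environments. The price constant is the ARM us-west-2 figure quoted above and is an assumption to check against current AWS pricing; the class and method names are illustrative only.

```java
import java.util.Locale;

// Sketch of the provisioned concurrency cost arithmetic.
// ASSUMPTION: the price is the ARM us-west-2 figure quoted in the text; verify current AWS pricing.
public class ProvisionedConcurrencyCost {

    // USD per GB-second of provisioned concurrency (ARM, us-west-2, as quoted in the text)
    static final double PRICE_PER_GB_SECOND = 0.0000033334;

    static final int SECONDS_PER_DAY = 60 * 60 * 24; // 86,400

    // Cost of keeping `environments` provisioned environments of `memoryGb` each for one day.
    // Execution time charges for actual invocations are extra and not included.
    static double dailyCost(double memoryGb, int environments) {
        return PRICE_PER_GB_SECOND * memoryGb * environments * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        double perDay = dailyCost(1.0, 1);
        double perYear = perDay * 365;
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's default locale.
        System.out.printf(Locale.ROOT, "per day:  USD %.3f%n", perDay);
        System.out.printf(Locale.ROOT, "per year: USD %.2f%n", perYear);
    }
}
```

Running it prints USD 0.288 per day and USD 105.12 per year for one 1 GB environment.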
Keeping a lambda SnapStart image hot with a timer
Adjust your deployment by creating a CloudWatch Events rule (now part of Amazon EventBridge) that invokes your lambda once a day. This tutorial focuses on JavaScript, but its CloudWatch setup applies equally to invoking the Java lambda.
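A minimal sketch of the handler side of the nudge, assuming the rule passes the default EventBridge scheduled-event payload (which carries a "source" of "aws.events" and a "detail-type" of "Scheduled Event") to the function. The class name, the placeholder onboarding logic and the "userId" field are hypothetical. A deployed version would implement RequestHandler (with its Context parameter) from aws-lambda-java-core; that dependency is left out so the sketch compiles on its own. Because SnapStart applies only to published versions, the rule should target the version or alias your callers invoke rather than $LATEST.

```java
import java.util.Map;

// Hypothetical onboarding handler that recognises the daily keep-warm invocation.
// ASSUMPTION: the schedule sends the default EventBridge scheduled-event payload.
public class OnboardingHandler {

    // EventBridge scheduled events carry source "aws.events" and detail-type "Scheduled Event".
    static boolean isScheduledNudge(Map<String, Object> event) {
        return "aws.events".equals(event.get("source"))
                && "Scheduled Event".equals(event.get("detail-type"));
    }

    public Map<String, Object> handleRequest(Map<String, Object> event) {
        if (isScheduledNudge(event)) {
            // The scheduler ignores the response; the invocation itself keeps the snapshot in use.
            return Map.of("nudged", true);
        }
        return onboardNewUser(event);
    }

    // Placeholder for the real New User Onboarding logic (hypothetical).
    private Map<String, Object> onboardNewUser(Map<String, Object> event) {
        return Map.of("onboardedUser", event.getOrDefault("userId", "unknown"));
    }

    public static void main(String[] args) {
        OnboardingHandler handler = new OnboardingHandler();
        System.out.println(handler.handleRequest(
                Map.of("source", "aws.events", "detail-type", "Scheduled Event")));
        System.out.println(handler.handleRequest(Map.of("userId", "u-123")));
    }
}
```

Returning early keeps the nudge's execution time short, which is what keeps its duration charge small.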
Note that the response can be ignored: we are invoking only so that the image stays hot. An example AWS cost for an ARM (cheaper) 1 GB lambda in us-west-2 (Oregon) with a 250 ms execution time:
USD $0.0000000133 per GB-ms × 1 GB × 250 ms
= USD $0.0000033250 per day
= USD $0.00121 per year (× 365)
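The daily nudge arithmetic as a Java sketch, assuming one 250 ms invocation per day at the ARM us-west-2 per-GB-ms price quoted above (an assumption to check against current pricing). Like the figures above, it covers duration charges only and leaves out the per-request charge. Names are illustrative.

```java
import java.util.Locale;

// Sketch of the yearly duration cost of one daily keep-warm invocation.
// ASSUMPTION: price is the ARM us-west-2 per-GB-ms figure quoted in the text; request charges excluded.
public class NudgeCost {

    // USD per GB-millisecond of execution time (ARM, us-west-2, as quoted in the text)
    static final double PRICE_PER_GB_MS = 0.0000000133;

    // Duration cost of a single invocation of `durationMs` at `memoryGb`.
    static double costPerInvocation(double memoryGb, long durationMs) {
        return PRICE_PER_GB_MS * memoryGb * durationMs;
    }

    public static void main(String[] args) {
        double perDay = costPerInvocation(1.0, 250); // one nudge per day
        double perYear = perDay * 365;
        System.out.printf(Locale.ROOT, "per day:  USD %.10f%n", perDay);
        System.out.printf(Locale.ROOT, "per year: USD %.5f%n", perYear);
    }
}
```

Running it prints USD 0.0000033250 per day and USD 0.00121 per year.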
This may also fall within the Lambda free tier, depending on your site traffic.
For an example using the Java CDK, see our OSS example here.
“Tying” Lambda images together
In our scenario, a User Group Calculation lambda is called at the start of every session for all logged-in users, and it uses similar libraries and construction to the New User Onboarding lambda. Given the volume of User Group Calculation calls, its image never becomes stale.
We adjust our deployment configuration so that the entry points for the User Group Calculation and the New User Onboarding point to the same Lambda image. That lambda implementation switches between the two functions based on the request event structure.
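A minimal sketch of routing on the request event structure inside one handler. The field names "sessionId" and "newUser", the class name and the placeholder methods are hypothetical; real events and logic will differ. As with any Java lambda entry point, a deployed version would implement RequestHandler from aws-lambda-java-core, left out here so the sketch compiles without dependencies.

```java
import java.util.Map;

// Hypothetical shared handler serving both the User Group Calculation and
// New User Onboarding entry points, so both share one SnapStart image.
// ASSUMPTION: the two request events are distinguishable by the fields below.
public class SharedHandler {

    public Map<String, Object> handleRequest(Map<String, Object> event) {
        // Route on the shape of the event; check the rarer, more specific shape first.
        if (event.containsKey("newUser")) {
            return onboardNewUser(event);
        }
        if (event.containsKey("sessionId")) {
            return calculateUserGroups(event);
        }
        throw new IllegalArgumentException("Unrecognised event shape: " + event.keySet());
    }

    // Placeholder for the real User Group Calculation logic (hypothetical).
    private Map<String, Object> calculateUserGroups(Map<String, Object> event) {
        return Map.of("handledBy", "userGroupCalculation");
    }

    // Placeholder for the real New User Onboarding logic (hypothetical).
    private Map<String, Object> onboardNewUser(Map<String, Object> event) {
        return Map.of("handledBy", "newUserOnboarding");
    }

    public static void main(String[] args) {
        SharedHandler handler = new SharedHandler();
        System.out.println(handler.handleRequest(Map.of("sessionId", "s-1")));
        System.out.println(handler.handleRequest(Map.of("newUser", Map.of("email", "a@example.com"))));
    }
}
```

Checking the onboarding field first keeps the routing unambiguous if an onboarding event also carries a session id; an unrecognised shape fails loudly rather than falling through to the wrong function.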
At the cost of moving routing into the implementation, we have tied the high-volume and low-volume calls together so that the shared image never becomes stale.
Replacing Java SnapStart lambda implementation
Another option is to replace the low-volume Java lambda with an interpreted runtime that does not rely on SnapStart, and so does not suffer from snapshot staleness. A JavaScript lambda would be lighter weight and, if the code is not too complex, could be written without middleware to reduce start-up time.
However, this moves us towards a polyglot codebase, which we wanted to avoid because we have a lot of in-house libraries that speed up our Java development.
Conclusion
For our start-up software we chose the daily timer, because the AWS cost was trivial and we could build it into our CDK module for lambda deployment as a standard approach.
Beware low volume lambdas and SnapStart.