(Replying to PARENT post)
I recall Joyent's solution to a similar problem: you have an object stored somewhere (e.g. S3) and want to use it in a container, but you have to copy it over HTTP or something similar before you can do any work on it, and the object could be very large.
With Joyent's Manta[1] you would spin up a container right where the object is stored (instead of bringing the object to the container via NFS). It also has MapReduce support.
(Replying to PARENT post)
One thing that tripped us up at the time was EFS not supporting encryption in transit, but this was fixed in early 2018 when EFS began supporting stunnel to wrap the underlying NFS connection in TLS. https://docs.aws.amazon.com/efs/latest/ug/encryption-in-tran...
It reads as if this Lambda-integrated EFS works out of the box with encryption in transit.
(Replying to PARENT post)
It wasn't obvious to me whether this somehow mounts EFS over something other than NFS (the post never says the word "NFS"). When people ask "Should I use Lambda / Google Cloud Functions / Cloud Run against my NFS server?", my response isn't "How would you set that up?" but "Be careful. Cleaning up NFS locks held by clients that have gone away is fairly painful, and you have none of the mechanisms to make sure the function exits properly".
Alternatively, you can mount without locking, and then you get one of the comments downthread about "and now you've given functions shared mutable state but with bad primitives".
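To make the locking concern concrete, here is a minimal sketch of the kind of advisory lock a function might take on a file in a shared mount. It assumes a Linux client, where POSIX locks taken through `fcntl.lockf` are the ones the NFS client forwards to the server. The helper names `locked` and `append_line` are hypothetical, and nothing here is specific to EFS or Lambda.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until the lock is granted. On an NFS mount this becomes a
        # request to the server, which then tracks the lock for this client.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Runs on normal exit and on Python exceptions. It does NOT run if
        # the process is killed or frozen, which is the case the comment
        # above worries about: the server may keep the lock until it expires.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


def append_line(path, line):
    """Append one line to a shared file while holding the lock."""
    with locked(path) as fd:
        os.lseek(fd, 0, os.SEEK_END)
        os.write(fd, (line + "\n").encode())
```

The `finally` block is the "mechanism to make sure it exits properly" on an ordinary host. In a function runtime that can be torn down at any time, there is no guarantee it ever runs.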
tl;dr: Cool! ... But, how does this handle NFS locking?
(Replying to PARENT post)
Is anybody using Lambda to run huge MapReduce jobs? Do people still use Hadoop?
Doesn't this basically just let you have something like HDFS for running large distributed computations with some shared state, without having to reach for S3 or Redis?
(Replying to PARENT post)
If your use case can deal with one EFS volume per isolation boundary, you can use IAM to control who can mount what volume, which might be easier to reason about.
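As a rough illustration of the one-volume-per-boundary idea, the sketch below builds an identity policy that lets a role mount (and optionally write to) a single file system. The `elasticfilesystem:ClientMount` and `elasticfilesystem:ClientWrite` actions are the EFS client actions; the function name and the region, account ID, and file system ID in the usage line are placeholders, not values from this thread.

```python
import json


def efs_mount_policy(region, account_id, file_system_id, read_only=False):
    """Build an IAM policy document scoped to one EFS file system."""
    actions = ["elasticfilesystem:ClientMount"]
    if not read_only:
        actions.append("elasticfilesystem:ClientWrite")
    resource = (
        f"arn:aws:elasticfilesystem:{region}:{account_id}"
        f":file-system/{file_system_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": resource}
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    policy = efs_mount_policy("us-east-1", "111122223333", "fs-0123456789abcdef0")
    print(json.dumps(policy, indent=2))
```

Attaching one such policy per tenant role keeps the "who can mount what" question in IAM instead of in file permissions inside a shared volume.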
Cloud-y DLP tools don't know about EFS.
(Replying to PARENT post)
More seriously, this is huge. Unix pipes over shared NFS have always been my big data platform of choice (since before the cloud, or even Google MapReduce). Things have finally come full circle.
(Replying to PARENT post)
So now Lambda functions can mount persistent file storage.
Next up: allow your Lambda functions to run for longer
Then: allow multiple Lambda functions to execute concurrently, and indefinitely, as a group, while having that storage mounted
And finally: use your EC2 instances as Lambda functions
(Replying to PARENT post)
I tried Lambda for a use-case that I had in 2018:
We published Polls and Predictions to people watching the 2018 World Cup. We set the vote callback URL to a function on AWS Lambda.
It failed spectacularly during our load-testing because the ramp-up period was far too slow. We needed to go from 0 to 100,000 incoming requests/second in about 20 seconds.
We had to switch to an Nginx/Lua/Redis endpoint because Lambda was just completely unusable. It would have cost us $27,000/month to pre-provision 10,000 concurrent executions...
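For anyone checking the numbers, here is a back-of-the-envelope sketch using only the figures quoted above. The relation concurrency = request rate x average duration is Little's law; the average duration it implies is derived from those figures, not something that was measured, and the function name is made up for the example.

```python
def required_concurrency(requests_per_second, avg_duration_seconds):
    """Little's law: in-flight executions = arrival rate * time per request."""
    return requests_per_second * avg_duration_seconds


peak_rps = 100_000          # target request rate from the load test
ramp_seconds = 20           # time allowed to reach that rate
provisioned = 10_000        # pre-provisioned concurrent executions
monthly_quote = 27_000      # quoted cost in USD per month

# Cost of keeping one execution slot warm for a month.
cost_per_slot = monthly_quote / provisioned

# Longest average duration that 10,000 slots can absorb at peak rate.
max_avg_duration = provisioned / peak_rps

# How fast capacity has to grow during the ramp, in requests/second per second.
ramp_rate = peak_rps / ramp_seconds

if __name__ == "__main__":
    print(f"cost per provisioned slot: ${cost_per_slot:.2f}/month")
    print(f"max average duration at peak: {max_avg_duration * 1000:.0f} ms")
    print(f"required ramp: {ramp_rate:.0f} req/s added every second")
```

In other words, the quoted provisioning only covers the peak if each invocation averages about 100 ms or less, and capacity has to grow by roughly 5,000 requests/second every second during the ramp.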
What is it that people actually use Lambda for?