The Future of Serverless Computing

Yaron Haviv | November 22, 2017

Serverless computing allows developers to focus on building and running auto-scaling applications without worrying about managing servers, as server provisioning and maintenance are taken care of behind the scenes. Industry demand for instant results has therefore made serverless platforms the new buzz.

However, serverless computing has challenges that limit its usability and applicability:

  • Slow performance and lack of concurrency (single-threaded)
  • Lock-in to platform specific event and data sources
  • Complexity of application state maintenance, code dependencies and service dependencies
  • The inability to develop, debug, test and deploy in a hybrid or multicloud environment

Latency of tens or hundreds of milliseconds is the norm today when developing serverless functions in the cloud, and we’re lucky if we manage to run more than a few thousand events per second without taking on a second mortgage. This limits the usage to non-performance-sensitive front-end apps or glue logic. Serverless computing could address many more workloads if it were faster and more efficient.

We need to feed functions from messaging or streaming sources as we break past the comfort zone of simple web apps. We must also store state in cloud-specific databases/storage and use cloud-specific logging and monitoring tools. These integrations are not trivial and mean that our function code is now tied to a specific cloud platform. What if we wanted to swap vendors? Or use a multicloud deployment? Or debug some code on our laptop? Or regression test on our mini cluster? Just forget about it.

Yeah, But What about These Numerous Open Source Serverless/FaaS Projects…?

Many open source serverless, or functions-as-a-service (FaaS), projects started as a way to rapidly develop front-end apps or glue logic. They weren’t focused on performance, support for a variety of event sources, or better integration with state, data, logging and versioning, all of which would make them applicable to more demanding applications or deliver an end-to-end service experience.

They’ve now added more features, but fundamental architectural decisions cannot be patched over time. Some popular FaaS solutions address the challenge of triggering any containerized app, but they work like CGI (every event launches the application and all its data connections from scratch), not like something that can handle apps with a high ingestion rate.

So What Can We Do?

  1. Build a serverless computing architecture from the ground up with a focus on parallelism and efficient use of CPU, memory and I/O.
  2. Define a common event format across multiple event types, frameworks and cloud providers.
  3. Decouple event/data sources from the function code.
  4. Provide stronger data security and access control.
  5. Increase the emphasis on serverless debugging, logging and instrumentation tools/features.
  6. Address continuous development and operations across hybrid and multi-cloud environments. Make functions portable and versioned along with their event feed and data dependencies.

The above criteria guided iguazio when we built nuclio — a new, advanced and high-performance open source serverless framework. We know it won’t be complete without a broader ecosystem that takes on these challenges. We are actively working with the Cloud Native Computing Foundation to come up with industry-wide solutions and have participated in authoring this serverless paper to foster collaboration.

Here is how nuclio tackles those challenges:

Serverless Computing Performance

Application performance is killed by things like lack of parallelism, blocking access to resources (network, I/O, etc.) and excessive memory allocation/deallocation/encoding.

Serverless computing suffers the most: it is single-threaded, access to resources is blocking and, in many cases, connections restart per invocation. Serverless computing is also bloated with data copies, message serialization/deserialization and context switches. No wonder performance is so slow! This impacts the type of use cases we can serve and increases application costs (due to the need for more CPU and memory resources).

nuclio introduces the notion of a lightweight worker context, in which multiple function tasks run in parallel on the same resources (using Go routines). Data buffers are reused, eliminating allocation/deallocation and garbage collection. Access to external resources/data is always non-blocking. We eliminated the need for (JSON) serialization and data copies. The communication between nuclio wrapper processes (in Go) and language-specific runtimes is non-blocking and uses shared memory.

We made sure incoming data or streams are partitioned and dynamically balanced across multiple function instances to gain linear scalability in large-scale message handling (with nuclio’s dealer service, which can sustain failures while supporting different delivery modes/guarantees, such as ‘at-least-once,’ ‘at-most-once,’ etc.). The result is that nuclio can process 400,000 events per second using a single process. The bottom line is that nuclio is 100 times faster than mainstream serverless solutions.
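To make the parallelism model more concrete, here is a minimal, conceptual Go sketch (illustrative only, not nuclio’s actual internals) of the idea: many lightweight workers share a single process, pull events from a channel and reuse buffers instead of allocating memory per invocation.

    package main

    import (
        "fmt"
        "sync"
    )

    type event struct{ body []byte }

    // A pool of reusable buffers avoids per-event allocation and garbage collection.
    var bufPool = sync.Pool{
        New: func() interface{} { return make([]byte, 0, 4096) },
    }

    // worker is a lightweight function task; many of them run in parallel
    // within the same process, sharing the event channel and the buffer pool.
    func worker(events <-chan event, wg *sync.WaitGroup) {
        defer wg.Done()
        for ev := range events {
            buf := bufPool.Get().([]byte)[:0] // reuse a buffer instead of allocating
            buf = append(buf, ev.body...)     // stand-in for real event processing
            bufPool.Put(buf)
        }
    }

    func main() {
        events := make(chan event, 1024)
        var wg sync.WaitGroup
        for i := 0; i < 8; i++ { // eight parallel workers in one process
            wg.Add(1)
            go worker(events, &wg)
        }
        for i := 0; i < 100000; i++ {
            events <- event{body: []byte(fmt.Sprintf("event-%d", i))}
        }
        close(events)
        wg.Wait()
        fmt.Println("processed all events")
    }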

Functions Triggered by Any Type of Source

When we work with AWS Lambda, currently the most popular serverless computing service, we are limited to a set of platform event sources (such as the HTTP gateway, Kinesis and SQS). Each event source generates a different event structure even when it carries the same payload, forcing hacks in our function code just to normalize events. This means our function code is locked-in to a specific event implementation and provider.

Nuclio introduces the notion of a common event structure, separating protocol-specific details carried in the event header from the content in the body, and supporting various serialization options (not forcing expensive JSON deserialization). With nuclio, the same function code can be triggered by HTTP, Kinesis, Kafka, RabbitMQ, MQTT, NATS, iguazio’s V3IO, file content, or an event emulator.

We can now develop our functions with an emulator, test with HTTP, use a stream for production and even change the source if we move between cloud providers or between beta and production setups.
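To illustrate, here is roughly what such a function looks like, written against the generic event rather than any specific trigger. The sketch assumes a nuclio-SDK-style Go handler signature; exact import paths, fields and method names may vary by version.

    package main

    import (
        "github.com/nuclio/nuclio-sdk-go"
    )

    // The handler receives a generic event; it does not know or care whether
    // the trigger was HTTP, Kafka, Kinesis, MQTT, a file or an emulator.
    func Handler(context *nuclio.Context, event nuclio.Event) (interface{}, error) {
        context.Logger.InfoWith("Received event",
            "path", event.GetPath(),
            "size", len(event.GetBody()))

        // Echo the body back. For an HTTP trigger this maps to the response;
        // how it is used for other triggers depends on the trigger type.
        return nuclio.Response{
            StatusCode:  200,
            ContentType: "text/plain",
            Body:        event.GetBody(),
        }, nil
    }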

Nuclio’s common event source approach was proposed as a CNCF specification to address the problem of serverless computing lock-in and complexity. Hopefully, multiple platforms will adopt this concept.

Accelerating, Securing and Abstracting Function Access to Various Data Sources

Functions are stateless. They access external data services frequently to retrieve state or content, store results and pass messages. Today, the specific data access implementation and credentials are usually either hard-coded into the function or, at best, passed through environment variables. This means we cannot build a portable function, and it is very hard to use the same function with different data types or repositories across development, testing and production.

Nuclio provides pluggable data bindings to functions. Developers code against the data binding abstraction, and operators decide at deployment time which data systems and datasets are used. This concept, originally pioneered by Azure, results in:

  • Simplicity: coding against simple APIs without dealing with SDKs and connection details.
  • Security: credentials are not part of the function code/variables.
  • Portability: move functions around and work with different data sources.
  • Reusability: the same function can be used in multiple use-cases by changing the bound datasets.
  • Performance: data and connections are cached across invocations and work with non-blocking IO and zero-copy access.
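To make the idea concrete, here is a small Go sketch using a hypothetical KeyValue abstraction (not nuclio’s actual data-binding API): the function logic is written purely against the interface, while the concrete backend and its credentials are selected outside the code.

    package main

    import (
        "errors"
        "fmt"
    )

    // KeyValue is a hypothetical data-binding abstraction the function codes against.
    // At deployment time an operator binds it to a concrete backend (a cloud
    // key/value store, a file, or an in-memory fake for tests).
    type KeyValue interface {
        Get(key string) ([]byte, error)
        Put(key string, value []byte) error
    }

    // inMemoryKV is one possible binding, handy for local development and testing.
    type inMemoryKV struct{ data map[string][]byte }

    func (kv *inMemoryKV) Get(key string) ([]byte, error) {
        v, ok := kv.data[key]
        if !ok {
            return nil, errors.New("key not found: " + key)
        }
        return v, nil
    }

    func (kv *inMemoryKV) Put(key string, value []byte) error {
        kv.data[key] = value
        return nil
    }

    // enrich holds the function logic; it is portable across backends because
    // no SDK, endpoint or credential details appear here.
    func enrich(kv KeyValue, userID string) ([]byte, error) {
        return kv.Get("profile/" + userID)
    }

    func main() {
        kv := &inMemoryKV{data: map[string][]byte{"profile/42": []byte(`{"name":"demo"}`)}}
        profile, err := enrich(kv, "42")
        if err != nil {
            panic(err)
        }
        fmt.Println(string(profile))
    }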

Simplifying Function Debugging, Logging and Instrumentation

Debugging serverless functions can be frustrating, as there is no tracing and there are no breakpoints. How can we automate regression testing, especially when we’re tied to external event and data sources?

Logging and prints all go to unstructured and resource-intensive logs, with no way to control the trade-off between detail level and overhead. Automation requires complex log parsing.

There is still a lot to do in that space and it’s a challenge that needs to be addressed through collaboration and standardization between serverless platform providers and software providers of IDEs, testing, instrumentation tools and the like. nuclio takes a few important steps in the right direction:

  • Built-in, non-intrusive structured and unstructured logging to a range of output options (screen, HTTP, file, log streams) with a dynamically controlled verbosity level. This allows us to place many log/debug points in our code and change verbosity levels on the fly or per invocation to diagnose bugs and failures without changing the code. The structured option makes it ideal for test automation and analysis (see the sketch after this list).
  • nuclio functions import the nuclio-SDK, so our local IDE can treat them like regular local programs, with auto-completion and breakpoints. The SDK provides a way to run our function locally, print logs to the screen/file, feed it from emulated events and emulate function inputs/outputs with pluggable data bindings.
  • For regression testing, we can create an event feed emulation (from files/http/streams/..) with an emulated input data set and compare the logs and data output with expected results, all without changing code.
  • Additional debugging and instrumentation features are on our roadmap, including custom statistics and OpenTracing integration.
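As a sketch of what structured logging looks like from inside a function (again assuming a nuclio-SDK-style Go handler; names may differ by version), debug points stay in the code permanently, and the level set at deploy time or per invocation decides whether they are emitted:

    package main

    import (
        "github.com/nuclio/nuclio-sdk-go"
    )

    func Handler(context *nuclio.Context, event nuclio.Event) (interface{}, error) {
        // Debug-level detail: emitted only when verbosity is raised, so it can
        // stay in the code without adding overhead in normal operation.
        context.Logger.DebugWith("Parsing event", "size", len(event.GetBody()))

        result, err := process(event.GetBody())
        if err != nil {
            // Structured key/value pairs make failures easy to parse in test automation.
            context.Logger.ErrorWith("Processing failed", "error", err.Error())
            return nil, err
        }

        context.Logger.InfoWith("Processed event", "resultSize", len(result))
        return result, nil
    }

    // process is a placeholder for the real function logic.
    func process(body []byte) ([]byte, error) {
        return body, nil
    }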

Delivering Hybrid and Multicloud CI/CD Pipelines and Distributed Deployment Models

For some, doing all the work on a single cloud is a viable option, but for most others, that is expensive and limiting. A typical case involves developers working on laptops, regression testing on dedicated clusters that get bombarded with events 24×7 (not ideal for the serverless per-invocation cost model), and separate beta and production deployments. In Internet of Things or edge scenarios, these same functions may also need to be deployed in multiple edge locations. This means we need a versioned and distributed development pipeline, with an option for rolling or canary upgrades (deploying an 80/20 version mix).

The nuclio features mentioned above — such as the ability to run anywhere, the abstraction of events, logs and data, and the logging capabilities — are important enablers of deployment in multiple locations. In addition, nuclio provides a federated and versioned deployment approach. Compiled functions generate an artifact (binary or container image) that can be stored locally or in a shared image repo, and a function can have multiple versions, each with unique metadata, events and data. With nuclio, we can choose to build/test on one cluster and deploy the same artifact on multiple clusters without rebuilding our code (deploy from image).

Summary

Serverless computing is a young and fast-growing space. It is a very promising way to accelerate businesses, and it still has lots of room for innovation. I believe this can only be achieved through better collaboration, with the various vendors and community members sharing their work with one another. We hope others will join the CNCF’s development and standardization efforts in this space.

If you want to play with a stand-alone nuclio version, it’s as simple as copy/pasting this line into your command line interface:

docker run -p 8070:8070 -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/tmp nuclio/playground

Open a browser at http://<machine-ip>:8070. This will bring up the nuclio playground. Select an example function, then edit, deploy and invoke it. While you wait, please give nuclio a star. Visit nuclio's GitHub page for the full version and more details.