AWS recently announced that Amazon MSK Serverless is now generally available. The serverless option to manage an Apache Kafka cluster removes the need to monitor capacity and automatically balances partitions within a cluster.
Amazon MSK Serverless is a cluster type for Amazon MSK designed to automatically provision and scale compute and storage resources. Marcia Villalba, senior developer advocate at AWS, explains the main advantage of the serverless addition:
It is the perfect solution to get started with a new Apache Kafka workload where you don’t know how much capacity you will need or if your applications produce unpredictable or highly variable throughput and you don’t want to pay for idle capacity. Also, it is great if you want to avoid provisioning, scaling, and managing resource utilization of your clusters.
According to Amazon, an MSK Serverless cluster supports any Apache Kafka-compatible tool to process data, and integrates with Amazon Kinesis Data Analytics for Apache Flink for stateful stream processing and with AWS Lambda for event processing.
Amazon MSK Serverless currently supports AWS IAM for client authentication and authorization. To ensure high availability, it creates two replicas of each partition in different availability zones.
Introduced in 2018, Amazon MSK is a fully-managed service to build and run applications that use Apache Kafka to process streaming data. The serverless option for MSK was a feature requested by the community and it was unveiled in preview at the latest re:Invent, together with serverless versions of Redshift and EMR.
Amazon MSK is not the only serverless service for data stream processing and analysis on AWS: Kinesis is a managed data streaming service where the amount of data that can be ingested or consumed is driven by the number of shards assigned to a stream. As reported separately on InfoQ, Kinesis has also recently added a new capacity mode, Data Streams On-Demand. There are also other options for running a managed version of open source Kafka on a public cloud: Confluent Cloud is a cloud-native distributed event streaming platform created by the original developers of Apache Kafka.
The pricing of the MSK Serverless offering is based on throughput, among other factors. Halil Duygulu, senior big data engineer, asks AWS:
Looks like scaling effort becomes estimating the cost of serverless effort. Don’t you think five variables are a bit too much?
Corey Quinn agrees in his newsletter:
Their pricing page states that “With MSK Serverless, you pay an hourly rate for your serverless clusters and an hourly rate for each partition that you create.” If that’s “Serverless,” then IBM “Cloud” is a real cloud.
Every Amazon MSK Serverless cluster provides up to 200 MBps of write-throughput and 400 MBps of read-throughput and allocates up to 5 MBps of write-throughput and 10 MBps of read-throughput per partition.
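As a rough illustration of what these limits imply for sizing a topic, the following Python sketch estimates the minimum number of partitions needed for a target throughput. It is not an AWS tool: the function name `min_partitions` is hypothetical, the quoted "up to" figures are treated as hard caps, and traffic is assumed to spread evenly across partitions, which real workloads rarely achieve.

```python
import math

# Limits as quoted in the article (MBps). Treated here as hard caps,
# which is an assumption made for this sketch.
CLUSTER_WRITE_MBPS = 200
CLUSTER_READ_MBPS = 400
PARTITION_WRITE_MBPS = 5
PARTITION_READ_MBPS = 10


def min_partitions(write_mbps: float, read_mbps: float) -> int:
    """Estimate the fewest partitions that can carry the given throughput.

    Hypothetical helper: assumes load is evenly distributed across partitions.
    """
    if write_mbps < 0 or read_mbps < 0:
        raise ValueError("throughput must be non-negative")
    if write_mbps > CLUSTER_WRITE_MBPS or read_mbps > CLUSTER_READ_MBPS:
        raise ValueError("requested throughput exceeds the per-cluster limit")

    # Each direction imposes its own lower bound; the larger one wins.
    needed_for_writes = math.ceil(write_mbps / PARTITION_WRITE_MBPS)
    needed_for_reads = math.ceil(read_mbps / PARTITION_READ_MBPS)

    # A topic always has at least one partition.
    return max(1, needed_for_writes, needed_for_reads)


if __name__ == "__main__":
    # 50 MBps of writes needs 10 partitions, 120 MBps of reads needs 12.
    print(min_partitions(50, 120))
```

Under these assumptions, a workload that saturates the cluster limits in both directions would need at least 40 partitions. Since partitions are billed hourly, this estimate also feeds directly into the cost discussion above.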