The challenge – and opportunity – of being a niche AI cloud


In this age of hyperscaler and cloud builder titans, seven of whom account for roughly half of the IT infrastructure purchased in the world, it is important to understand the value of niches and the critical role that other system makers, other system vendors, and other system users play in the IT ecosystem. Never has a niche player been so important, and never has it been so difficult to be one.

And yet this is the path that Lambda – which no longer calls itself Lambda Labs because it is now a fast-growing company, not some experiment in building AI systems and running an AI cloud – has chosen for itself. As niche players ourselves here at The Next Platform, we of course respect that choice, which is not an easy road. If you can’t be the biggest or the first, the only choice is to try to be the best. And really, there is no try, as Master Yoda rightly points out. There is only do or do not.

If you have to pick a niche, this is a pretty good one. Machine learning training models are growing exponentially and so are the datasets they depend on, and the performance of the systems delivering the training is lagging behind. That means customers will have to buy more and more machines, even as those machines get more powerful thanks to Moore’s Law advances in parallel compute, memory bandwidth, and I/O bandwidth. Any time the software infrastructure grows faster than the hardware infrastructure – the proliferation of batch processing on mainframes in the 1970s, the relational database revolution of the 1980s, the commercial Internet boom of the late 1990s and early 2000s, the data analytics explosion of the 2000s, and the AI revolution of the late 2010s and early 2020s – it is a very good time to play in a niche with specialized hardware and software and the technical expertise to make it all hum.

We did an in-depth profile of Lambda in December 2020, when we spoke to Michael Balaban, co-founder and chief technology officer at the company, and in May this year we looked at some price/performance stats Lambda released comparing Lambda GPU Cloud instances based on Nvidia A6000 GPU accelerators against Nvidia A100 GPU instances running in the Amazon Web Services cloud. Lambda’s point was that the Chevy truck of GPUs is good enough for many AI training workloads and superior to the Cadillac model in some cases. Right now Lambda doesn’t care about inference, and there is no reason why it should. The company wants to build AI training infrastructure. Period. Inference is expected to run on in-house infrastructure, and that could be anything from CPUs to GPUs to FPGAs to custom ASICs, and Stephen Balaban, co-founder (with his older brother) and chief executive officer of Lambda, isn’t interested in selling inference systems. Not yet, anyway, but that could – and we think it probably will – change. But it’s important for startups to stay sharply focused. You should not trust those who don’t, because money and time are both scarce.
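To see why price/performance, rather than raw performance, is the yardstick here, a back-of-the-envelope calculation helps. The sketch below is illustrative only; the hourly prices and throughput figures are made-up placeholders, not numbers from Lambda’s published benchmarks.

```python
# Price/performance: a cheaper, slower GPU instance can win on cost per unit
# of training work. The hourly prices and throughput figures below are
# illustrative placeholders, not figures from Lambda's benchmarks.

def cost_per_million_samples(price_per_hour: float, samples_per_second: float) -> float:
    """Dollars spent to push one million training samples through the model."""
    samples_per_hour = samples_per_second * 3600
    return price_per_hour / (samples_per_hour / 1_000_000)

# Hypothetical "Chevy" instance: slower, but much cheaper per hour.
chevy = cost_per_million_samples(price_per_hour=1.50, samples_per_second=1_200)
# Hypothetical "Cadillac" instance: faster, but pricier per hour.
cadillac = cost_per_million_samples(price_per_hour=4.10, samples_per_second=2_000)

print(f"Chevy:    ${chevy:.2f} per million samples")
print(f"Cadillac: ${cadillac:.2f} per million samples")
```

With those made-up numbers, the slower instance does more training per dollar, which is exactly the kind of argument Lambda was making.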

Lambda wants to ride that AI training wave not only with specialized hardware, but also with its own AI cloud built on its own hardware and on its own software stack – called Lambda Stack, of course – which is tuned by its own software engineers. Lambda recently raised $15 million in a Series A financing round plus a $9.5 million debt facility, giving it the resources to keep up with its own explosive growth. The Series A was led by 1517, Gradient Ventures, Bloomberg Beta, Razer, and Georges Harik – most of whom were angel investors when Lambda started nine years ago – and the debt facility came from Silicon Valley Bank.

We took the opportunity to talk to Stephen Balaban, the CEO, about how the company is doing and what it sees happening in what is still a very young AI training market.

“Unlike other clouds and other system providers, we focus on just this one specific use case, which is deep learning training,” Balaban tells The Next Platform. “Our product base scales from laptops to workstations and servers to clusters to the cloud, and we are vertically integrated across all of those devices with our own Lambda Stack. But there is another aspect, and customers should ask themselves if they really need the gold-plated datacenter service experience of having Amazon Web Services be their operations team managing the infrastructure, because that is very expensive, as you can imagine.”

In fact, Lambda builds the kind of cloud you would probably want to build yourself, if you had the skills. It is designed to use not the biggest and most expensive GPU compute engines, as the public clouds must given the diversity of their workloads, but rather GPUs with enough parallel compute, enough memory capacity, and enough memory bandwidth at the lowest price to drive down the total cost of ownership of running the workloads in a sustained fashion – which is what lowering TCO means in a world where models and datasets grow faster than hardware capacity does. The public clouds have to massively overprovision their general purpose machines, sell you on the idea that you have to run your spiky workloads there, and then charge you a high premium for the privilege of doing so. It is better and cheaper, Balaban says, to run your baseline AI training on your own hardware (made by Lambda, of course) and then burst to the Lambda GPU Cloud, which is cheaper than AWS or Microsoft Azure.
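To make that owned-plus-burst argument concrete, here is a minimal back-of-the-envelope sketch in Python. All of the purchase prices, operating costs, and hourly rates are assumptions for illustration, not Lambda, AWS, or Azure pricing.

```python
# Back-of-the-envelope TCO comparison: owned GPU servers for the steady
# baseline plus cloud bursting, versus renting everything on demand.
# Every price and utilization figure below is an illustrative assumption.

HOURS_PER_YEAR = 365 * 24

def owned_plus_burst_cost(baseline_gpus: int,
                          burst_gpu_hours: float,
                          server_cost_per_gpu: float = 15_000.0,    # assumed purchase price per GPU
                          amortization_years: float = 3.0,
                          power_and_ops_per_gpu_hour: float = 0.25, # assumed power/cooling/ops cost
                          burst_rate_per_gpu_hour: float = 1.50):   # assumed niche-cloud hourly rate
    capex_per_year = baseline_gpus * server_cost_per_gpu / amortization_years
    opex_per_year = baseline_gpus * HOURS_PER_YEAR * power_and_ops_per_gpu_hour
    burst_cost = burst_gpu_hours * burst_rate_per_gpu_hour
    return capex_per_year + opex_per_year + burst_cost

def all_on_demand_cost(baseline_gpus: int,
                       burst_gpu_hours: float,
                       on_demand_rate_per_gpu_hour: float = 3.00):  # assumed big-cloud hourly rate
    return (baseline_gpus * HOURS_PER_YEAR + burst_gpu_hours) * on_demand_rate_per_gpu_hour

if __name__ == "__main__":
    # A team that keeps 8 GPUs busy year-round and bursts 20,000 extra GPU-hours.
    own = owned_plus_burst_cost(baseline_gpus=8, burst_gpu_hours=20_000)
    rent = all_on_demand_cost(baseline_gpus=8, burst_gpu_hours=20_000)
    print(f"own + burst: ${own:,.0f}/year   all on demand: ${rent:,.0f}/year")
```

The crossover depends entirely on how steady the baseline load is; a fleet that sits idle most of the year flips the conclusion, which is why the burst-to-cloud piece of the pitch matters.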

So far, this niche game has worked well, and it is one Lambda had to come up with because, as an AI software pioneer, it could not afford to run its AI applications on AWS without going out of business, thanks to the immediate and explosive popularity of the AI tools it put on the web. The Balaban brothers learned the hard way that sometimes success is harder than failure, and that is how a niche hardware company and a niche cloud were born.

What Lambda is doing clearly resonates with organizations trying to master AI training and move it into production. In 2017, after Lambda had been an AI application maker and a homegrown cloud builder to support those applications, the company had its first full year selling AI training systems, raking in about $3 million in revenue from that hardware. Two years later, that grew to $30 million, and in 2021, another two years on, it is on track to do $60 million.

“We’ve found that there’s a huge demand – and growing demand – for deep learning training systems that just work,” Balaban said, and the Series A funding will be used to build out the hardware and software engineering teams and the sales teams to really see how big that addressable market for Chevy systems is compared to the Cadillac systems that the big cloud builders have to create – and charge for – because they need to support a diversity of customers and workloads on their machines where Lambda simply does not.

Software will be a key differentiator, starting with the Lambda Stack, which is packaged to run on Debian Linux and includes Nvidia drivers and libraries such as CUDA and cuDNN as well as the TensorFlow, Keras, PyTorch, Caffe, Caffe 2, and Theano machine learning training frameworks. With the fundraising, Lambda will extend the software that runs on its cloud and make it all a lot more user-friendly than these frameworks (many of them developed by hyperscalers and cloud builders who like byzantine and bizarre structures as a matter of pride) are when they are released into the wild on GitHub. Ultimately, this polished AI training stack will be available for Lambda customers to deploy on their laptops and workstations, on their internal clusters, and on the Lambda GPU Cloud.
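As a rough illustration of what a stack that “just works” has to get right first, here is a minimal sketch that checks whether the installed frameworks can actually see the GPUs. It assumes a machine with PyTorch and TensorFlow installed, as they would be in a Lambda Stack environment or any comparable CUDA-enabled setup.

```python
# Minimal sanity check that the installed deep learning frameworks can see
# the GPUs. Assumes PyTorch and TensorFlow are present, as they are in a
# Lambda Stack image or any comparable CUDA-enabled environment.
import torch
import tensorflow as tf

def check_pytorch() -> None:
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            print(f"PyTorch sees GPU {i}: {torch.cuda.get_device_name(i)}")
    else:
        print("PyTorch: no CUDA devices visible")

def check_tensorflow() -> None:
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        for gpu in gpus:
            print(f"TensorFlow sees {gpu.name}")
    else:
        print("TensorFlow: no GPU devices visible")

if __name__ == "__main__":
    check_pytorch()
    check_tensorflow()
```

Getting drivers, CUDA, cuDNN, and framework versions to agree so that a check like this passes on a laptop, a workstation, a cluster, and a cloud instance alike is precisely the drudgery the Lambda Stack is meant to take off the customer’s plate.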

And that is the secret of the niche. The experience will be the same for Lambda’s customers no matter where they build their AI models; they won’t even know the difference. The market will tell Lambda how valuable such an experience is, but we can infer it from Apple’s own experience with its music business, can’t we?
