Ampere Computing buys a leap forward in AI inference performance

Machine learning inference models have been running on X86 server processors since the very beginning of the latest, and by far the most successful, AI revolution, and technicians at hyperscalers, cloud builders, and semiconductor manufacturers who know both the hardware and the software in fine detail have been able to tune the software and plug in and adjust the hardware for over a decade.

The Arm architecture is relatively new to the datacenter, and while some pretty impressive chips from Ampere Computing, Amazon Web Services, Fujitsu, and a few others are in the field or in the works, the same amount of tuning has not been done for inference, and with very different instruction sets and chip architectures, that takes time. Or very talented software engineers who already know how to do it.

Luckily for Ampere Computing, it had partnered with just such a set of talented software engineers at a startup called OnSpecta, and it liked the company’s AI inference engine so much that it just bought the company outright – for an amount of money that has not been disclosed, as is usually the case when companies acquire startups.

Ampere Computing is the best bet we can make today for an independent supplier of Arm-based server processors, even if it only targets hyperscaler and cloud builder workloads. The company has very close partnerships with Oracle, Microsoft, Baidu, and Tencent, which are in various stages of deployment with the company’s Altra and Altra Max processors, which we have profiled here, and it has a roadmap with an annual cadence of functionality additions and performance expansions, which these high-end customers require. The company needs to capitalize on the substantial investment that The Carlyle Group has made in Ampere Computing to acquire the assets and part of the team of the former Applied Micro, one of the true innovators in Arm server chips, which never reached escape velocity because, frankly, it didn’t have enough money to do the job.

In late 2018, Oracle said in filings with the U.S. Securities and Exchange Commission that it paid $40 million to secure a less than 20 percent stake in Ampere Computing, which probably means the chip designer has raised somewhere around $180 million from other sources, most of it presumably from The Carlyle Group. The private equity firm that controls Ampere Computing has not disclosed its funding, so no one is sure. Ampere Computing has over 500 employees and is on its way to becoming a unicorn; we’ll see how much money it can make selling chips.

That’s the goal of acquiring OnSpecta, Jeff Wittich, chief product officer at Ampere Computing, tells The Next Platform. “We were already working with OnSpecta, and this acquisition will allow us to focus more on the intrinsic hardware part while giving us more flexibility with machine learning. Ultimately, we’re looking to sell processors, and the software they’ve developed simply makes us the class leader in processor performance in the inference space. I want to sell processors, and this will help us do that.”

When pushed, Wittich admitted that Ampere Computing was not necessarily interested in selling the OnSpecta software stack to competitors, but the company will honor its existing agreements, and he further noted that the software runs on X86 processors as well, and can also accelerate inference performance on hybrid CPU-GPU machines. On X86 machines, the speedup from the OnSpecta Deep Learning Software (DLS) inference engine is about a factor of 2X, and on Ampere Computing Altra and Altra Max processors, the speedup is about 4X, according to Wittich. In particular, he is comparing a 28-core “Cascade Lake” Xeon SP with the DL Boost mixed-precision enhancements for its AVX-512 vector engines against an 80-core Altra processor with much smaller math units, but many more of them. Wittich adds that with the OnSpecta DLS stack tuning the inference, an Altra can outperform an Nvidia T4 GPU accelerator by about 4X on inference work.

OnSpecta was founded in 2017 and has a dozen employees, according to Wittich. It had a first round of financing in 2019 and two convertible notes (a kind of debt that can be converted into shares), one in May 2019 and one in January 2020, from WestWave Capital; the amount of this funding was not disclosed. The company has two founders, both of them serial entrepreneurs. The first is Victor Jakubiuk, the company’s chief technology officer, who received his bachelor’s and master’s degrees in computer science from MIT, then worked in its Computer Science and Artificial Intelligence Laboratory before creating a startup called DataNitro, which built a Python accelerator for Microsoft Excel, through the Y Combinator incubator. Jakubiuk was also co-founder and CTO of another company, one that integrates enterprise-grade customer relationship management systems with social media.

The other co-founder is OnSpecta chief executive officer Indra Mohan, who received his Bachelor of Technology from the Indian Institute of Technology and an MBA from Harvard, then spent nearly a decade selling digital trading systems at Teknekron Systems, which was acquired by Reuters. After that, Mohan ran a series of businesses that were sold off; one of the biggest was Interweave, which was founded during the dot-com era, did web querying and reporting, and was sold to Cognos (now part of IBM).

The DLS software stack has three layers. Working from the top down, the framework integration layer presents a consistent, optimized interface up to the machine learning frameworks; the currently supported frameworks are TensorFlow, PyTorch, and ONNX, which cover a lot of bases, but more will be added in the future, according to Wittich. In the next layer of the DLS stack, optimizations are done at the neural network level, and depending on the workload – image recognition or a recommendation engine, for example – it organizes the data so it can talk to the frameworks above and the iron below in the right way to take full advantage of the compute elements, caches, and memories of the hardware. The hardware acceleration layer is a mix of microkernels and a compiler that actually talks to the instruction set of the CPU, GPU, or custom ASIC that supports the inference, and it optimizes the inference model code hitting the compute engine to boost performance above what it would otherwise be.
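OnSpecta has not published its internal APIs, so the layering described above can only be sketched in broad strokes. In the following illustrative Python snippet, every class, function, and kernel name is invented for the purpose of the example: a framework-integration shim normalizes a model into a common graph, a graph-level pass fuses operations, and a hardware layer maps each fused op to a microkernel for the target instruction set.

```python
from dataclasses import dataclass

# --- Layer 1: framework integration (all names here are hypothetical) ---
@dataclass
class Graph:
    ops: list  # ordered list of (op_name, params) tuples

def import_model(framework: str, model) -> Graph:
    """Normalize a TensorFlow/PyTorch/ONNX model into a common graph IR.
    A real importer would walk the framework's own graph objects; here the
    "model" is already a list of op tuples, so we just wrap it."""
    return Graph(ops=list(model))

# --- Layer 2: neural-network-level optimization ---
def optimize_graph(g: Graph, workload: str) -> Graph:
    """Fuse conv2d+relu pairs -- the kind of graph rewrite done above the
    hardware. A real pass would also specialize layouts per workload."""
    fused, i = [], 0
    while i < len(g.ops):
        if (i + 1 < len(g.ops)
                and g.ops[i][0] == "conv2d" and g.ops[i + 1][0] == "relu"):
            fused.append(("conv2d_relu", g.ops[i][1]))  # single fused kernel
            i += 2
        else:
            fused.append(g.ops[i])
            i += 1
    return Graph(ops=fused)

# --- Layer 3: hardware acceleration (microkernel selection) ---
MICROKERNELS = {
    ("conv2d_relu", "neon"): "uk_conv_relu_neon",
    ("conv2d_relu", "avx512"): "uk_conv_relu_avx512",
    ("matmul", "neon"): "uk_matmul_neon",
    ("matmul", "avx512"): "uk_matmul_avx512",
}

def lower_to_isa(g: Graph, isa: str) -> list:
    """Map each graph op to an ISA-specific microkernel name, falling back
    to a generic implementation when no tuned kernel exists."""
    return [MICROKERNELS.get((op, isa), f"uk_generic_{op}") for op, _ in g.ops]

# Usage: the same imported model lowers to different kernels per target.
model = [("conv2d", {"k": 3}), ("relu", {}), ("matmul", {})]
g = optimize_graph(import_model("onnx", model), workload="image_recognition")
print(lower_to_isa(g, "neon"))    # Arm (Altra-style) path
print(lower_to_isa(g, "avx512"))  # Xeon SP-style path
```

The point of the sketch is the separation of concerns: the framework never sees the microkernels, and the microkernels never see the framework, which is what lets the same stack target Arm, X86, or a GPU.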

Much of this kind of tuning is done by hand among hyperscalers, cloud builders, and AI researchers, and the whole idea of OnSpecta is that it is done transparently and automatically for those who don’t necessarily want to drop down into machine code to drive up performance – which is to say, everyone else. And if the performance is good enough with DLS, then they won’t have to worry about doing it by hand on a new Arm architecture – which is an important point and why Ampere Computing just bought the company. It can’t afford to wait.

We also believe that everything that has just been said about inference can potentially be applied to machine learning training, or even to traditional simulation and modeling workloads, in CPU-only configurations or in hybrid configurations with a mix of CPUs and accelerators of one type or another. Wittich conceded this in theory, but could not comment on the company’s specific plans in these areas. What he did say is that the OnSpecta stack makes the job of porting inference codes from X86 processors and Nvidia GPUs much easier, which is why it has been available on A1 instances on Oracle Cloud for a while now.
