Pure Storage and Snowflake are arranging for Snowflake compute to access and analyze data stored on FlashBlade file+object boxes housed in fast-access, cloud-adjacent co-los such as Equinix datacenters.
Snowflake’s cloud data warehouse runs its analytics routines on data held in the cloud. But Snowflake has realized that large data sets may need to stay in customers’ possession for reasons such as data sovereignty, data sensitivity and sheer volume. For example, a data set may be so large that moving it to the cloud would take too long.
If a customer’s data is stored in a co-location datacenter with a fast connection to the public clouds (a cloud-adjacent site), then data access is almost as fast as if the data were actually in the public cloud datacenter.
Pure Storage CTO Rob Lee said: “This solution is built on Snowflake’s external table functionality. … [and] extending that external table functionality to where Snowflake can … work with external tables that are resident on Pure Storage arrays.
“For a customer that has data on Pure arrays in a co-lo facility in a near-cloud datacenter – they’ll be able to maintain control of that array.”
This enables customers to avoid “the need to fully migrate that data into the cloud, and it gives them flexibility to retain control and also to drive sharing with other data workflows.”
Snowflake’s SVP of Product, Christian Kleinerman, said: “We do not believe in the central monolithic copy of data having to be in a single place. … What this partnership reflects is the embracing of data needing to be in different locations, but you want to have a single point of query, for reasoning, governance, etc.”
He added: “Many customers have chosen Pure as their storage solution. They need to have it in their datacenters for a number of reasons. I mention the usual ones around security, governance, and liquidity. But we don’t think that that should become a trade-off on being able to bust silos and break down through barriers to data. That’s at the heart of this policy.”
Lee said: “I think there’s a natural overlap in customers and in company philosophy between Pure and Snowflake, I think we’re both disruptors and innovators in this space. And I think customers that are looking to get more out of their infrastructure are looking beyond stagnant decades of tech, and to get more and more of an as-a-service experience, are naturally drawn to Snowflake, and we like to think are naturally drawn to Pure as well.”
In his view, FlashBlade is an “object platform that is really built for these high performance analytics workloads.”
Kleinerman said performance will depend “on the interconnect between where the storage array is and the region that you’re bringing from Snowflake. It’s helped by caching on top of Snowflake’s compute clusters and materialized views.”
The two support Equinix Fabric co-los with Azure ExpressRoute connectivity, which bypasses the public internet and is available in 33 Equinix datacenters globally. They also work with Equinix co-los with AWS Direct Connect. Both ExpressRoute and Direct Connect provide lower-latency, higher-speed interconnects than the public internet.
Lee told Blocks & Files: “We see early interest in this solution from customers really across the board, large enterprises, telcos, public sector customers.”
This Pure-Snowflake collaboration follows hard on the heels of a joint Dell-Snowflake initiative to have Snowflake compute operate in a customer’s datacenter, with Dell servers and storage, or to copy or move data from there into the Snowflake cloud.
Neither Snowflake compute nor Pure-stored data needs to move under the Pure-Snowflake model. We think other cloud data warehouses and data lake analysers will follow in Snowflake, Pure and Dell’s footsteps here as they realise that hybrid clouds are the way ahead and they need to work with on-premises data. Ditto other external storage suppliers.
Pure and Snowflake’s technology will go into public preview in the second half of 2022.