Data Lakes are too hard for most enterprises.
Data lakes are a foundational capability for fast access to data, enabling AI/ML, data engineering, and analytics, but they require specialized expertise that is hard to find and retain.
Enterprises are discovering that cloud data lakes in production require architects, Cloud Ops, Data Ops and SecOps (see the "DevOps Drag" blog). Data lakes also take a long time: typically 6+ months of upfront development, followed by ongoing management, support and optimization as the stack keeps evolving.
The early adopters, digital natives and sophisticated enterprises, have been able to build and manage data lakes, but the majority of Global 8000 enterprises lack these skills. Worse, the gap is widening because of the dearth of available DevOps talent.
For the past five years, the Cazena team has been developing the first SaaS Data Lake as a Service, which aims to simplify data lakes so that they can be used by any enterprise, big or small, without specialized skills. Enterprises using the SaaS Data Lake as a Service have radically simplified their operations and accelerated results: production-ready data lakes and analytic outcomes within 2-4 weeks, a 10X acceleration over their DIY efforts. They have also been able to expand and scale their deployments without adding new DevOps expertise.
As an example, one enterprise has launched multiple digital products, driven by 100+ globally distributed data scientists and data engineers using a variety of data engineering and ML tools, all powered by the SaaS Data Lake as a Service and supported by a single data architect. Cazena's Data Lake as a Service is now deployed for many enterprises across multiple regions on AWS and Microsoft Azure, and Cazena monitors and manages over one million analytic workloads for enterprises each month. (See the Engineering blog with metrics.)
Data Lake Evolution: From On-Prem to Cloud PaaS to SaaS
The SaaS model is a major leap forward for data lakes. This infographic shows the evolution of data lakes from on-prem clusters to cloud PaaS, and now to the SaaS Data Lake as a Service. Each of the three generations has had distinct characteristics.
- On-premises data lakes were notoriously hard to deploy and manage while the technology stack was still maturing. Few enterprises got sufficient value from them, often because access for data scientists and business users was too difficult. These DIY data lakes needed large administrative teams for ongoing management, typically required 9-12 months to deploy and carried significant operating expenses.
- With the maturing of cloud PaaS (platform-as-a-service), cloud data lakes have emerged as an alternative over the past 3-4 years. While these offerings reduce the complexity of the compute infrastructure, cloud data lakes still require significant DevOps expertise around cloud, security and modern data ops, skills that the majority of enterprises lack.
- The SaaS Data Lake as a Service is a third-generation offering that addresses the skills shortage with a SaaS model requiring zero DevOps effort for deployment and ongoing operations. A SaaS data lake is typically 10X faster to get into production, with typical outcomes in 2-4 weeks vs. 6+ months for cloud DIY. SaaS data lakes are also a strategic use of resources: they embed the best innovation from the IaaS and PaaS stack, require no additional skills, reduce costs and help existing teams scale with automation. For a good definition of SaaS vs. PaaS vs. managed services, refer to the research on Big Data Managed Services by Matt Aslett at 451 Research.
To experience Cazena's SaaS Data Lake as a Service, please sign up here. Cazena provides a turnkey, automated platform with an end-to-end orchestrated stack, including data ingestion, cloud storage, compute, best-of-breed database PaaS, and hosting for analytical tools (data engineering, AI/ML and BI). Each Data Lake as a Service is provisioned as a private, single-tenant deployment that is secured and continually monitored and managed to ensure the best experience and price-performance. A separate blog will describe the architectural details.