
Transcript: Simplifying Cloud Data Lakes – Introduction to SaaS Data Lakes with Prat Moghe

SaaS Data Lakes: Simplifying Cloud Data Lakes for Rapid Analytics

In this webinar, learn about recent advances in cloud data lakes that simplify and accelerate the enterprise journey. Explore the transcript below or watch the complete event here.



Presentation Transcript: SaaS Data Lake Introduction


Prat Moghe

“Thanks again, Matt, for that introduction. It is great to hear about how you see this industry evolving around Data Lakes. To follow up on what Matt was saying, one of the challenges that we see at Cazena around Data Lakes is the complexity and the brittleness that Matt talked about.


These Data Lakes require expertise from enterprises, and that expertise is hard to find.

You will hear people call it “DevOps” as a loose term, but if you look deeper, what we see are five key roles that are hard for enterprises to find.

These roles are essential for running cloud Data Lakes in production. They include cloud security architecture, figuring out how to secure these Data Lakes in the cloud so that you can trust them; running all cloud DevOps, including cost management; and, obviously, understanding the underlying data platforms, whether it’s Spark, SQL or some other database under the hood.

Ongoing security operations are also included, monitoring for key questions such as: Who is accessing your Data Lake? Is everything secure?

That means alerting when there are issues and handling all production operations. A typical production Data Lake requires six to 10 people in these roles, and it’s hard both to find and to retain them.

Cazena is the industry’s first SaaS Data Lake, and after five years of development, we are really excited to launch it and share it with all of you.

If you look at this infographic, it shows the evolution of Data Lakes, much like what Matt mentioned.


The first-gen Data Lakes started about 10 years ago, right about when the term “Data Lake” was coined, and they were all on-prem.

Technologies like Cloudera, Hortonworks and MapR were the foundation of these on-prem Data Lakes. And while they had early adopters, in general they were really hard to operationalize and open up to the business.

Typically, it took nine to 12 months of development and a lot of sunk cost in expertise.

The second-gen Data Lakes came up about four years ago. If you think about the leader in the cloud market, I would say AWS EMR is a great example. Many companies that are digital natives or early adopters in the cloud are leveraging the cloud stack at the IaaS level and at the Platform as a Service level to develop and build their Data Lakes.

However, cloud requires even more specialized expertise, and we talked about the five specific skill areas that are hard to find. So it takes, again, four to six months to develop these Data Lakes in the cloud, and put them into production. Then it requires you to find those five specialized roles to continually do the ‘care and feeding’ of these systems.

Cazena’s idea was to see how we could make Data Lakes far simpler, so that the broader market can start to consume these Data Lakes. They’re intended to help IT and data leaders feed data fast to the line of business, so that you can have AI and ML users on top.

So that’s what the Cazena SaaS offering does. The idea here is that it’s production-ready in two hours. Typically, you can get the first outcome in four weeks, which is about 10 times faster than a typical do-it-yourself Data Lake on a PaaS. You also don’t need to hire new skills and expertise. You can leverage your existing teams, and that’s the key value.

The three key principles behind these SaaS Data Lakes are: First, it’s a single platform for all analytics, whether it’s data engineering, AI/ML, or BI.


Second, these are end-to-end environments, not just compute: everything from data ingestion to cloud subscriptions, storage, and the data PaaS engines (all best-of-breed) is embedded. All data and analytics tools can be brought in, as well as all the security infrastructure.

Finally, these Data Lakes are production-ready, and they come with all the ongoing care and feeding for DevOps, SecOps, and CloudOps, so that all you need are data engineers and data analysts or data scientists.

This slide is a bit of an eye chart, and we are happy to share more information on the architecture of a SaaS Data Lake. It uses end-to-end, best-of-breed technologies for ingestion, storage, IaaS, PaaS, and what we call the AppCloud. The AppCloud is where data scientists and data engineers can bring their tools.

Everything you see on the left side covers four key areas of ‘day zero’ DevOps. That includes provisioning the Data Lake and securely connecting it to your on-prem data sources and users – these are all automated in a SaaS Data Lake.


What you see on the right side covers five key areas of ‘day one plus’ DevOps. That includes all the care and feeding: SecOps, workload ops, support, upgrades, and security monitoring, all built as part of the SaaS environment. When you are a user on this SaaS Data Lake, all the complexity is hidden from you. All you see is a SaaS console where you can plug in your tools. If you are an IT or data administrator, all the backend DevOps, SecOps, and CloudOps are monitored 24×7. The key is that this is managed as a SaaS environment, with no development required.

In terms of outcomes, unlike a do-it-yourself Data Lake on top of a PaaS, the SaaS Data Lake experience is four weeks to the first outcome versus six to nine months.

It also means not having to hire new skills, and it’s typically half the cost from a TCO perspective, with significantly lower risk.

What’s exciting is that I’m now going to introduce you to one of the early enterprise users of this SaaS Data Lake, so that he can share his experiences.

We have a pilot experience available for the Cazena SaaS Data Lake, where you can experience this SaaS Data Lake in your own environment. Bring in your datasets, connect your tools, and we expect you to see the first outcome within four weeks. Here is where you can learn more and sign up to experience it.

With that in mind, I wanted to introduce Gordon Coale, who’s a thought leader in the data industry with a lot of experience around data and digital transformation. He is a key lead and enterprise data architect at CWT, which is a leader in the travel industry. Gordon, it gives me great pleasure to introduce you to this audience, and I’ll turn it over to you.”

Next Presentation: CWT’s SaaS Data Lake Case Study > 
