Cazena to Accelerate Enterprise Migration to AWS with the Instant AWS Data Lake

Prat Moghe, Cazena Founder & CEO

We are excited to preview our latest development, the Instant AWS Data Lake – an Easy Button for all AWS analytics. The Instant AWS Data Lake is ready for analytics in minutes without requiring operational skills or resources. All enterprises that want to modernize with cloud data lakes can now safely and rapidly migrate to AWS and benefit from the rich and growing analytics stack that AWS offers. This addresses the challenge facing many enterprises that are new to AWS and struggle with months-long DIY efforts to deploy and manage a production AWS data lake.

Developed in partnership with AWS, Cazena’s Instant AWS Data Lake orchestrates and integrates AWS analytics services from ingestion to analytics into a unified production-ready SaaS experience. This experience includes a rich and broad set of AWS technologies like EMR, Athena, Glue, Redshift, MSK, S3, SageMaker, etc.

Here is a quick FAQ:

  • What is a Cloud Data Lake?
  • What are the DIY challenges in building and managing AWS data lakes?
  • How will the Instant AWS Data Lake accelerate time to analytics?
  • How will I experience, try out or learn more about the Instant AWS Data Lake?


What is a Cloud Data Lake?

Cloud data lakes provide a flexible, unified analytical platform for enterprises that want to modernize their data environments and migrate analytical workloads to the cloud. Today’s cloud data lakes are more than storage or cataloging – they represent the complete production analytical environment, from ingestion to storage to processing and tools. Unlike a cloud data warehouse, the cloud data lake offers broader support for all workloads (SQL, Spark, …) and formats of data (structured, semi-structured, documents, images, …) across a wide variety of tools for BI, data engineering, and data science/AI/ML.


What are the DIY challenges in building and managing AWS Data Lakes?

AWS offers a rich ecosystem of data and analytics technologies to build and manage a production-level cloud data lake. On AWS, cloud data lakes can be natively supported with clear separation of compute and storage, with fit-for-purpose processing engines for all types of analytical workloads. For example, data can be efficiently stored in an object store (S3). Analytics can be run on stored data with a variety of processing engines (EMR/Spark/Presto, Redshift, …) for data engineering, BI, or ML. Ingestion and cataloging are supported with services like Managed Streaming for Apache Kafka (MSK), Glue, etc. Tools like SageMaker and QuickSight are also available, or third-party tools can be deployed.

Typically, it takes enterprises months to deploy a production data lake on AWS, building on the services described above. Most of this effort is spent on bespoke DevOps around orchestration, identity management, security, compliance, and ongoing monitoring and operations of the end-to-end data lake environment. Additionally, AWS data lakes need to be securely connected in a “hybrid” configuration with on-premises analysts and data scientists. Depending on the size and complexity of the data lake and workloads, a dedicated DevOps team of at least three to five people may be needed, with specific skills in AWS, data platforms, and security. These resources are hard to find and retain, and cost upwards of $1M annually.

It is estimated that for every dollar of AWS infrastructure spend, enterprises incur about 4-5X that amount in DevOps and CloudOps team costs. The deployment and management processes are long and create risk for production SLAs and price/performance. For specifics, see this Gartner paper by Sumit Pal with a practitioner’s guidance on building and operating data lakes (Gartner account required). Matt Aslett of The 451 Group describes cloud-native IaaS and PaaS offerings as “blueprints,” requiring significant expertise to develop the production data lake.
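
To make the 4-5X estimate concrete, here is a minimal Python sketch of the arithmetic. The function name `devops_cost_range` and the $250,000 infrastructure figure are hypothetical, chosen only for illustration; the multiples are the ones quoted above, and this is not a pricing model.

```python
def devops_cost_range(infra_spend, low_multiple=4.0, high_multiple=5.0):
    """Return (low, high) estimated DevOps/CloudOps cost for an infrastructure spend.

    The default multiples are the 4-5X estimate quoted in the text;
    everything else here is an illustrative assumption.
    """
    if infra_spend < 0:
        raise ValueError("infra_spend must be non-negative")
    return infra_spend * low_multiple, infra_spend * high_multiple


# Hypothetical annual AWS infrastructure spend, for illustration only.
annual_infra_spend = 250_000

low, high = devops_cost_range(annual_infra_spend)
print(f"Infrastructure spend:       ${annual_infra_spend:,.0f}")
print(f"Estimated DevOps/CloudOps:  ${low:,.0f} to ${high:,.0f}")
print(f"Estimated total:            ${annual_infra_spend + low:,.0f} "
      f"to ${annual_infra_spend + high:,.0f}")
```

Under these assumed inputs, the people cost dominates the infrastructure bill, which is the point of the estimate.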


How will the Instant AWS Data Lake accelerate time to analytics?

The Instant AWS Data Lake is an easy button for AWS analytics and provides a production-ready AWS data lake in minutes, without requiring operational skills or resources.

The Instant AWS Data Lake has four key capabilities:

1. Turnkey SaaS Platform: Ready in minutes

The Instant AWS Data Lake is an automated turnkey analytical environment, from ingestion to tools, including connectivity to on-premises data sources and users, security controls, and all the cloud resources. All AWS analytics services such as EMR, Athena, SageMaker, Redshift, MSK, Glue, etc. are orchestrated, provisioned, and configured with unified identity management so that enterprise users can on-board immediately. The data lake can be deployed either as a standalone account or attached to an existing enterprise AWS account.

2. Continuous Ops: Managing Cost & SLA

The Instant AWS Data Lake is continuously monitored and optimized for workload price-performance, cost, and availability. Existing data teams can now use an AWS data lake without requiring dedicated DevOps or CloudOps resources. Instant AWS Data Lakes are less than half the cost of “do it yourself” (DIY) AWS data lakes.

3. Built-in Security & Compliance: “Private” SaaS on AWS

Each enterprise gets its own Instant AWS Data Lake as a “private,” fully secured cloud service that is encrypted and continuously monitored for security and compliance. Built-in controls for SOC 2, GDPR, HIPAA, CCPA, etc. are enabled by default.

4. Self-Service Analytics: One-click access to all Tools

Instant AWS Data Lakes offer a simple console for AWS analytical tools like SageMaker and QuickSight, as well as third-party tools. BI analysts, data engineers, and data scientists can get secure one-click access to the data lake with their favorite tools, in the cloud or from on-premises.



What’s the Instant AWS Data Lake experience?

See a Screencast Walkthrough of the Instant AWS Data Lake.


How will I try out the Instant AWS Data Lake?

Free trials of the Instant AWS Data Lake can be requested here.


Can I subscribe via AWS Marketplace?

The Instant AWS Data Lake will be available on the AWS Marketplace for trials and enterprise subscriptions.


How do I learn more about the Instant AWS Data Lake?

Discover more about the Instant AWS Data Lake at

Browse the datasheet, see case studies and other resources.
