DevOps: The Killer Drag for Big Data

By Prat Moghe, Cazena Founder

As enterprises seek to drive faster big data outcomes, cloud offers a promising solution for agility. Indeed, public cloud infrastructure is, in many cases, far cheaper and faster to deploy than on-premises alternatives. Yet, cloud big data deployments have proven complex for many enterprises, and few companies are ready to call systems officially in production. Reasons range from compliance concerns to integration issues, but there’s a much bigger problem lurking.

The real challenge holding back production big data cloud deployments has less to do with infrastructure or platform as a service (PaaS) capabilities than with the pervasive lack of DevOps skills for big data.

We estimate that 70-80% of the cost of any production big data environment is driven by DevOps resources. These skills are hard to find and retain. They are extremely expensive. Companies often underestimate the amount of work they will need to do to keep systems current and secure, and the time that they will need to do it. Thus, relying on DevOps can put a huge and unnecessary drag on data projects, slowing down potential returns, hindering analytic agility and throwing new variables into ROI equations. Addressing the DevOps drag in big data should be a critical imperative for all enterprises that want faster business outcomes.

Let me provide a few simple examples for illustration.

Cost: Cloud is cheap. DevOps is expensive.

A common goal of most production big data environments is to deliver self-service capabilities to data scientists and analysts, so that they can reduce preparation work and focus on analysis.

For a new team, let us assume a typical starter PaaS platform with Hadoop/Spark on AWS supporting production workloads: 10 normalized compute nodes plus object store, data processing PaaS, and nominal data ingest/egress. The cloud cost for this works out to about $80K a year. For larger PaaS footprints supporting multiple groups, say a medium-sized 50-node environment, total cloud costs work out to about $320K a year at current prices.

However, to deliver this big data platform capability requires expertise in development, optimization, ongoing operations and support. The problem isn’t just financial; it’s also difficult to hire these teams, depending on your location and requirements.

In fact, recent Big Data DevOps Team Salary Research uncovered some interesting facts: salaries for a five-person big data team range from $629,000 to $1,186,000 annually, depending on seniority, industry and location.

While you may be able to cross-train, cloud and big data technologies require specific skill sets and coordination across activities. Different areas of expertise need to come together, at a minimum:

1. Cloud DevOps to administer cloud accounts and resources, and manage the cloud infrastructure.

2. Hadoop Platform Administrator to provision and tune Hadoop/Spark nodes, with attached data stores and centralized object store required to deliver workload performance.

3. Cloud Security Architect to administer security controls such as encryption, key-management, identities and role-based access control, as well as establish and ensure compliance controls.

4. Data Management Lead to manage and administer data ingestion, data governance and logging as well as manage user access from a variety of data engineering, machine learning and SQL tools.

5. Data Production Ops to cover first- and second-line alerting, support, root-cause analysis and upgrade/patching/validation issues. This is also a catch-all capability for operational tasks like sprint tracking, billing, SLA monitoring and management.

While you might not need all of these as individual, full-time employees to start, every area is important. For our basic starter environment example, we will likely need at least two superhero DevOps pros to cover all these skills, at about $200K each or ~$400K annually. For a medium, scaled-out environment (50+ nodes), individuals will be needed to fill all five roles, at a minimum, particularly since there will be multiple applications and end-user groups accessing these systems. Let’s assume these are heroes (not multi-role superheroes) in the mid range of salaries, at ~$150K annually each. With five people, this team will cost about $750K annually.

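Now let’s do the math. In the starter example, DevOps accounts for $400K of a $480K total, or about 83%. In the medium example, it accounts for $750K of a $1,070K total, or about 70%. This means that roughly 70-80% of your overall costs are driven by DevOps. Cloud PaaS/IaaS itself makes up only about 17-30% of the overall investment.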
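The arithmetic behind these percentages can be checked with a few lines of Python. This is a minimal sketch using only the annual figures quoted in the two examples above; the function and variable names are illustrative.

```python
# Annual cost figures in thousands of USD, taken from the examples in this post.
SCENARIOS = {
    "starter (10 nodes)": {"cloud": 80, "devops": 400},  # 2 people x ~$200K
    "medium (50 nodes)": {"cloud": 320, "devops": 750},  # 5 people x ~$150K
}


def cost_shares(cloud: float, devops: float) -> tuple[float, float]:
    """Return (cloud_share, devops_share) as fractions of total annual cost."""
    total = cloud + devops
    return cloud / total, devops / total


for name, costs in SCENARIOS.items():
    cloud_share, devops_share = cost_shares(costs["cloud"], costs["devops"])
    print(f"{name}: cloud {cloud_share:.0%}, DevOps {devops_share:.0%}")
# starter (10 nodes): cloud 17%, DevOps 83%
# medium (50 nodes): cloud 30%, DevOps 70%
```

Swapping in your own cloud bill and team cost gives a quick first estimate of where the money actually goes.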

The Impact on Time to Production and Operational Complexity

Cloud is fast to cycle and easy to iterate through failure. DevOps is slow to cycle and carries a high risk of failure because of manual processes and significant configuration and optimization requirements. Spinning up cloud clusters is straightforward and takes seconds to minutes. However, managing them in production is hard, especially with an ever-changing ecosystem of capabilities and many moving parts. Developing a hybrid data access environment for cloud infrastructure is also complex, and security, compliance processes and governance of users accessing the cloud platform are non-trivial. Taking all this into account, developing a production environment could take six months, assuming it’s well managed. A wrong hire or unforeseen departure makes things worse and could extend projects to a year or more.

Purely as a thought experiment, if we were to define the overall production “drag” as: [Cost of Investment] multiplied by [Time to get to Production], a stark picture emerges. Production drag is almost completely defined by DevOps drag – 99.97% of the physical friction is DevOps. This drag completely dominates cloud agility. See the graphic below; the area of the graph represents the drag.

[Figure: DragAreaUpdate.jpg. The area of the graph represents the drag.]
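The thought experiment can be sketched in code as well. The block below applies the definition of drag (cost multiplied by time to production) to the starter example. Note the assumptions: the post only says cloud clusters spin up in "seconds to minutes", so the one-hour cloud figure here is a deliberately generous placeholder, and six months is approximated as 180 days. The inputs behind the graphic are not given, so this will not necessarily reproduce its exact percentage.

```python
def drag(cost_k: float, days_to_production: float) -> float:
    """Drag as defined in the post: cost of investment times time to production."""
    return cost_k * days_to_production


def devops_drag_share(
    cloud_cost_k: float,
    cloud_days: float,
    devops_cost_k: float,
    devops_days: float,
) -> float:
    """Fraction of total drag (cloud + DevOps) that is attributable to DevOps."""
    cloud = drag(cloud_cost_k, cloud_days)
    devops = drag(devops_cost_k, devops_days)
    return devops / (cloud + devops)


# Starter example from the post: $80K cloud, $400K DevOps, about 6 months to production.
# ASSUMPTION: one hour (1/24 day) of cloud spin-up time and 180 days for DevOps;
# neither exact value comes from the original post.
share = devops_drag_share(
    cloud_cost_k=80, cloud_days=1 / 24, devops_cost_k=400, devops_days=180
)
print(f"DevOps share of drag: {share:.4%}")
```

Even with a pessimistic cloud spin-up time, the DevOps term dominates because it is larger on both axes: more money and far more elapsed time.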

Takeaways for Cloud and Big Data Projects

  • Factor in DevOps to model TCO. It is reasonable to evaluate various cloud providers and PaaS technologies, for example, to understand which ones spin up faster or provide better performance or full-featured enterprise capabilities. However, evaluating available DevOps capabilities is actually far more important since the DevOps drag will ultimately determine your agility and production success.
  • Factor in DevOps drag to assess risk and time to market. Drag ultimately translates into risk. Analysts have estimated that as many as 70% of big data projects fail in production, and that those that succeed take 9-12+ months to get there. As the need to drive faster outcomes grows, the glaring disparity between cloud cycle times and manual DevOps cycle times will only magnify the challenge.
  • Explore alternatives to cut DevOps drag. For teams that don’t have superstar DevOps skills, look for approaches that cut through the DevOps drag without having to hire and retain new DevOps talent. Cazena’s Continuous Ops explains how automation and built-in DevOps eliminate the DevOps drag.

 

