As enterprises seek to drive faster big data outcomes, cloud offers a promising solution for agility. Indeed, public cloud infrastructure is, in many cases, far cheaper and faster to deploy than on-premises alternatives. Yet cloud big data deployments have proven complex for many enterprises, and few companies are ready to call systems officially in production. Reasons range from compliance concerns to integration issues, but there’s a much bigger problem lurking. The real challenge holding back production big data cloud deployments has less to do with the infrastructure or PaaS capabilities: It is the pervasive lack of DevOps skills for big data.
Read our takeaways from a recent Gartner Insight report titled "What CIOs Should Do About Strategic Chief Digital Officers." The Chief Digital Officer title is relatively new but growing quickly, so it’s worth reviewing the report to get familiar with the emerging definition of the role. We’re offering the research because we’re hearing about more projects with the ‘digital transformation’ label. Several of these projects look a lot like advanced analytics projects, but with new ‘transformational’ branding. Is it just a new label? As the report explains, digital transformation is a big deal, and Chief Digital Officers have different goals than the CIO. The full report is available, compliments of Cazena, until March 31, 2018.
At AWS re:Invent today, Cazena announced a concept called AppCloud that allows enterprises to attach innovative analytic or machine learning applications to their enterprise data in the cloud.
Microsoft’s Azure Data Lake Store (ADLS) is a highly scalable storage solution which boasts the ability to store trillions of files, including files a petabyte in size. It is also 3x cheaper than running HDFS on Azure’s Standard Storage solution. We have been eager to see what it can do. Cloudera recently published an analysis, and we did some complementary benchmarking of popular SQL on Hadoop tools. In this Cazena Engineering blog post, we present our findings and assess the price-performance of ADLS vs HDFS.
As enterprises seek to migrate and manage their production analytic workloads in the public cloud, we increasingly hear teams considering PaaS offerings (such as AWS EMR, Redshift, Azure HDI) as the first stop for implementation. But many don’t realize that a PaaS offering provides only a foundational analytic capability in the cloud. Significant additional work is needed to migrate and manage analytics in production.
A telecommunications company called Cazena to learn about our Data Lake as a Service, hoping that it would help them deliver an at-risk project. It definitely would: it’s a cloud use case we know well, with a solution designed accordingly.
As part of our ongoing series on Productionizing Hadoop and Spark in the cloud, we explore performance optimization, and how companies scale and tune for the best performance. We also discuss what’s required for production-grade deployments, often an underestimated part of the process.