The Hidden Costs of Cloud Data Lakes

This blog series from Cazena's engineering team investigates the top three hidden costs of cloud data lakes.


Video: Dan Stair’s ODSC East 2017 Presentation, “Building a near-real-time Data Pipeline in the Cloud”

In this video from the 2017 Open Data Science Conference (ODSC East), Cazena Senior Engineer Dan Stair gives a technical talk on near-real-time data pipelines. For more recent insights, visit the Cazena blog.

 

Abstract from Dan’s 2017 talk: Learn how we built, tested and delivered a near-real-time data pipeline using Apache Spark in the cloud in two weeks, and still saw our families. We faced a looming deadline and real-time analytics requirements. Using a cloud-based platform with Spark and Impala running on Microsoft Azure, and armed with a few hundred lines of Python code, we designed, tested and deployed an end-to-end data pipeline and analytics infrastructure. The project had its challenges, both technical and operational; hear what we learned and our tips for success.

