In this video from the 2017 Open Data Science Conference, Cazena Senior Engineer Dan Stair gives a technical talk on real-time data pipelines. For more recent insights, please visit the Cazena blog.
Abstract from Dan’s 2017 talk: Learn how we built, tested and delivered a near-real-time data pipeline using Apache Spark in the cloud in two weeks, and still saw our families. We faced a looming deadline and real-time analytics requirements. Using a cloud-based platform with Spark and Impala running on Microsoft Azure, and armed with a few hundred lines of Python code, we designed, tested and deployed an end-to-end data pipeline and analytics infrastructure in two weeks. The project had its challenges, both technical and operational; hear what we learned and our tips for success.