Webinar Replay: SaaS Data Lakes

Simplifying Cloud Data Lakes for Rapid Analytics

Video: Dan Stair’s ODSC East 2017 Presentation, “Building a near-real-time Data Pipeline in the Cloud”

Abstract: Learn how we built, tested, and delivered a near-real-time data pipeline using Apache Spark in the cloud in two weeks, and still saw our families. We faced a looming deadline and real-time analytics requirements. Using a cloud-based platform with Spark and Impala running on Microsoft Azure, and armed with a few hundred lines of Python code, we designed, tested, and deployed an end-to-end data pipeline and analytics infrastructure within that window. The project had its challenges, both technical and operational; we share what we learned and our tips for success.
