
Can You Use a Laptop for Enterprise Data Science?

Hannah Smalltree, Cazena & Daniel Parton, Lead Data Scientist, Bardess

I recently interviewed Daniel Parton, Lead Data Scientist, for more insight into the Data Science Maturity Curve, developed by our partners at Bardess Group. We tackled a frequently asked question from data and analytics leaders.

Aren’t laptops enough tech for enterprise data science teams?

Data science platform and technology discussions can be tricky. Some assert that no formal technology is needed for the discovery work of data science. Others maintain that technology is critical to success. Both are right in a way, explained Daniel Parton, lead data scientist with Bardess Group and author of the Data Science Maturity Curve. It depends on what you’re doing.

“There are definitely cases where you can just do something on a laptop if you’re just doing a simple one-off analysis. You download the data and do your analysis and then send somebody a CSV file.”

But as a data science practice becomes more mature, Parton said these types of use cases are “the exception, not the rule.”

First, there’s the issue of scale. A large sample or a huge dataset won’t necessarily fit in memory, so you can’t simply load it into Python on a single machine. Instead, you’ll need a data processing framework like Spark – and infrastructure for it to run on.
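To make the scale point concrete, here is a minimal sketch in plain Python (a hypothetical example, not Parton’s code, with a tiny inline dataset standing in for a file far larger than RAM): streaming over rows one at a time avoids holding the whole dataset in memory, and this is the pattern that frameworks like Spark generalize across a cluster.

```python
import csv
import io

# Hypothetical feed: in practice this would be a file far larger than RAM.
raw = io.StringIO("user,amount\nalice,10\nbob,5\nalice,7\n")

# Streaming aggregation: only one row is held in memory at a time,
# instead of loading the entire dataset into a single structure.
totals = {}
for row in csv.DictReader(raw):
    totals[row["user"]] = totals.get(row["user"], 0) + float(row["amount"])

print(totals)  # {'alice': 17.0, 'bob': 5.0}
```

Once the data outgrows a single disk or the aggregation outgrows a single CPU, the same group-and-sum logic is what you would express in Spark instead.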

Another major consideration is what Parton and others call “operationalization.” Once you’ve built a model, how do you integrate it with business workflows? Will it run in batch or in real time? Some models will need integration with other applications, perhaps via an API for web apps or other analytics tools.
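As a rough illustration of the batch-versus-real-time distinction (a hypothetical sketch, not a Bardess or Cazena implementation), the same scoring function can be applied to a whole table of records on a schedule, or wrapped in a request handler behind an API. The “model” here is a stand-in rule, not a trained artifact.

```python
# Hypothetical "model": in practice this would be a trained artifact
# loaded from storage or a model registry, not an inline rule.
def score(features: dict) -> float:
    return 0.8 if features.get("visits", 0) > 10 else 0.2

# Batch mode: score a table of records on a schedule.
batch = [{"visits": 3}, {"visits": 25}]
batch_scores = [score(r) for r in batch]

# Real-time mode: the same function behind a request handler (sketch).
def handle_request(payload: dict) -> dict:
    return {"score": score(payload)}

print(batch_scores)                      # [0.2, 0.8]
print(handle_request({"visits": 25}))    # {'score': 0.8}
```

Keeping one scoring function shared between both paths is what avoids the batch and real-time results drifting apart.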

For growing teams, collaboration also presents major tech risks and opportunities. Parton explained a very real scenario for enterprises.

“Maybe somebody built a model two years ago and they saved it on their laptop, maybe on a shared drive or something. And, in the meantime, that person has left the company. Then somebody else trying to pick up their work has to figure out ‘what was the data that was going in?’

“The problem with data science is that sometimes you have very large datasets. It’s difficult to do version control on large datasets like that. So, it can be difficult to figure out which data was going into the model. And it also can be difficult to figure out which libraries they’re using, for example.”
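One lightweight way to mitigate the “which data went in, which libraries were used” problem, sketched here with only the Python standard library (a hypothetical convention, not a feature of any specific product): fingerprint the exact input bytes and record the environment in a manifest saved alongside the model, since putting a multi-gigabyte dataset under ordinary version control is impractical.

```python
import hashlib
import sys

# Hypothetical training input; in practice this is a large file or table,
# far too big to check into version control directly.
data = b"user,amount\nalice,10\nbob,5\n"

# Fingerprint the exact bytes that went into the model, and record
# the runtime environment next to the saved model artifact.
manifest = {
    "data_sha256": hashlib.sha256(data).hexdigest(),
    "python_version": sys.version.split()[0],
}
print(manifest)
```

Two years later, whoever picks up the model can at least verify whether a candidate dataset matches the recorded hash, even if the original author has left the company.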

This is an area where software can help streamline the development process, encourage re-use and keep track of the growing pipeline of data to analytics.

Parton offered another example of where technology is critical. Many companies take data feeds from third parties or other internal sources. It’s critical to constantly monitor those feeds for changes, accuracy issues or other data problems that can dramatically impact outcomes.
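A feed check like the one Parton describes can start very small. This sketch (with a hypothetical schema and column names) verifies that each incoming row has the expected columns and flags obviously bad values before the data reaches any model:

```python
# Hypothetical expected schema for an incoming feed.
EXPECTED_COLUMNS = {"user_id", "amount", "ts"}

def check_feed(rows):
    """Return a list of human-readable problems found in the feed."""
    problems = []
    for i, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - row.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        elif row["amount"] is None or float(row["amount"]) < 0:
            problems.append(f"row {i}: suspicious amount {row['amount']}")
    return problems

good = {"user_id": "a1", "amount": "10.5", "ts": "2020-01-01"}
bad = {"user_id": "a2", "ts": "2020-01-02"}  # feed silently dropped a column
print(check_feed([good, bad]))
```

Running a check like this on every delivery is what turns a silent upstream schema change into an alert instead of a quietly broken model.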

In summary, Parton acknowledges that laptops can be used for some data science activities. However, real growth and enterprise impact typically require additional data science tools and data software to be successful.

Hear more on this topic in video excerpts of the interview with Daniel Parton:


Read an introduction to the Bardess Data Science Maturity Curve or jump into insights on Data Science Maturity in Phase 1 – Learning or Phase 2 – Emerging.

