Understanding the Bardess Data Science Maturity Curve: Phase One “Learning”

Hannah Smalltree, Cazena & Daniel Parton, Lead Data Scientist, Bardess

The Data Science Maturity Curve

The Data Science Maturity Curve is a tool developed by our partners at Bardess Group. I recently interviewed Daniel Parton, Lead Data Scientist at Bardess, for more insight into each stage of the curve.

The Bardess Data Science Maturity Curve

Data Science Maturity Curve – Phase One: Learning

Everyone has to start somewhere. In this awareness phase of data science maturity, companies are learning, hiring, and planning. Some may already have business intelligence or enterprise reporting but want to expand into data science, machine learning and AI. Others may be playing catch-up, leveraging untapped datasets or focusing on digital transformation.

While much of this phase is about education and resourcing, tech definitely plays an early role, even if it is not always specific to data science or machine learning. First, companies need a solid foundation. To make real, tangible progress, data must be collected, stored and prepped for analytics. That is often easier said than done, as many data scientists can attest: datasets may be big or unstructured, access may be challenging, and additional tools or platforms may be needed.

In the first phase, organizations find themselves awash in seas of jargon and tech choices, which can be confusing. And while consultants like Parton and his team at Bardess can help, companies shouldn’t rush to outsource everything, he said. Part of the journey is getting comfortable with new terms and building a data culture.

“It’s important for the management not to just outsource data science completely. They need to ultimately get used to that data science jargon, and start getting used to thinking about their business and their business’ data in data science terms,” Parton said. “For data science to be successful at a company, there always needs to be some kind of cultural shift.”

Tech shifts are also inevitable as organizations experiment with more datasets, different use cases and new tools. When it comes to data science tech, Parton said, organizations need to consider both their team’s skillsets and their goals.

Do you need a data science platform?

It depends. Parton explained that much of the exploratory work in data science can be done by writing code in Python or R, or by using customized tooling. But, he clarified, tech plays a big role in operationalizing data science: a solid data tech stack is required to move data science discoveries into production or to scale results.

Depending on the use case, tools can be helpful early on. Parton thinks about data science technologies in two categories:

  1. Automated machine learning tools aim to build a machine learning model for you automatically. These don’t require much, if any, data science expertise from the end user. These automated ML tools are essentially an alternative to writing custom R or Python code, and may come with domain expertise or models.
  2. Data Ops tools are more about streamlining data science workflows. These aren’t data science specific per se, but they are critical for success.

“These tools make it easier to collaborate, easier to get access to data, easier to set up APIs so you can operationalize models, all these sorts of things which are around data science,” Parton explained. “If you can help streamline that, you allow the data scientist to work on the stuff they’re good at.”
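As a rough illustration of what “setting up an API to operationalize a model” can involve, here is a minimal Python sketch. It does not come from Bardess or Cazena. The function names (`fit_line`, `handle_request`), the JSON field names and the toy data are all hypothetical, and a real deployment would put the handler behind a web framework and add authentication, logging and monitoring.

```python
import json

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def handle_request(model, body):
    """Turn a JSON request body into a JSON prediction response."""
    try:
        x = float(json.loads(body)["x"])
    except (ValueError, KeyError, TypeError):
        return json.dumps({"error": "expected a JSON object with a numeric 'x'"})
    return json.dumps({"prediction": model["slope"] * x + model["intercept"]})

# Exploration step: a data scientist fits a model on toy data (y = 2x + 1).
model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])

# Hand-off step: the model is stored as plain JSON so another service can load it.
saved = json.dumps(model)
served_model = json.loads(saved)

# Serving step: the service answers a request using only the saved parameters.
response = handle_request(served_model, '{"x": 4}')
print(response)
```

The point of the sketch is the separation of steps: the exploratory code that fits the model is distinct from the small, stable interface that other systems call, which is the part Data Ops tooling aims to streamline.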

Do you need tools in both categories? Not necessarily, Parton said. It depends on what you’re trying to do. Most organizations will want Data Ops capabilities early on, in order to productionize efficiently and move programs up the maturity curve to the exciting, yet often risky, Phase 2 – the Emerging Phase.

Watch this Data Science Maturity Curve interview excerpt to hear Daniel share more about how he thinks about tools for data science – and why a software stack is needed to get data science discoveries into production. Get more insight on how to think about data science platforms and technology.
