Data Notes #1: Asking the Right Questions = Good ML Models

Who you ask will make your model development better

Jun 30, 2023

Hi everyone! This is Data Notes, a regular series that will help simplify core concepts for senior data science contributors. I’ll be running it alongside a strategy notes series.

So let's get started!

This week in Data Notes:

Who is involved in the discovery process for ML models?
What is their role?
What questions do we need to ask them?

Knowing how to approach a problem, and knowing who to ask, shapes how you structure it. Many times, the success or failure of an ML project comes down to asking practical questions.

Scoping matters far more than fancy charts. Models built only around the tech, rather than around the business problem, cause bottlenecks and lost insights. They can pile on unnecessary tech debt instead of delivering insights to users.

These questions are crucial if you want to improve both the quality of, and the buy-in for, your ML projects. They affect many things downstream: the models you build, the architecture you use, and even how you serve the ML model.

Let’s look at the people you need to ask. As a data science pro, knowing the why, what, when, and how, in that order, is an important driver for better tech. The answers give the project a framework, keep it within budget, and help it deliver insights at a reasonable cost.

1. Executive Sponsor

Think of the Executive Sponsor as the lighthouse in your venture. They provide the guiding light, illuminating the 'why' behind the model. Just as a lighthouse provides direction to ships, the Executive Sponsor sets the vision and ensures that the project aligns with overall business objectives. Questions for them include:

Why are we building this?
What decisions will this model enable or speed up?
What value are you trying to envision and achieve?

2. Department Leader

This role serves as the architect of your project. They sketch the 'what' and 'for whom' details. Their insights about the department's needs and challenges are critical in shaping the model, the data used in it, and what it needs to predict.

They're instrumental in refining the vision provided by the Executive Sponsor into a functional design. Questions to ask include:

What outcomes are you looking to see from this ML model?
How will this model integrate into our existing workflows?
Do we have alternatives that are not Machine Learning based?

Department heads usually know the larger strategic needs of the department. For high-level use cases, they’re great resources to ask about how the ML model fits into the larger mechanics, including how many resources can be freed up for your project.

3. Project Manager

This is the master builder, answering 'when' and 'what'. They plan, allocate resources, set timelines, and manage risks. Like a builder managing the construction process, they ensure that model building goes according to plan and stays within budget. Key questions for them might be:

What timelines do we have?
What current resources and budget are available?
What are the risks?

Think of them as your operations SMEs. They know the resources available, and they can tell you about timelines and whether a project is feasible with the resources at hand.

4. DS Team Lead

They're akin to a project engineer, working closely with the project manager to supervise the model development process. They are the technical experts providing insights on how to execute the project within the timeline and resources available.

Questions to raise include:

What are the technical requirements for this project?
How can we ensure the quality and accuracy of the model?
How do we label data? (Critical one)

The data science team lead really homes in on the ML and AI development. They’re responsible for the technical side of project execution: algorithm selection, experimental design, data requirements, and quality control.

5. Data Engineers

They can be compared to the miners, responsible for unearthing and refining the raw data. Their main focus is to ensure the data used by data scientists is relevant, clean, and well-prepared.

Questions to ask include:

Where is the data coming from, and how is it derived?
Do we have any current business objects that can get us the data?
What sort of data architecture will enable faster, more timely data and better insights?
How do we version data?

They are your data subject matter experts. You cannot build an ML model without knowing the data landscape. They tell you whether the data exists, what the raw data looks like, and where it comes from. Any modeling work depends on their knowledge of the data.
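
One lightweight way to start the "How do we version data?" conversation is to fingerprint each dataset snapshot by its content. The sketch below is only an illustration under simple assumptions (small, JSON-serializable records held in memory), not a replacement for a proper data versioning tool. The helper name `dataset_version` and the sample rows are hypothetical.

```python
import hashlib
import json


def dataset_version(rows):
    """Return a short, deterministic fingerprint for a list of dict records.

    Hypothetical helper: identical rows in the same order always produce the
    same ID, and any changed value produces a different one.
    """
    # Serialize with sorted keys so dict key order does not affect the hash.
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Made-up sample data for illustration only.
snapshot_a = [{"customer_id": 1, "churned": 0}, {"customer_id": 2, "churned": 1}]
snapshot_b = [{"customer_id": 1, "churned": 0}, {"customer_id": 2, "churned": 0}]

version_a = dataset_version(snapshot_a)
version_b = dataset_version(snapshot_b)
print(version_a, version_b, version_a == version_b)
```

Storing an ID like this next to each trained model makes it possible to say which data a given model was built on.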

6. Data Scientists

These are the craftsmen of your project. They are the 'how' part of your ML model building. They mold raw data into meaningful insights, using algorithms and feature engineering.

Questions to ask them could include:

How do we solve it?
What is the simplest model that answers the question?
What sort of feature engineering is needed?

They can suggest better ways to model or predict the data, or tell you whether you even need an ML solution for a use case. Not all problems require an ML model, let alone a complex one. A data scientist can help you make that call.
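
To make the "simplest model" question concrete, a common starting point is a majority-class baseline: always predict the most frequent label and see how far that gets you. The sketch below uses made-up labels and a hypothetical helper name, `majority_baseline_accuracy`. It is an illustration of the idea, not a prescribed method.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    # The "model" is a single constant: the most frequent label seen in training.
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return majority_label, hits / len(test_labels)


# Made-up labels for illustration (1 = churned, 0 = stayed).
train = [0, 0, 0, 1, 0, 1, 0, 0]
test = [0, 1, 0, 0, 0]

label, accuracy = majority_baseline_accuracy(train, test)
print(f"Always predicting {label} gives accuracy {accuracy:.2f}")
```

Any candidate ML model should clearly beat this number before its extra complexity is worth taking on.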

7. ML Engineers

Think of them as the builders who take the blueprint and transform it into a tangible product. They scale the models, foresee deployment issues, and work on automation. They bring the models into production and ensure they work seamlessly.

Key questions for them include:

What is an effective way to scale these models?
What are deployment issues you can foresee?
What can we automate in this model?

ML engineers tell you whether a model built by a data scientist is deployable in production. They make sure the model is usable by apps or end users. As models mature and use cases grow, they scale the models and the infrastructure behind them, keeping them efficient and usable.

Remember, these are not the only questions to ask, but they serve as great conversation starters to facilitate smooth project execution and build credibility.

A guest post by

Matt Blasa

Writer @Datalife360, Data Strategy @Aspire Analytics. Opinions are my own.