MLOps Basics #2: Why are Clear Handoffs Important?
The MLOps discussion often focuses on tools, while specific operational best practices get far less attention.
Handoffs are among the least discussed topics, yet they are a source of major bottlenecks, where a lot of ML model development runs into issues. Clearing this bottleneck is important for efficient MLOps.
Many data scientists get frustrated by an incomplete handoff from a data engineer. Many ML engineers get frustrated by an incomplete handoff from a data scientist. Worse, many stakeholders get a model without a formal explanation.
If operations means anything, it means clear lines of communication and repeatable procedures. Establishing clear handoff guidelines and expectations is important, not only for efficiency, but also for building trust and credibility between teams. It makes each team's work easier to do and easier to explain.
There are three main handoffs to consider in Machine Learning Operations:
Data Engineer to Data Scientist
Data Scientist to ML Engineer
Data Science Team to Stakeholder
I’ll break it down a bit below.
Data Engineer to Data Scientist
This is the first handoff in the process. The difficulty here lies in the difference between how the data engineer and the data scientist see the data: the data engineer sees data as an end, while the data scientist sees it as a beginning.
Both are correct. The challenge is bridging the gap and helping each understand the other's mindset.
What they need:
Data Scientist
Data documentation is critical for the data scientist. Knowing the context of the data helps them train, optimize, and develop high-quality ML models. The handoff from the data engineer needs to include documentation.
Documentation from the data engineer needs to include:
Data Version
Data Dictionaries
Lineage
Records of Transformations
Data Quality Checks
This will help the data scientist understand how the data was prepared and allow them to work with it more easily.
Data versioning and data dictionaries are important at this stage. The data scientist needs to know whether the data has changed since the last handoff. They also need to know the logic behind the columns in the tables they are using.
A single model version can be trained on multiple data versions, and multiple model versions can be trained on different data versions. This context is crucial to model versioning and to iterating on previous work.
For data scientists, the priority is knowing the context of their data so that they can train, optimize, and create quality models. They need enough to begin.
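To make the versioning context concrete, here is a minimal sketch of what a machine-readable data-handoff record might look like. All field names and values below are illustrative assumptions, not a standard; a real team might keep this in a data catalog or a YAML file instead.

```python
# Illustrative data-handoff record covering the items listed above.
# Every field name and value here is hypothetical.
handoff = {
    "data_version": "2024-01-15.v3",
    "data_dictionary": {
        "customer_id": "Unique customer identifier (string)",
        "ltv_usd": "Customer lifetime value in US dollars (float, nullable)",
    },
    "lineage": ["raw.orders", "staging.orders_clean", "analytics.customer_ltv"],
    "transformations": [
        "deduplicated on customer_id",
        "ltv_usd capped at 99th percentile",
    ],
    "quality_checks": {"row_count": 184302, "null_rate_ltv_usd": 0.02},
}

# The data scientist can immediately tell whether the data has changed
# since the last handoff by comparing versions.
previous_version = "2024-01-08.v2"
data_changed = handoff["data_version"] != previous_version
print(data_changed)  # True
```

Even a small record like this answers the data scientist's first questions (what changed, what the columns mean, where the data came from) without a meeting.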
Data Engineer
Understanding the final state of the data is important for the data engineer. To deliver it, they must understand the data requirements for the model, and those requirements come from the data scientist.
The following need to be communicated by the data scientist:
Scope of the data used to train a machine learning model
How the model will be deployed and who will use it
Data constraints and limitations
Data quality expectations
Features needed
Correct data type/formats
Transformations prior to training a machine learning model
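Data-type and quality expectations like these can be encoded as a lightweight check the data engineer runs before handing the data over. This is a plain-Python sketch with a hypothetical schema; a real pipeline might use a dedicated validation tool instead.

```python
# Hypothetical schema of expected feature types, agreed at handoff time.
EXPECTED_SCHEMA = {"customer_id": str, "age": int, "monthly_spend": float}

def validate_records(records):
    """Return a list of human-readable problems found in the records."""
    problems = []
    for i, row in enumerate(records):
        for column, expected_type in EXPECTED_SCHEMA.items():
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                problems.append(
                    f"row {i}: '{column}' is not {expected_type.__name__}"
                )
    return problems

rows = [
    {"customer_id": "c-001", "age": 34, "monthly_spend": 120.5},
    {"customer_id": "c-002", "age": "41", "monthly_spend": 80.0},  # bad type
]
print(validate_records(rows))  # flags the string-typed age in row 1
```

Running a check like this before the handoff turns "data quality expectations" from a conversation into a repeatable procedure.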
Data engineers are very detail-oriented people. Little things on their side of the data matter, and unclear definitions have large consequences.
It's important for the data scientist to recognize this and provide as much information as possible with their requests, as well as explain the importance of each request and how it fits into the larger picture.
Even more important is patience. Data engineers may have many questions, and it's important for data scientists to answer them. It will help the handoff go more smoothly.
Data Scientist to ML Engineer
This is the second handoff. For the data scientist, it's handing over a model that works well enough. For the ML engineer, it's refactoring the model so it can be deployed.
The bottleneck for both the data scientist and the ML engineer is the model itself. Data scientists may not understand the technical challenges of deploying and managing a production model. ML engineers may not have the complete context of the model that was handed off to them, or may lack documentation.
What is Needed:
During handoffs, data scientists need to give ML engineers:
Model Artifacts. The data scientist needs to provide the ML engineer with all model artifacts: trained models, data versions, experiment versions, feature engineering pipelines, and evaluation metrics.
Model Documentation. Documentation on the machine learning model, including use cases, end users, model type, training process, and the experiments used to create the model. It helps the ML engineer understand how the model was developed and how it is intended to be used.
Deployment Requirements. The data scientist should provide the ML engineer with any specific requirements or constraints for deploying the model, such as required infrastructure, runtime environments, or security considerations.
Data Governance. Depending on the industry, not all models can be deployed the same way. National laws may affect this as well, and model governance may affect how the model is monitored, deployed, and maintained. It's important for the data scientist to communicate these requirements.
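As one sketch of how these artifacts and requirements can travel together, a metadata file can be written next to the serialized model. Every field and value here is an illustrative assumption; real teams often use a model registry such as MLflow instead of a hand-rolled JSON file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical handoff metadata bundled alongside the trained model file.
metadata = {
    "model_version": "churn-clf-1.4.0",
    "data_version": "2024-01-15.v3",
    "experiment_id": "exp-0092",
    "metrics": {"auc": 0.87, "precision_at_10pct": 0.62},
    "deployment": {"runtime": "python3.11", "max_latency_ms": 50},
}

artifact_dir = Path(tempfile.mkdtemp())
(artifact_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))

# On the other side of the handoff, the ML engineer loads and inspects it.
loaded = json.loads((artifact_dir / "metadata.json").read_text())
print(loaded["model_version"])  # churn-clf-1.4.0
```

The exact storage mechanism matters less than the habit: the model never travels without its versions, metrics, and deployment constraints.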
During handoffs, ML engineers need to communicate to data scientists:
Code Feasibility. Not all code that the data scientist creates is usable for production. It may contain extra code that is not needed in a deployed version, code that can increase the latency of the deployed model and make it costly, or code that is simply inefficient.
Deployment. ML engineers should keep data scientists informed about how the model will be deployed, including any issues or challenges that arise.
Performance Metrics. ML engineers should clarify what performance metrics are needed to assess the health of the model. It's also important for them to get data scientists to define what level of data drift requires retraining and what constitutes concept drift.
Infrastructure and Environment. Brief information about the infrastructure and environment in which the model will be deployed, and how model performance and serving will be affected.
Trained ML models that worked for the data scientist can fail on production data, or contain extra code that is not needed for deployment and increases the latency of the served model. It is important to make sure the model is actually ready before the handoff.
It's also important to schedule time with the ML engineer to test the model in a staging environment. The model's performance in staging needs to be compared against the model deployed in production.
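Defining "what level of data drift requires retraining" can be made concrete with a simple statistic such as the Population Stability Index (PSI). The sketch below uses plain Python; the 0.2 threshold is a common rule of thumb, not a universal standard, and the bin proportions are made-up illustration data.

```python
import math

def psi(expected_props, actual_props, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Each argument is a list of bin proportions that sums to 1.
    """
    total = 0.0
    for e, a in zip(expected_props, actual_props):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total

training_bins = [0.25, 0.50, 0.25]    # feature distribution at training time
production_bins = [0.10, 0.40, 0.50]  # distribution observed in production

score = psi(training_bins, production_bins)
needs_retraining = score > 0.2  # illustrative threshold agreed at handoff
print(round(score, 3), needs_retraining)  # 0.333 True
```

Agreeing on a metric and a threshold like this at handoff time means "the model drifted" becomes a measurable trigger rather than a judgment call.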
Data Science Team to Stakeholders
Handoffs from data teams to stakeholders are a crucial part of the ML operations cycle. The handoff confirms that the use cases have been answered and that the model is properly deployed and working for end users.
Of all the handoffs mentioned, this is the most crucial. All handoffs prior to this point need to have been done reasonably well, with all parties on the same page, especially if the handoff meeting requires the data engineer, data scientist, and ML engineer.
By carefully planning and executing handoffs to stakeholders, data teams can ensure that their insights are clearly understood and used to inform business strategy and decision-making.
What is Needed:
Documentation. It's important to provide clear and thorough documentation of the ML model. This needs a description of the use case being addressed, the major stakeholders, a profile of the data, the features used to train the model, the algorithm, and any assumptions or limitations of the model and its data.
Evaluation. Provide information on how the model was evaluated. This includes a list of performance metrics and how they were calculated.
Maintenance. An overview of how the model will be maintained and updated over time. This should include retraining frequency and the criteria for retraining when new data is ingested into the system.
Deployment. An outline of the steps required to deploy the model in production, and the infrastructure or dependencies that will be needed. This may not be asked for, but it's important to have it available.
User support. Consider how users of the model will be supported, including any documentation or tutorials that should be provided and any ongoing support that will be required.
Governance Issues. For certain industries and firms, this is a big one. Be sure to address any questions of legal or ethical risk, which can range from data privacy to inherent model bias.
The handoff from data teams to stakeholders is the final crucial step in the handoffs of the MLOps process.
For a transfer between data teams and stakeholders, this means being able to explain the effect of the model and its business impact, as well as reassure the stakeholders about how it will continue its momentum.
This is a great opportunity not just to answer the business question, but to build credibility and buy-in for future iterations of a model.
Conclusion:
A handoff does not mean the end of the collaboration. It means a transfer of the majority of responsibility. The person transferring that responsibility needs to be available to explain elements that aren't understood.
Education about the why of handoffs is important. It's easy to tell people to do something, but they are more likely to do it if you give them the why. This should include how it affects the people they are handing off to, and operations at a larger level.
The goal of operations is to improve collaboration, communication, and scalability, regardless of whether it is MLOps, DataOps, or DevOps.
Good handoffs enable teams to build, test, deploy, and maintain ML models for business users, and to iterate and scale effectively to answer their questions.
Thoughts? Let me know in the comments.