MLOps101: What is Operational Tempo? [Part 2]
Infrastructure, Modeling Complexity, and Regulations matter more than you think.
Read time: 5 minutes
Hello from Doha! 🇶🇦
I hope you are all doing well! It’s been a busy work trip, so I haven’t had time to release articles. Here’s part 2 of the three-part series on MLOps operational tempo.
Check out Part 1 here, if you haven’t read it:
Previously, I discussed the first three factors that affect MLOps operational tempo: good strategic goals, data quality, and organization. These are critical if you want to get the flow of your ML workflows right.
In this part, I’ll discuss the final three critical pieces, which are often overlooked: infrastructure and tools, modeling complexity, and regulatory requirements.
Infrastructure and Tools
Infrastructure and tools are two key factors that can significantly impact the speed and agility of machine learning development and deployment. Infrastructure makes sure predictive outputs are delivered in a timely manner, while tools help you automate repetitive processes and get more insight from your data.
Infrastructure
Infrastructure needs adequate computing power, good data storage, and reliable networking. This enables faster development and retraining cycles. It also makes iteration and deployment quicker.
As ML models scale, they require larger amounts of computing power. They also need data storage to save experiments, data, etc. Without sufficient or correct computing and storage resources, ops tempo slows down, which limits the number of models that can be developed, deployed, or scaled.
Data is the core of machine learning model development, so the infrastructure that supports that data is the most critical. Before starting MLOps (or even ML modeling, for that matter), focus on building robust data storage, data pipelines, and data versioning processes. Data versioning is especially critical if you are building models, because you need to know which data produced which model.
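To make data versioning a bit more concrete, here is a minimal sketch in Python. It assumes your dataset is a single file and uses a hash of the file contents as the version ID. The function name dataset_version and the 12-character ID length are my own choices for illustration, not part of any particular tool; dedicated data versioning tools do much more than this.

```python
import hashlib
from pathlib import Path


def dataset_version(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a short content hash that can serve as a dataset version ID.

    Hypothetical helper for illustration: the same bytes always give the
    same ID, and any change to the file gives a different one.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Truncated for readability; keep the full digest if collisions matter.
    return digest.hexdigest()[:12]
```

If you store this ID next to each trained model, you can later tell whether two models were trained on the same data.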
Tools
Tools also play a role in ops tempo. They’re used not only to automate repetitive tasks and complex processes, but also to improve reproducibility, model management, data management, monitoring, and security. Tools can speed these processes up, but they can also slow them down, especially if they are incompatible with each other or vendor lock-in occurs.
Some tooling issues that commonly slow down MLOps operational tempo:
3rd party tools that are incompatible with each other
Redundant tools that duplicate similar processes
Different tool versions between teams
Too many tools for problems that could have been solved manually
Good audits and assessments of these tools need to be done regularly. This helps eliminate the inefficiencies created by tool conflicts, duplication, different formats, and other factors.
Each issue may be small, but without an occasional audit, they can add up and significantly slow down MLOps processes and model development.
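One part of a tool audit, finding different tool versions between teams, is easy to automate. The sketch below assumes you can collect each team's tools and versions into a dictionary, for example from their lock files. The team names, tool names, and version numbers are made up for illustration.

```python
from collections import defaultdict


def find_version_conflicts(
    team_tools: dict[str, dict[str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Map each tool that appears with more than one version to {version: [teams]}."""
    versions_by_tool: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for team, tools in team_tools.items():
        for tool, version in tools.items():
            versions_by_tool[tool][version].append(team)
    # Keep only tools where teams disagree on the version.
    return {
        tool: {version: sorted(teams) for version, teams in versions.items()}
        for tool, versions in versions_by_tool.items()
        if len(versions) > 1
    }


# Hypothetical inventory: every name and version here is invented.
inventory = {
    "forecasting": {"experiment_tracker": "2.1", "feature_client": "0.9"},
    "fraud": {"experiment_tracker": "1.8", "feature_client": "0.9"},
}
conflicts = find_version_conflicts(inventory)
# Only experiment_tracker is reported, because feature_client matches across teams.
```

A report like this won't catch redundant or incompatible tools, but it gives the audit a concrete starting point.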
Artifacts are an important part of the MLOps process.
Check out this article on what Artifacts are and how they help MLOps processes.
Modeling Complexity
Modeling complexity affects ops tempo in MLOps in three ways: training, technical, and execution. MLOps often runs into slowdowns at these three points.
Training intricate models can be challenging. Data scientists and engineers may need to spend more time experimenting and validating data. For data engineers, complex models require an extra level of validation and data quality checks. For data scientists, complex models are harder to interpret, maintain, debug, and optimize. High complexity means more scoping, development, and testing time.
Technical complexity also increases in proportion to a model’s complexity. The more complex the model, the more resources, people, and time are needed to build it, engineer a pipeline, demo, and perform user acceptance testing. More time is also needed to retrain and rebuild if that model fails in production. Even when it does succeed, the testing and validation are more extensive than for a simple model.
Time is also an important factor, especially for the business units that you have to keep informed. Team members may need to devote additional time and effort to plan for these models and demonstrate value to the business. Setting clear time limits for model experimentation and development is critical.
Balancing model complexity with operational efficiency is important for maintaining a manageable MLOps tempo.
Regulatory Requirements
Regulatory requirements also change the ops tempo. Complying with different internal and external rules can speed up or slow down development.
It gets more complicated if you are working with international clients or stakeholders. Data collected in some geographic regions cannot be moved out of that region and used in another. Models in some geographic regions also require more documentation.
Regulation also extends to the data used to build the models, as well as where it is stored. GDPR and other regulations may limit which features can be used to build models. Teams need to implement proper data management practices and potentially adjust their models to maintain privacy, which can affect the overall operational tempo.
Certain industry regulations may also require additional model validation or third-party audits. Model governance and documentation for these audits add to the development time. It is critical that these needs are scoped before working with a business unit or client. These regulations may even ban the use of certain models.
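As a rough sketch of what adjusting for restricted features can look like in code, the example below filters a feature list against a per-region policy. Everything in it is an assumption for illustration: the RESTRICTED_FEATURES table, the region codes, and the feature names are invented and do not reflect what GDPR or any other regulation actually restricts. Your compliance team decides what goes in a table like this.

```python
# Hypothetical policy table: features that may NOT be used for a given region.
# The contents are invented for illustration, not legal guidance.
RESTRICTED_FEATURES: dict[str, set[str]] = {
    "eu": {"date_of_birth", "postal_code"},
    "us": set(),
}


def allowed_features(features: list[str], region: str) -> list[str]:
    """Return the features that the assumed policy permits for this region."""
    if region not in RESTRICTED_FEATURES:
        # Fail closed: a region with no policy needs review before modeling.
        raise ValueError(f"No feature policy defined for region {region!r}")
    blocked = RESTRICTED_FEATURES[region]
    # Preserve the original feature order.
    return [name for name in features if name not in blocked]


candidate = ["tenure_months", "postal_code", "plan_type"]
eu_features = allowed_features(candidate, "eu")
```

Failing on an unknown region, instead of silently allowing everything, is a deliberate choice here: it forces the compliance question to be scoped before training starts.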
With regulatory factors, data science teams often need to build custom solutions or compliant models, which adds extra development time and cost.
Conclusions
Infrastructure and tools, modeling complexity, and regulatory requirements are critical factors in operational tempo. They’re often less obvious than good strategic goals, data quality, or organization, but they’re important if you want consistency in your MLOps tempo.
This is part 2 of a three-part series on MLOps operational tempo. In the next part, I’ll discuss practical ways that you can improve ops tempo without sacrificing quality.