How MLOps Helps CIOs Reduce Operations Risk
The enterprise promise of Artificial Intelligence and Machine Learning (AI/ML) is scale: analyzing terabytes of data to deliver insights that inform strategic planning for future growth and investment. These insights can improve both top-line revenue and bottom-line costs. Realizing them depends on managing the operational risk of the AI/ML workflow, which starts with data and ends with a model in production. Machine Learning Operations (MLOps) platforms are designed to streamline such workflows and reduce the risk that a production deployment is derailed. They offer CIO organizations a strong value proposition: an established “process” for successful AI/ML operations.
AI/ML and software development workflows have much in common, so CIO organizations can reuse many of their existing processes. But there are also key differences, chief among them the importance of data in AI/ML models. These differences can become high-risk and operationally challenging, as discussed below.
AI/ML modeling is stochastic: outcomes are characterized by probabilities, unlike conventional software development, where behavior is deterministic. These probabilistic outcomes are shaped by both the data and the algorithm code. The algorithm defines the type of model and its parameters, while the data determines the parameter values that drive the model's behavior and outcome. AI/ML model developers therefore need to run multiple experiments to understand which dials to turn to reach an outcome with a reasonable probability. These dials are the versions of the data, the algorithm, and the set of hyperparameters (which define the algorithm's structure), as shown in Fig 1. The bigger the business impact, the more fine-tuning of data and parameters is needed, and the higher the number of experiments.
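The split between what the algorithm defines and what the data defines can be illustrated with a minimal sketch. It assumes a deliberately simple model, a line through the origin fitted by least squares; the function name and the two toy data versions are hypothetical and chosen only for illustration.

```python
def fit_slope(xs, ys):
    """Least-squares slope for the model y = slope * x (no intercept).

    The algorithm fixes the model type and its single parameter (the slope);
    the data alone determines the value that parameter takes.
    """
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


# Two hypothetical versions of the training data, same algorithm.
data_v1 = ([1, 2, 3], [2, 4, 6])
data_v2 = ([1, 2, 3], [3, 6, 9])

slope_v1 = fit_slope(*data_v1)
slope_v2 = fit_slope(*data_v2)
print(slope_v1, slope_v2)  # prints: 2.0 3.0
```

The code is identical in both runs, yet the resulting model differs, which is why the data version has to be tracked alongside the algorithm version.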
This multi-experiment AI/ML workflow requires team collaboration, with different team members running different experiments. Specifically, the team must track which version of the data was used with which version of the algorithm and which hyperparameters in each experiment. The process culminates in choosing a winning model and successfully deploying it to production. These are the steps where MLOps platforms make an operational impact that translates into successful AI/ML model deployments.
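The tracking described above can be sketched as a small in-memory registry. This is a simplified illustration, not the interface of any particular MLOps platform: the `Experiment` record, the helper functions, the version labels, and the scores are all hypothetical, and the sketch assumes a single validation score where higher is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    owner: str               # team member who ran the experiment
    data_version: str        # which version of the data was used
    algorithm_version: str   # which version of the algorithm code was used
    hyperparameters: tuple   # sorted (name, value) pairs
    score: float             # validation metric; assumed higher is better


def log_experiment(registry, owner, data_version, algorithm_version,
                   hyperparameters, score):
    """Record one run together with the versions that produced it."""
    run = Experiment(owner, data_version, algorithm_version,
                     tuple(sorted(hyperparameters.items())), score)
    registry.append(run)
    return run


def winning_experiment(registry):
    """Pick the run with the best score as the candidate for production."""
    return max(registry, key=lambda run: run.score)


registry = []
log_experiment(registry, "alice", "data-v1", "algo-v1", {"learning_rate": 0.1}, 0.81)
log_experiment(registry, "bob", "data-v2", "algo-v1", {"learning_rate": 0.05}, 0.86)
log_experiment(registry, "carol", "data-v2", "algo-v2", {"learning_rate": 0.05}, 0.84)

best = winning_experiment(registry)
print(best.owner, best.data_version, best.algorithm_version)  # prints: bob data-v2 algo-v1
```

Because every run carries its own data, algorithm, and hyperparameter versions, the winning model can be reproduced later without relying on anyone's memory of what was tried.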
For example, MLOps platforms leverage best practices in team collaboration from software development and integrate with version control tools such as Git and CI/CD (Continuous Integration/Continuous Delivery) tools such as Jenkins, to name a few. These tools enable collaborative and effective development, producing multiple ready-for-production models in short time frames.
In addition, MLOps platforms version all AI/ML development artifacts, from data to algorithm to hyperparameters to the operational model in production. This provides complete transparency into model lineage and data provenance, so that it is clear which version is ready to be deployed to production and what has changed since the last deployment.
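One common way to version artifacts and compute the changes since the last deployment is to fingerprint each artifact by its content. The sketch below assumes that approach using SHA-256 hashes from the Python standard library; it is not a description of how any specific MLOps platform works, and the function names and toy artifacts are hypothetical.

```python
import hashlib
import json


def fingerprint(artifact):
    """Derive a short, content-based version ID for a JSON-serializable artifact."""
    payload = json.dumps(artifact, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def snapshot(data, algorithm, hyperparameters):
    """Capture the version of every artifact that goes into a model."""
    return {
        "data": fingerprint(data),
        "algorithm": fingerprint(algorithm),
        "hyperparameters": fingerprint(hyperparameters),
    }


def delta(previous, candidate):
    """List the artifacts whose version changed between two snapshots."""
    return sorted(name for name in candidate if candidate[name] != previous[name])


# Hypothetical artifacts: the candidate adds one row of training data.
deployed = snapshot([[1, 2], [3, 4]], "linear-regression", {"alpha": 0.1})
candidate = snapshot([[1, 2], [3, 4], [5, 6]], "linear-regression", {"alpha": 0.1})

changed = delta(deployed, candidate)
print(changed)  # prints: ['data']
```

Content-based IDs mean that identical artifacts always receive the same version, so a non-empty delta points to a real change rather than a relabeling.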
Enterprises follow strict governance protocols for promoting projects into production, and MLOps platforms enforce these protocols operationally with role-based access control (RBAC). They also ensure production model provenance using concepts such as entity (which data, algorithm, and hyperparameter versions), activity (when the model was deployed and what changed since the last deployment), and agent (who signed off on the deployment).
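A minimal sketch of how an RBAC check and an entity/activity/agent provenance record might fit together is shown below. The role names, the permission table, and the `promote` function are hypothetical and stand in for whatever governance rules an enterprise actually defines.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping; real policies are enterprise-specific.
ROLE_PERMISSIONS = {
    "data_scientist": {"train"},
    "ml_engineer": {"train", "deploy"},
    "approver": {"approve"},
}


@dataclass(frozen=True)
class ProvenanceRecord:
    entity: dict    # which data, algorithm, and hyperparameter versions
    activity: dict  # when deployed and what changed since the last deployment
    agent: tuple    # who signed off on the deployment


def can(role, action):
    """RBAC check: does this role carry the permission for this action?"""
    return action in ROLE_PERMISSIONS.get(role, set())


def promote(entity, changes, deployer_role, approvers):
    """Promote a model to production only if the governance checks pass."""
    if not can(deployer_role, "deploy"):
        raise PermissionError(f"role {deployer_role!r} may not deploy")
    signed_off = tuple(name for name, role in approvers if can(role, "approve"))
    if not signed_off:
        raise PermissionError("no authorized approver signed off")
    activity = {
        "deployed_at": datetime.now(timezone.utc).isoformat(),
        "changes": list(changes),
    }
    return ProvenanceRecord(entity, activity, signed_off)


entity = {"data": "data-v2", "algorithm": "algo-v1", "hyperparameters": "hp-v3"}
record = promote(entity, ["data"], "ml_engineer",
                 [("dana", "approver"), ("eli", "data_scientist")])
print(record.agent)  # prints: ('dana',)
```

Only approvers with the right role appear in the agent field, so the record answers all three audit questions (what, when, and who) for each production deployment.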
In conclusion, the inherent operational risks in AI/ML model development are compounded by probabilistic outcomes and multiple moving parts such as data, algorithms, and hyperparameters. CIO organizations can reduce these risks with an MLOps platform and increase the chances of successful AI/ML production deployments that positively impact the business.