
What to look for in an MLOps tool – Part II
In an earlier blog, we discussed the various non-technical considerations to keep in mind when buying, building, or subscribing to an MLOps tool. In this article, we review the main technical considerations when selecting such a tool.
It is useful to frame these considerations in the context of the development process of an AI/ML project, which typically consists of steps such as those described below:
1. Data Management
AI/ML projects typically start with a business problem statement and some sample data, which data scientists use to start figuring out modeling strategies. At this point, the following questions usually arise (a minimal data-versioning sketch follows the list):
- Does the tool have connectors to different data sources (local/remote file systems, RDBMS, NoSQL databases, cloud data sources, etc.)?
- Does the tool support data versioning? (This is extremely important, since training data is the most critical input to the model training process.)
- Does the tool support automated data exploration and visualization?
- If yes, can developers perform their own custom exploration and visualization, or is this restricted to analysis supported by the tool?
- Can exploration and visualization output be exported and distributed?
- Does the tool support easy mechanisms for automated Feature Engineering?
- Does the tool support the reuse and sharing of features through Feature Stores?
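To make the data-versioning question concrete, here is a minimal sketch of recording a dataset version by content hash. The file name `train.csv` and the registry path `data_versions.json` are illustrative assumptions; a real MLOps tool (or a library such as DVC) would handle this bookkeeping for you.

```python
import hashlib
import json
import datetime
from pathlib import Path

def register_dataset_version(data_path: str, registry_path: str = "data_versions.json") -> str:
    """Record a content hash for a dataset so a training run can be traced back to its exact input."""
    digest = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    registry = json.loads(Path(registry_path).read_text()) if Path(registry_path).exists() else []
    registry.append({
        "path": data_path,
        "sha256": digest,
        "registered_at": datetime.datetime.utcnow().isoformat(),
    })
    Path(registry_path).write_text(json.dumps(registry, indent=2))
    return digest

# Example usage: version the training data before a training run
# version_id = register_dataset_version("train.csv")
```

Recording the hash alongside each experiment is what later allows a model version to be traced back to the exact data it was trained on.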
2. Auto ML
AutoML enables users to create sophisticated models at the click of a button; a minimal "white box" model-search sketch follows the list below.
- Does the tool support AutoML?
- Does the tool support “white box AutoML”, wherein the code for the automatically generated model is available to developers to customize and enhance?
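As an illustration of what "white box" automation can look like, here is a minimal sketch of an automated model search written in plain scikit-learn; the candidate estimators and the 5-fold cross-validation setting are illustrative choices, not a prescription from any particular tool.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Candidate models the automated search will evaluate. Because the code is
# ordinary scikit-learn, a developer can inspect, customize, or extend it.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(f"Best candidate: {best_name} (mean CV accuracy = {scores[best_name]:.3f})")
```

A commercial AutoML tool would search a far larger space of models and hyperparameters, but the key question remains whether the resulting code is visible and editable, as it is here.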
3. Experiment Management
Data scientists spend a lot of time running experiments with different models. It is critical to keep track of these experiments and of model lineage, both for repeatability and for reusing experiments across projects. A minimal experiment-tracking sketch follows the list below.
- Does the tool track data scientists’ experiments?
- Does the tool support model versioning?
- Does the tool support model lineage (ability to trace a model version back to its inputs)?
- Can the tool compare models across various parameters?
- Does the tool support the most popular libraries for Machine Learning and Deep Learning?
- Does the tool keep track of various metrics associated with the training process?
- Can the developer report custom metrics during the experiment run?
- Does the tool support best practices of software architecture, e.g., Object-Oriented Design, modularity, and reuse?
- Can developers compare models to determine which one is better?
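As one example of how such tracking looks in code, here is a minimal sketch using MLflow (assumed here purely for illustration; other experiment trackers expose similar APIs) to log parameters, a custom metric, and the trained model so that runs can later be compared.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run(run_name="rf_baseline"):
    params = {"n_estimators": 200, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=0).fit(X_train, y_train)

    # Log hyperparameters, a custom metric, and the model artifact,
    # so this run can later be compared against other experiments.
    mlflow.log_params(params)
    mlflow.log_metric("test_accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```

The same pattern supports model versioning and lineage: each logged run ties a model artifact to its parameters, metrics, and (if the data version is also logged) its training data.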
4. Model Explainability
Once a model has been trained and selected, it is critical that it is explainable, i.e., that its behavior is well understood and its output is free from bias. A better-performing model is often replaced by a slightly less accurate model that is easier to explain. While Explainable AI (XAI) is a burgeoning field of research, some basic features need to be supported by MLOps tools (a simple feature-importance sketch follows the list):
- Does the tool provide means to understand model behavior?
- Does the tool enable bias detection in a model?
- Does the tool enable “what-if” analysis?
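As a simple illustration of understanding model behavior, the sketch below uses scikit-learn's permutation importance to see which features drive a trained model's predictions. This is just one basic technique; bias detection and what-if analysis go well beyond it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance: shuffle each feature on held-out data and measure
# how much the model's score drops -- larger drops mean the model relies
# more heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"{data.feature_names[idx]}: {result.importances_mean[idx]:.4f}")
```

A good MLOps tool surfaces this kind of analysis (and richer techniques such as SHAP-style explanations) without requiring the developer to hand-roll it for every model.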
5. Model Deployment – Containerization, Inference Pipelines, and A/B Testing
Once a model is ready, it has to be deployed to a production environment, where it must respond to requests, often with extremely low latency. For example, a credit card fraud detection model may need to handle thousands of requests a second. A minimal REST-style serving sketch follows the list below.
- Does the tool support industry-standard deployment techniques, such as containerization?
- Specifically for AI/ML models, does the tool deploy these models using industry-standard techniques, e.g., REST APIs?
- Can the deployment environment perform well under heavy request load? Does it scale horizontally if required?
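As a sketch of REST-style model serving, the snippet below wraps a pickled model (the path `model.pkl` is an illustrative assumption) in a minimal Flask endpoint. A production deployment would typically package this in a container image and place it behind a horizontally scalable serving layer.

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the trained model once at startup; "model.pkl" is an illustrative path.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [[5.1, 3.5, 1.4, 0.2]]}
    payload = request.get_json(force=True)
    predictions = model.predict(payload["features"]).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

An MLOps tool that generates and manages this serving layer for you (images, endpoints, autoscaling, A/B routing) saves considerable engineering effort compared to maintaining it by hand.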
6. Model Monitoring
After a model has been deployed to production, it has to be monitored continually to check for decay in performance and/or accuracy; a simple drift-check sketch follows the list below.
- Does the tool provide means to monitor deployed models?
- What kinds of metrics are provided for monitoring models?
- Can the tool monitor models created elsewhere?
- Does the tool enable the creation of alerts if metrics cross pre-specified values?
- Does the tool enable automated retraining of models in case of model decay?
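To make these monitoring questions concrete, here is a minimal sketch of a statistical drift check that compares a feature's production distribution against its training distribution using a Kolmogorov–Smirnov test; the 0.05 threshold and the alerting action are illustrative assumptions, and real tools track many such signals per feature and per model.

```python
import numpy as np
from scipy.stats import ks_2samp

def check_feature_drift(train_values: np.ndarray, live_values: np.ndarray, alpha: float = 0.05) -> bool:
    """Return True if the live distribution differs significantly from training (possible drift)."""
    statistic, p_value = ks_2samp(train_values, live_values)
    drifted = p_value < alpha
    if drifted:
        # In a real system this would raise an alert or trigger automated retraining.
        print(f"Drift detected: KS statistic={statistic:.3f}, p-value={p_value:.4f}")
    return drifted

# Example usage with synthetic data: the live feature has shifted upward.
rng = np.random.default_rng(0)
check_feature_drift(rng.normal(0.0, 1.0, 1000), rng.normal(0.5, 1.0, 1000))
```

When such a check fires, the tool should be able to alert the team and, ideally, kick off the retraining workflow automatically.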