What to look for in an MLOps tool – Part I


Krishna Kumar Ramanujam

In an earlier blog post, we explained how MLOps helps reduce the inherent risks of developing and deploying AI/ML models in enterprises.

Implementing MLOps workflows, like any other workflows, involves the age-old “Build vs. Buy” choice: whether to build a tool from scratch or buy (or subscribe to) an existing off-the-shelf tool. Each organization will answer this question in its own way, based on its unique requirements and capabilities. Either way, it helps to list the requirements up front so that they are kept in mind throughout the decision process.

An MLOps tool should enable all stakeholders of the AI/ML development and deployment process to perform their tasks with improved efficiency and productivity. From this point of view, the requirements of such a tool can be classified into two categories:

  • Technical – features that address specific technical requirements of data scientists, business analysts, project managers, and other users
  • Non-technical – features that straddle user roles and are typically required by any enterprise-wide software.

In this post, we enumerate the non-technical feature requirements; the next article in this series covers the technical features.

The non-technical features required of an MLOps tool are the same as those offered by any enterprise-grade software, with some tweaks to cater to its unique positioning. These include:

1. Installation Flexibility

  • Does the tool tie the organization into a specific cloud? 
  • Or is it cloud-agnostic? 
  • Does it have an on-premise offering? 
  • Does it provide a PaaS (Platform as a Service) version? 

2. Development Best Practices

The primary users of the tool will be the data science team.

  • Does it support architectural best practices – e.g., modularity, reusability? 
  • Does it integrate with code versioning systems? 
  • What languages does it support (a minimum set should include Python, R, Java, Scala, and Go)? 
  • Is it flexible – does it impose custom ML libraries on developers, or does it support all popular ML libraries?

3. Collaboration

Many tools treat data scientists as “lone rangers” who develop, test, and deploy models on their own. In practice, the AI/ML development and deployment process involves a team of data scientists, business analysts, production engineers, and project managers working together to complete the workflow.

  • Does the tool support collaboration among all the stakeholders through a mix of appropriate workflows and collaboration tools?

4. Security

  • Does it integrate with the enterprise security systems? 
  • Does it provide support for industry-standard security best practices?
  • Is the data within the tool secure, both at rest and in transit? 
  • Does it provide VPN-based access as a standard? 

5. Access Control

  • Does the tool provide Role-Based Access Control (RBAC) features? 
  • Can enterprises define custom roles and assign functions to these roles? Ideally, administrators should be able to assign each individual action permitted in the tool to one or more specific roles.
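
To make the custom-role question concrete, here is a minimal Python sketch of what action-level RBAC could look like. It is an illustration under stated assumptions, not how any particular MLOps tool implements access control: the class names, role names, user names, and action strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A custom role: a name plus the individual actions it permits."""
    name: str
    actions: set[str] = field(default_factory=set)


class AccessPolicy:
    """Hypothetical policy store mapping users to roles."""

    def __init__(self) -> None:
        self.roles: dict[str, Role] = {}
        self.user_roles: dict[str, set[str]] = {}

    def define_role(self, name: str, actions: set[str]) -> None:
        # Administrators decide which actions each custom role may perform.
        self.roles[name] = Role(name, set(actions))

    def assign(self, user: str, role_name: str) -> None:
        if role_name not in self.roles:
            raise KeyError(f"unknown role: {role_name}")
        self.user_roles.setdefault(user, set()).add(role_name)

    def is_allowed(self, user: str, action: str) -> bool:
        # A user may act if any of their roles permits the action.
        return any(
            action in self.roles[role_name].actions
            for role_name in self.user_roles.get(user, set())
        )


# Illustrative roles, users, and actions (all assumed, not from any real tool).
policy = AccessPolicy()
policy.define_role("data_scientist", {"train_model", "view_metrics"})
policy.define_role("release_approver", {"view_metrics", "approve_deployment"})
policy.assign("asha", "data_scientist")
policy.assign("ben", "release_approver")
```

With this setup, `policy.is_allowed("asha", "approve_deployment")` is false while `policy.is_allowed("ben", "approve_deployment")` is true. When evaluating a tool, the point to check is whether permissions can be set at this level of individual actions rather than only through fixed, built-in roles.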

6. Governance

  • Does the tool provide governance mechanisms through workflow approvals? 
  • Can you set multiple levels of approval authorities for any action? 
  • Are the logs adequate for failure analysis? 
  • Does it provide audit trails to ensure traceability? 
  • Is there model provenance defined by entity (which data, algorithm, and hyperparameter versions), activity (when deployed and the changes since the last deployment), and agent (who signed off on the deployment)?
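
As a rough illustration of the entity, activity, and agent view of provenance, the following Python sketch shows one possible shape for a deployment record. The field names, version labels, and approver names are assumptions made for this example; a real tool would define its own schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an audit record should not change after creation
class ProvenanceRecord:
    # Entity: which versions went into the deployed model
    data_version: str
    algorithm_version: str
    hyperparameter_version: str
    # Activity: when it was deployed and what changed since the last deployment
    deployed_at: datetime
    changes_since_last: tuple[str, ...]
    # Agent: who signed off on the deployment
    approvers: tuple[str, ...]

    def is_fully_approved(self, required: set[str]) -> bool:
        """True if every required approver has signed off."""
        return required.issubset(self.approvers)


# Hypothetical values, for illustration only.
record = ProvenanceRecord(
    data_version="data-v3",
    algorithm_version="algo-v1",
    hyperparameter_version="hp-v7",
    deployed_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
    changes_since_last=("retrained on data-v3",),
    approvers=("lead_data_scientist", "risk_officer"),
)
```

Here `record.is_fully_approved({"lead_data_scientist", "risk_officer"})` is true, while a requirement that also names an approver who has not signed off fails. The same check is a simple stand-in for the multi-level approval question above.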

7. Performance

  • Does the tool support containerization? 
  • Are the models deployed to a scalable, fault-tolerant environment? 
  • Can you increase/decrease the deployment cluster size as required?

8. Ease of Use

  • Does the tool have an intuitive GUI to enable even non-technical business users to use it effectively? 
  • Is there sufficient documentation available? 
  • Does it include starter projects to reduce the learning curve? 

9. Support

  • Does the tool provider offer adequate support? One possible way to reduce the learning curve is a managed service offering, in which an engineer from the provider supports your AI/ML operations for a specific period before the final handover.

10. Provider Reputation

  • What are the provider’s credentials? 
  • Have they implemented any AI/ML projects themselves? (In other words, do they eat their own dog food?) 
  • Does the leadership team have a Data Science background?

These are some of the questions every enterprise must ask before making this choice. There are several MLOps tools available today, including xpresso.ai, AWS SageMaker, Microsoft Azure ML, Allegro, Iguazio, DataRobot, and Dataiku. Make sure you evaluate each candidate against these criteria and select the one best suited to your needs. Best of luck!

About the Author

Krishna Kumar Ramanujam
Country Head and Executive Vice President – Engineering, xpresso.ai
