ModelOps

1. What are the different machine learning frameworks supported in xpresso?

We support all popular machine learning frameworks. We have special support through pre-built components in the xpresso Component Library for some of these: XGBoost, Sklearn, Keras, and LightGBM.

2. Does xpresso support distributed learning?

xpresso provides support for training pipelines created in PySpark. When deployed, these pipelines will run on Spark clusters, which can be scaled up or down as needed.
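
For context, here is a hedged sketch of what such a PySpark training pipeline typically looks like; the data path, column names, and algorithm are placeholders, not xpresso requirements.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.appName("training-pipeline").getOrCreate()
    df = spark.read.parquet("path/to/training_data")  # placeholder path

    # Assemble placeholder feature columns into a single vector, then fit.
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    model = Pipeline(stages=[assembler, lr]).fit(df)
    model.write().overwrite().save("path/to/saved_model")  # placeholder path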

3. Can I train my neural network models in xpresso?

Yes, xpresso supports training of neural networks. GPUs can be used when deploying components and pipelines, so they are available for experimentation if needed. xpresso.ai provides special support for popular DL libraries like Keras. When such libraries are used within xpresso, several relevant metrics are reported directly to the xpresso Dashboard, with minimal coding required.

4. How can I run experiments to try training different models?

To start training your models, you will have to first do the following:

  • Create components and a pipeline using the Solution Builder (you may wish to use some of the ML components from our Component Library)

  • Write code for each component and push it to the code repository

  • Build your individual components

  • Deploy your pipeline using Kubeflow

Now you will see your deployed pipeline under the “Experiment” page. Click the “Start run” button and provide the parameters. Once done, click “Create run”. The experiment will start and save the trained model to the xpresso Model Repository. You can then run more experiments using different hyperparameters, pipeline versions, or training data; xpresso will store the trained model each time. You can view a sample solution on how to create machine learning solutions here.
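
As a rough illustration, the body of a training component often looks like the sketch below. The function name, parameter handling, and output file are assumptions for illustration, not the exact xpresso SDK interface; the sample solution linked above shows the real one.

    # Hypothetical training-component body; names and paths are illustrative,
    # not the exact xpresso SDK interface.
    import pickle

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    def run_training(params):
        """`params` stands in for the hyperparameters entered at "Start run"."""
        X, y = load_iris(return_X_y=True)  # stand-in for your versioned data
        model = RandomForestClassifier(
            n_estimators=int(params.get("n_estimators", 100)),
            max_depth=int(params.get("max_depth", 5)),
        ).fit(X, y)

        # Persisting the trained model is what xpresso versions into the
        # Model Repository at the end of the run (file name is hypothetical).
        with open("model.pkl", "wb") as f:
            pickle.dump(model, f)

    if __name__ == "__main__":
        run_training({"n_estimators": "200", "max_depth": "4"})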

5. Can I schedule my experiments?

Yes, you can schedule an experiment run by setting the schedule time and frequency. To schedule, click the “Schedule Run” button on the “Experiments List” page. You can also find detailed steps here.

6. Can I have event-based triggers to start my experiments?

Event-based triggers are currently not an out-of-the-box feature in xpresso, but you can create a custom component and add it as the first component of your training pipeline, then schedule your experiment on a recurring basis. The custom component can listen for the event and, based on whether the event has occurred, either execute the rest of the training pipeline or terminate it. For example, the first step in the pipeline could check for the availability of training data at a specific location and run the rest of the pipeline only if the training data is found.
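
A minimal sketch of such a gating component is below; the file path is a placeholder, and the plain process exit is a stand-in for the component's actual terminate hook (see question 8).

    import os
    import sys

    TRAINING_DATA_PATH = "/data/incoming/train.csv"  # placeholder location

    def check_for_training_data():
        """First pipeline step: continue only if new training data exists."""
        if not os.path.exists(TRAINING_DATA_PATH):
            print("No training data found; stopping this run.")
            sys.exit(1)  # stand-in for terminating the pipeline
        print("Training data found; the remaining components can run.")

    if __name__ == "__main__":
        check_for_training_data()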

7. What happens when I pause my experiment?

When you pause an experiment, xpresso runs the experiment up to the end of the current component and saves the state of the experiment; when you restart it, the experiment continues from the last saved state. As the developer, you have the flexibility to define what state to save by defining the conditions in the “Pause” function in your code (a combined sketch of the “Pause” and “Terminate” functions appears after the next question).

8. What happens when I terminate my experiments?

When you terminate an experiment, xpresso bypasses the current component and all remaining components and ends the entire pipeline run. As the developer, you have the flexibility to define the state and the conditions for termination in the “Terminate” function in your code, as sketched below.
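
Here is a hedged sketch of both hooks together; the class shape, method names, and checkpoint file are assumptions for illustration, and the exact signatures in the xpresso SDK may differ.

    import json

    class TrainingComponent:
        """Illustrative component showing user-defined pause/terminate logic."""

        def __init__(self):
            self.state = {"epoch": 0, "best_loss": float("inf")}

        def pause(self):
            # Decide what to checkpoint when the user pauses the experiment;
            # the restarted run resumes from whatever is saved here.
            with open("checkpoint.json", "w") as f:
                json.dump(self.state, f)

        def terminate(self):
            # Decide what to clean up when the user terminates the experiment.
            self.state.clear()
            print("Run terminated; no checkpoint kept.")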

9. Where are all my trained models saved?

Trained models are saved in the xpresso Model Repository. To view the list of trained models in a solution, click the “Model Ops” link on the left-hand pane, and then click the “Trained Models” link.

[Image: the “Trained Models” page]

10. How does xpresso version my model?

Output models are automatically versioned by xpresso.ai in a Model Repository when they are generated as part of training pipelines. A Model Repository consists of branches corresponding to experiment runs, with each branch containing commits. Each commit contains a folder in which the models for that run are saved as files. You can also download these files directly from the Model Repository.

11. How is model lineage supported in xpresso?

xpresso records the full lineage of every model by saving the following versions as part of each experiment run:

  • Version of data used

  • Version of pipeline

  • Version of the parameters used

  • Version of the output model

12. Are there feature engineering-specific capabilities within xpresso?

Built-in feature engineering capabilities are part of our upcoming release. In the meantime, you can write custom feature engineering code and include it in the Component Library so it can be reused in other projects.

13. Is there support for a feature store?

A feature store is part of our upcoming release.

14. How do I trace my experiments back to their data, parameters, and pipeline?

Every experiment run is automatically linked to the same versions listed under question 11 above: the data, the pipeline, the parameters, and the output model. Any trained model can therefore be traced back to exactly what produced it.

15. How can I recreate an experiment I had run earlier?

To recreate an experiment, select a previously run experiment, click the “Action” button, and select “Clone”. All parameters are pre-filled, except for the run name and run description. You can also modify the parameters if needed.

16. How can I see the logs of my experiments?

You can view the logs of every experiment by going into the experiment details page and clicking on “Experiment Logs” on the top right of the page. You can then view the logs of each of your components individually.

17. How can I persist the logs using xpresso.ai utilities?

xpresso.ai provides Logging classes for Python as well as Java developers. These enable logging in a standardized format, with output to one or more of:

  • a log file

  • the console

  • an instance of Logstash, where the log messages are indexed using Elasticsearch and can be queried and viewed using Kibana

You can view the details here.
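
A minimal Python sketch is below; the class name XprLogger and its import path are assumptions based on the SDK's naming, so verify them against the documentation linked above.

    # Assumed class and import path; verify against the xpresso.ai docs.
    from xpresso.ai.core.logging.xpr_log import XprLogger

    logger = XprLogger()
    logger.info("Training started")            # standardized format
    logger.error("Could not load data file")   # routed to file/console/Logstash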

18. How does xpresso help in A/B testing?

A/B Tests enable data scientists to test multiple deployed models simultaneously to check which one works best. xpresso allows you to create a service mesh of multiple models that can run behind a single API. To run an A/B Test, visit the “Deployed Models” page, select two or more successfully deployed models, and then click the “Run A/B Test” button in the top right-hand corner. You can specify a routing strategy by entering weights for each model in the test.
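
Conceptually, the routing strategy is a weighted split of incoming requests across the selected models. The toy sketch below mimics a 70/30 split purely for illustration; in xpresso the service mesh performs this routing server-side.

    import random

    # Toy illustration of weighted routing (not xpresso internals): each
    # request is sent to a model in proportion to its configured weight.
    weights = {"model_v1": 0.7, "model_v2": 0.3}

    def route():
        return random.choices(list(weights), weights=list(weights.values()), k=1)[0]

    print(sum(route() == "model_v1" for _ in range(10_000)))  # roughly 7000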

19. How do I train my model on GPUs?

The process for training a model on GPUs is the same as for CPUs. However, you need to manually configure the pipeline to be deployed on the GPU cluster — the xpresso.ai support team can help with this.

20. Can I run multiple trainings simultaneously?

Yes, you can run multiple experiments at the same time. Based on the compute allocated, xpresso will automatically place experiments in a queue if needed and run them once the earlier experiments have completed.

21. How can I monitor my models in production?

Model monitoring is an upcoming feature that will let you view the operation and stability metrics of each model. You will also be able to generate and configure alerts on those metrics.

22. What model monitoring metrics does xpresso.ai support?

For operation metrics, you can view metrics such as response time, number of requests, and CPU, memory, and disk usage. For stability metrics, xpresso will calculate the population stability index (PSI), characteristic stability index (CSI), and divergence index (DI), among others.
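
For reference, PSI is a standard drift measure: bin a baseline (training) sample and a production sample of the same score or feature, then sum (actual% - expected%) * ln(actual% / expected%) over the bins. The sketch below is the textbook formula, not xpresso's internal implementation.

    import numpy as np

    def population_stability_index(expected, actual, bins=10):
        """Textbook PSI between a baseline sample and a production sample."""
        edges = np.histogram_bin_edges(expected, bins=bins)
        edges[0], edges[-1] = -np.inf, np.inf   # catch out-of-range values
        expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
        actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
        # Clip empty bins to avoid division by zero and log(0).
        expected_pct = np.clip(expected_pct, 1e-6, None)
        actual_pct = np.clip(actual_pct, 1e-6, None)
        return float(np.sum((actual_pct - expected_pct)
                            * np.log(actual_pct / expected_pct)))

    # A commonly cited rule of thumb: PSI < 0.1 means little shift,
    # 0.1-0.25 moderate shift, and > 0.25 significant shift.
    rng = np.random.default_rng(0)
    print(population_stability_index(rng.normal(0, 1, 5000),
                                     rng.normal(0.3, 1, 5000)))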

23. Is it possible to automate model retraining and deployment in xpresso.ai?

Model retraining is not an out-of-the-box feature provided by xpresso. However, you can create a component that monitors the model and uses xpresso API calls to retrain and redeploy it.

24. Do I get an alert if my model has drifted in production?

Yes. You can set upper and lower thresholds (in %) for your models and set up alerts that fire when any of the monitored metrics drifts beyond those thresholds.

25. Where do I find my Inference Service URL after deployment?

You can copy the Inference Service URL by clicking the “copy” icon next to your model on the “Deployed Models” page.

[Image: the “copy” icon next to a model on the “Deployed Models” page]

26. What is the UI link next to my Inference Service?

The user can create an Inference Service by selecting a specific model. xpresso.ai creates an endpoint, which loads the model and provides a REST API (“/predict”) that can be invoked to get predictions from the model. The user may use POSTMAN (or any equivalent tool) to issue requests, which involves formatting JSON request objects, etc. xpresso.ai also provides a basic, configurable UI through which the user can issue requests and obtain predictions. The link to this UI can be found under the UI column against your model.
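
For example, a request can be issued from any HTTP client; the sketch below uses Python's requests library, and the URL and payload fields are placeholders, since the exact request schema depends on how your service parses input.

    import requests

    # Placeholder: paste the Inference Service URL copied from the
    # "Deployed Models" page.
    INFERENCE_URL = "https://your-xpresso-host/your-service"

    # Placeholder payload; match this to the features your service expects.
    payload = {"age": 42, "income": 55000, "tenure_months": 18}

    response = requests.post(INFERENCE_URL + "/predict", json=payload, timeout=30)
    response.raise_for_status()
    print(response.json())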

[Image: the UI column on the “Deployed Models” page]