Machine Learning (Kubeflow)

Solution Name: sample_project_ml

This solution demonstrates Machine Learning pipelines, Inference Services and A/B Testing.

The models built in this solution are trained to predict the future sales of a store, using sales data from a previous time period for training and validation.

Two types of models are built, one using XGBoost and one using Neural Networks. Once the models have been trained, an Inference Service is deployed for each model and is used to obtain predictions from it. The two Inference Services are then combined to create an A/B Test.
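
For illustration only (this is not the solution's actual xgboost_train code), a minimal sketch of the kind of XGBoost model the pipeline produces is shown below. The file name and hyperparameters are assumptions; the feature column names follow the sample request payload shown later on this page.

# Illustrative sketch only - the real training logic lives in the xgboost_train component.
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Assumed file name for the prepared training data.
data = pd.read_csv("prepared_train.csv")

# Feature names taken from the sample inference payload shown later on this page.
features = ["Store", "DayOfWeek", "Promo", "StateHoliday", "SchoolHoliday",
            "StoreType", "Assortment", "CompetitionDistance", "Promo2",
            "Day", "Month", "Year", "isCompetition", "NewAssortment", "NewStoreType"]

X_train, X_val, y_train, y_val = train_test_split(
    data[features], data["Sales"], test_size=0.2, random_state=42)

# Hyperparameters are placeholders, not the solution's actual settings.
model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

print("Validation MAE:", mean_absolute_error(y_val, model.predict(X_val)))
model.save_model("xgboost_sales_model.json")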

The solution has the following components:

  • data_fetch - a component to fetch data from the data repository for the solution using the Data Versioning component from the xpresso.ai Component Library

  • xgboost_data_prep - a component of type “pipeline_job” to prepare data for training using the XGBoost library

  • xgboost_train - a component of type “pipeline_job” to train an XGBoost model using the prepared data

  • xgboost_training_pipeline - a pipeline that combines the data_fetch, xgboost_data_prep and xgboost_train components

  • xgboost_infer - a component of type “inference_service” to provide a REST API to perform predictions on input requests using the trained XGBoost model

  • dnn_data_prep - a component of type “pipeline_job” to prepare data for training using a Deep Neural Network (using the keras and Tensorflow libraries)

  • dnn_train - a component of type “pipeline_job” to train a Deep Neural Network model using the prepared data (see the illustrative sketch after this list)

  • dnn_training_pipeline - a pipeline that combines the data_fetch, dnn_data_prep and dnn_train components

  • dnn_infer - a component of type “inference_service” to provide a REST API to perform predictions on input requests using the trained DNN
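
For illustration only (the actual dnn_train code is in the solution's repository), a minimal Keras/TensorFlow regression model of the kind described above might be defined as follows; the file name, layer sizes and training settings are assumptions.

# Illustrative sketch only - the real model definition lives in the dnn_train component.
import pandas as pd
from tensorflow import keras

data = pd.read_csv("prepared_train.csv")          # assumed file name
X = data.drop(columns=["Sales"]).to_numpy()
y = data["Sales"].to_numpy()

# A small fully connected regression network; sizes are placeholders.
model = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),                        # single output: predicted sales
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, validation_split=0.2, epochs=20, batch_size=256)
model.save("dnn_sales_model.h5")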

How to use this solution

You will work on a clone of this solution. The steps to be followed are:

  1. Clone the solution. Cloning a solution does not clone its code, so you need to do this manually (Steps 2-4 below)

  2. Clone the code repository of the sample solution

    1. Navigate to the code repository of the sample solution

    2. Click “Clone” and copy the git clone command

    3. Execute the command on your machine (ensure you have Git installed)

  3. Clone the code repository of the cloned solution

    1. Navigate to the code repository of the cloned solution

    2. Click “Clone” and copy the git clone command

    3. Execute the command on your machine (ensure you have Git installed)

  4. Copy code from the sample solution into the cloned solution

  5. Commit and push the code back into the code repository of the cloned solution

    1. Execute git add -A to add the changed code to the local repository

    2. Execute git commit -m "Cloned code" to commit the code to the local repository

    3. Execute git push to push the code into Bitbucket

  6. Build the cloned solution components

    1. Select the “master” branch for each component during the build

  7. Before deploying the components and pipelines, you need to upload the parameters file to the shared drive of the solution, and the data files into the data repository

    1. Download /pipelines/dnn-training-pipeline/params.json from the NFS Drive of the original solution and upload it to the NFS Drive of the cloned solution, to the /pipelines/dnn-training-pipeline and the /pipelines/xgboost-training-pipeline folders, as the same parameters file is used by both pipelines.

    2. Download the data files (store.csv, train.csv, test.csv) from the root folder of the NFS Drive of the original solution and push the files into the data repository of the cloned solution using the xpresso.ai Data Versioning library. These files represent store information, training data and test data respectively.

      1. Navigate to the data repository for the solution

      2. Create a new branch in the data repository, called “raw_data”

      3. Upload the three data files into the branch
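
Before pushing, you may want to sanity-check the three files locally. The sketch below is only an illustration and assumes that train.csv and store.csv share a Store column (as the sample request payload later on this page suggests); the exact column names may differ.

# Quick local inspection of the data files before pushing them to the data repository.
import pandas as pd

store = pd.read_csv("store.csv")   # per-store attributes
train = pd.read_csv("train.csv")   # historical sales used for training/validation
test = pd.read_csv("test.csv")     # data used for testing/inference

for name, df in [("store", store), ("train", train), ("test", test)]:
    print(name, df.shape, list(df.columns))

# Assumes both files carry a Store column; this mirrors the kind of join the
# data_prep components perform before feature engineering.
merged = train.merge(store, on="Store", how="left")
print(merged.head())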

  8. Deploy the pipelines of the cloned solution. For both pipelines, specify the following deployment parameters for the components

    1. data_fetch (in each pipeline)

      1. Advanced Settings (Custom Docker Image) - dockerregistrysb.xpresso.ai/library/data_versioning:2.2

      2. Advanced Settings (Args) - as below

      Dynamic?   Name
      No         -component-name
      No         data_fetch

    2. Other components

      1. Build Version = latest build version

Note that any other parameters required by any component of the pipeline will be taken from the parameters file specified when running an experiment on the deployed pipeline

9. The pipelines have now been deployed, but have not yet run. To run them, start an experiment using the deployed version of each pipeline. Specify the following parameters during the run:

  • Name of the pipeline - <name of the pipeline>

  • Version - latest deployed version

  • Run Name - any run name of your choice (do not use a name which you have already used)

  • Run Description - any description of your choice

  • parameters_filename - ml_params.json (this file contains values for parameters required by components of the pipeline)

10. To ensure that each pipeline has run properly, view the run details. Also, check that each pipeline has created a model in the model repository

11. Now, you’re ready to test the inference service for each model. Each inference service accepts a set of data points as input and outputs the sales predicted by its model. Once an inference service has been deployed for each model, the two services can be combined to create an A/B Test, in which each request is randomly routed to one of the two inference services and the result returned.

12. We’ll combine the deployment of the inference services and A/B Testing in a single step as follows:

  • Open the Inference Services page

  • Select both the inference services. For each inference service,

    • Select the latest successful run for the appropriate pipeline

    • Select the latest build version of the inference service

    • Set the port name to “default” and value to 5000

  • Specify any mesh name of your choice

  • Specify the weights as “50” each in the routing strategy. This indicates that 50% of the requests will go to the first model, and 50% to the second, on average (the sketch after this list illustrates the idea)

  • Deploy the inference services

  • Note down the URL obtained as a result. To check the deployment, visit the Kubernetes dashboard for the solution
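
To make the routing strategy concrete, the toy sketch below shows how a 50/50 weighted split behaves. It is only an illustration of the idea, not how the platform's service mesh actually implements the routing.

# Toy illustration of weighted 50/50 routing between two inference services.
import random
from collections import Counter

services = ["xgboost_infer", "dnn_infer"]
weights = [50, 50]          # the routing weights specified in the deployment

# Route 1,000 simulated requests and count where they went.
routed = Counter(random.choices(services, weights=weights, k=1000))
print(routed)               # roughly 500 requests per service, on average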

13. After the services have been deployed successfully, open a tool such as Postman and follow the test instructions. You can use the sample data below for the request payload:

{
  "input": {
    "Store": 238.0,
    "DayOfWeek": 5.0,
    "Promo": 0.0,
    "StateHoliday": 0.0,
    "SchoolHoliday": 0.0,
    "StoreType": 3.0,
    "Assortment": 2.0,
    "CompetitionDistance": 610.0,
    "Promo2": 0.0,
    "Day": 1.0,
    "Month": 7.0,
    "Year": 1.0,
    "isCompetition": 0.0,
    "NewAssortment": 3,
    "NewStoreType": 1
  }
}

The response should indicate the predicted sales (in dollars), as well as the name of the model which produced the response. As mentioned above, roughly 50% of the requests should be executed by each model.

Sample Response

{
  "message": "success",
  "results": [
    4350.8134765625
  ],
  "run_name": "run_15"
}
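
If you prefer to script the test instead of using Postman, a minimal sketch along these lines can send the same payload. The URL below is a placeholder for the mesh URL noted down in step 12; the exact path, if any, should follow the test instructions for the solution.

# Send a prediction request to the deployed A/B test endpoint.
import requests

# Placeholder: replace with the mesh URL noted down after deploying the services.
URL = "http://<mesh-url-from-step-12>"

payload = {
    "input": {
        "Store": 238.0, "DayOfWeek": 5.0, "Promo": 0.0, "StateHoliday": 0.0,
        "SchoolHoliday": 0.0, "StoreType": 3.0, "Assortment": 2.0,
        "CompetitionDistance": 610.0, "Promo2": 0.0, "Day": 1.0, "Month": 7.0,
        "Year": 1.0, "isCompetition": 0.0, "NewAssortment": 3, "NewStoreType": 1,
    }
}

response = requests.post(URL, json=payload)
print(response.status_code)
print(response.json())   # e.g. {"message": "success", "results": [...], "run_name": "..."}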