Machine Learning (Kubeflow)
Solution Name: sample_project_ml
This solution demonstrates Machine Learning pipelines, Inference Services and A/B Testing.
The models built in this solution are trained to predict the future sales of a store, using sales data from a previous time period for training and validation.
Two types of models are built using XGBoost and Neural Networks. Once the models have been trained, an Inference Service is deployed for each model, which is used to obtain predictions from the model. The Inference Services are combined to create an A/B Test.
The solution has the following components:
data_fetch - a component to fetch data from the data repository for the solution using the Data Versioning component from the xpresso.ai Component Library
xgboost_data_prep - a component of type “pipeline_job” to prepare data for training using the XGBoost library
xgboost_train - a component of type “pipeline_job” to train an XGBoost model using the prepared data
xgboost_training_pipeline - a pipeline that combines the data_fetch, xgboost_data_prep and xgboost_train components
xgboost_infer - a component of type “inference_service” to provide a REST API to perform predictions on input requests using the trained XGBoost model
dnn_data_prep - a component of type “pipeline_job” to prepare data for training using a Deep Neural Network (using the keras and Tensorflow libraries)
dnn_train - a component of type “pipeline_job” to train a Deep Neural Network model using the prepared data
dnn_training_pipeline - a pipeline that combines the data_fetch, dnn_data_prep and dnn_train components
dnn_infer - a component of type “inference_service” to provide a REST API to perform predictions on input requests using the trained DNN
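The component list above can be summarized as a plain-data sketch — the names come from the solution itself, while the dictionary layout is just an illustration:

```python
# Sketch of how the solution's components compose into its two training
# pipelines and which pipeline feeds each inference service.
PIPELINES = {
    "xgboost_training_pipeline": ["data_fetch", "xgboost_data_prep", "xgboost_train"],
    "dnn_training_pipeline": ["data_fetch", "dnn_data_prep", "dnn_train"],
}
INFERENCE_SERVICES = {
    "xgboost_infer": "xgboost_training_pipeline",
    "dnn_infer": "dnn_training_pipeline",
}

# Both pipelines reuse the same data_fetch component.
shared = set(PIPELINES["xgboost_training_pipeline"]) & set(PIPELINES["dnn_training_pipeline"])
```

Note that `data_fetch` is the only shared component; each pipeline has its own data-preparation and training steps.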
How to use this solution
You will work on a clone of this solution. The steps to be followed are:
Clone the solution. Cloning a solution does not clone its code, so you need to do this manually (Steps 2-4 below)
Clone the code repository of the sample solution
Click “Clone” and copy the git clone command
Execute the command on your machine (ensure you have Git installed)
Clone the code repository of the cloned solution
Click “Clone” and copy the git clone command
Execute the command on your machine (ensure you have Git installed)
Copy code from the sample solution into the cloned solution
Commit and push the code back into the code repository of the cloned solution
Execute git add -A to add the changed code to the local repository
Execute git commit -m "Cloned code" to commit the code to the local repository
Execute git push to push the code into Bitbucket
Build the cloned solution components
Select the “master” branch for each component during the build
Before deploying the components and pipelines, you need to upload the parameters file to the shared drive of the solution, and the data file into the data repository
Download /pipelines/dnn-training-pipeline/params.json from the NFS Drive of the original solution and upload it to the NFS Drive of the cloned solution, to the /pipelines/dnn-training-pipeline and the /pipelines/xgboost-training-pipeline folders, as the same parameters file is used by both pipelines.
Download the data files (store.csv, train.csv, test.csv) from the root folder of the NFS Drive of the original solution and push the files into the data repository of the cloned solution using the xpresso.ai Data Versioning library. These files represent store information, training data and test data respectively.
Create a new branch in the data repository, called “raw_data”
Upload the three data files into the branch
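Before pushing the files into the raw_data branch, a quick sanity check that all three are present locally can save a failed upload. The filenames are from this document; the download directory is an assumption — point it at wherever you saved the files from the NFS Drive:

```python
from pathlib import Path

# Directory where the files from the NFS Drive were downloaded (assumption).
DATA_DIR = Path(".")
# The three data files named in this solution.
EXPECTED_FILES = ["store.csv", "train.csv", "test.csv"]

missing = [name for name in EXPECTED_FILES if not (DATA_DIR / name).exists()]
if missing:
    print(f"Missing before upload to the raw_data branch: {missing}")
else:
    print("All three data files present; ready to push to the raw_data branch.")
```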
Deploy the pipelines of the cloned solution. For both pipelines, specify the following deployment parameters for the components:
data_fetch (in each pipeline)
Advanced Settings (Custom Docker Image) - dockerregistrysb.xpresso.ai/library/data_versioning:2.2
Advanced Settings (Args) - as below
| Dynamic? | Name |
|---|---|
| No | -component-name |
| No | data_fetch |
Other components
Build Version = latest build version
Note that any other parameters required by any component of the pipeline will be taken from the parameters file specified when running an experiment on the deployed pipeline
9. The pipelines have now been deployed, but have not yet run. To run a pipeline, start an experiment using the deployed version of each pipeline. Specify the following parameters during the run:
Name of the pipeline - <name of the pipeline>
Version - latest deployed version
Run Name - any run name of your choice (do not use a name which you have already used)
Run Description - any description of your choice
parameters_filename - ml_params.json (this file contains values for parameters required by components of the pipeline)
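Collected together, the run parameters look like this — `ml_params.json` and the pipeline name come from this document, while the run name and description are arbitrary placeholder choices:

```python
# Example experiment-run parameters for one of the two pipelines.
# run_name must not repeat a name used in a previous run.
run_params = {
    "pipeline_name": "dnn_training_pipeline",
    "version": "latest deployed version",
    "run_name": "dnn-run-001",                 # placeholder; any unused name
    "run_description": "First DNN training run",  # placeholder
    "parameters_filename": "ml_params.json",
}
```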
10. To ensure the pipeline has run properly, view the run details. Also, ensure that each pipeline has created a model in the model repository
11. Now, you’re ready to test the inference service for each model. The inference service will accept a set of data points as input, and output the sales predicted by the model. Once an inference service has been deployed for each model, they can be combined to create an A/B Test, in which requests are randomly sent to the two inference services, and results obtained.
12. We’ll combine the deployment of the inference services and A/B Testing in a single step as follows:
Open the Inference Services page
Select both the inference services. For each inference service,
Select the latest successful run for the appropriate pipeline
Select the latest build version of the inference service
Set the port name to “default” and value to 5000
Specify any mesh name of your choice
Specify the weights as “50” each in the routing strategy. This indicates that 50% of the requests will go to the first model, and 50% to the second (on average)
Deploy the inference services
Note down the URL obtained as a result. To check the deployment, visit the Kubernetes dashboard for the solution
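The 50/50 routing strategy described above amounts to a weighted random choice between the two inference services. A minimal simulation (service names from this solution; the routing code itself is only an illustration of the behavior, not the mesh implementation):

```python
import random

# Weights as entered in the routing strategy: 50% of requests to each service.
WEIGHTS = {"xgboost_infer": 50, "dnn_infer": 50}

def route_request(weights=WEIGHTS):
    """Pick an inference service at random, proportionally to its weight."""
    services = list(weights)
    return random.choices(services, weights=[weights[s] for s in services])[0]

# Simulate 10,000 requests; each service should handle roughly half on average.
counts = {name: 0 for name in WEIGHTS}
for _ in range(10_000):
    counts[route_request()] += 1
```

With equal weights the split is only 50/50 on average; individual short sequences of requests may lean toward one service.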
13. After the services have been deployed successfully, open a tool such as Postman and follow the test instructions. You can use the sample data below for the request payload:
    {
      "input": {
        "Store": 238.0,
        "DayOfWeek": 5.0,
        "Promo": 0.0,
        "StateHoliday": 0.0,
        "SchoolHoliday": 0.0,
        "StoreType": 3.0,
        "Assortment": 2.0,
        "CompetitionDistance": 610.0,
        "Promo2": 0.0,
        "Day": 1.0,
        "Month": 7.0,
        "Year": 1.0,
        "isCompetition": 0.0,
        "NewAssortment": 3,
        "NewStoreType": 1
      }
    }
The response should indicate the predicted sales (in dollars), as well as the name of the model which produced the response. As mentioned above, roughly 50% of the requests should be executed by each model.
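The same request can be sent from a short stdlib-only Python script instead of Postman. The payload is the one given above; the mesh URL is the one noted down in step 12, shown here only as a placeholder:

```python
import json
import urllib.request

# Request payload from the documentation above.
payload = {
    "input": {
        "Store": 238.0, "DayOfWeek": 5.0, "Promo": 0.0, "StateHoliday": 0.0,
        "SchoolHoliday": 0.0, "StoreType": 3.0, "Assortment": 2.0,
        "CompetitionDistance": 610.0, "Promo2": 0.0, "Day": 1.0,
        "Month": 7.0, "Year": 1.0, "isCompetition": 0.0,
        "NewAssortment": 3, "NewStoreType": 1,
    }
}

def predict(url: str) -> dict:
    """POST the payload to the A/B-test mesh URL and return the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (replace the placeholder with the URL noted down in step 12):
# result = predict("http://<mesh-url>")
```

The returned JSON should contain the predicted sales and the name of the model that served the request.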
Sample Response