Use Cases


What kind of solutions can be developed using xpresso.ai?

Data Engineering Use Cases

Use Case DE1: ETL pipeline to populate a Data Warehouse in a “normal” (non-Big-Data) environment

Requirements: Pull data from a remote location. Perform transformations on the data before inserting it into a data warehouse.

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “job” which uses the xpresso.ai Data Connectivity libraries to fetch data from the remote data source, using Python

  • Create more components of type “job” to address other transformations of your data, using Python or Java

  • Test your solution on your local computer, using your favourite IDE

  • Build and deploy your solution onto Kubernetes

Tip

To take advantage of the xpresso.ai features for pausing, restarting, terminating, and monitoring pipelines, use components of type “pipeline_job” and instrument your code according to the Developer Guides. Remember to enable “local execution mode” when testing on your local machine.
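Conceptually, the fetch-transform-load steps above look like the following plain-Python sketch. The actual xpresso.ai Data Connectivity API is not shown; an in-memory SQLite database stands in for the warehouse, and all names and data here are illustrative.

```python
import csv
import io
import sqlite3

# Stand-in for the remote source: in a real component, the xpresso.ai
# Data Connectivity libraries would fetch this data instead.
RAW_CSV = "id,amount\n1,10.5\n2,x\n3,7.25\n"

def extract(raw):
    """Parse the raw CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    """Drop rows whose amount is not numeric and convert types."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["id"]), float(row["amount"])))
        except ValueError:
            continue  # skip malformed rows
    return clean

def load(rows, conn):
    """Insert the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
```

Each of the `extract`, `transform`, and `load` steps would map naturally onto a separate “job” component in the solution.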


Use Case DE2: ETL pipeline for Big Data

Requirements: Pull data from an HDFS source. Perform transformations on the data before inserting it back into a Hive database.

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “job” which uses the xpresso.ai Data Connectivity libraries to fetch data from the HDFS source, using PySpark

  • Create more components of type “job” to address other transformations of your data, using PySpark

  • Test your solution on your local computer, using a local Spark cluster

  • Build and deploy your solution onto the xpresso.ai Spark cluster

Tip

To take advantage of the xpresso.ai features for pausing, restarting, terminating, and monitoring pipelines, use components of type “pipeline_job” and instrument your code according to the Developer Guides. Remember to enable “local execution mode” when testing on your local Spark cluster.


Use Case DE3: High-availability Business Intelligence (BI) web application

Requirements: Create a web application to display information sourced from a Data Warehouse

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “service” to fetch data from the Data Warehouse, using Python or Java

  • Test your solution on your local computer, using your favourite IDE

  • Build and deploy your solution onto Kubernetes
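A minimal sketch of what the “service” component's core logic might look like, using only the standard library: an in-memory SQLite table stands in for the Data Warehouse, and `fetch_report` and `ReportHandler` are illustrative names, not part of the xpresso.ai skeleton.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler

# Stand-in warehouse; a real component would use the xpresso.ai
# Data Connectivity libraries to reach the actual Data Warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (region TEXT, total REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.0)])

def fetch_report():
    """Query the warehouse and shape the result for the web tier."""
    rows = conn.execute("SELECT region, total FROM revenue ORDER BY region")
    return [{"region": r, "total": t} for r, t in rows]

class ReportHandler(BaseHTTPRequestHandler):
    """Minimal read-only endpoint returning the report as JSON."""
    def do_GET(self):
        body = json.dumps(fetch_report()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
```

Deploying the service on Kubernetes (rather than running it standalone) is what provides the high availability.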


Machine Learning Use Cases

Use Case ML1: Exploratory Data Analysis (EDA) of “normal” (i.e., not very large) data from different data sources

Requirements: Pull data from a remote data source, explore it, and perform some basic data cleansing

Solution using xpresso.ai

  • Open a Jupyter Notebook on the Notebook Server of the xpresso.ai instance

  • Use the Data Connectivity libraries to fetch data

  • Use the Data Exploration and Visualization Libraries to explore and visualize the data: understand the data and perform univariate and multivariate analysis

  • Use the Data Cleansing library to clean the data

Tip

If you are planning to train an AI/ML model using the data, you may want to perform Target-Variable Analysis as well
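Conceptually, the explore-and-cleanse steps amount to something like the stdlib-only sketch below. The actual xpresso.ai Data Exploration and Data Cleansing library calls are not shown, and the dataset and cleansing strategy are invented for illustration.

```python
import statistics

# Toy dataset with a missing value; a real notebook would fetch this
# via the xpresso.ai Data Connectivity libraries.
ages = [23, 31, None, 45, 31, 28]

# Basic cleansing: drop missing values (one of many possible strategies).
clean = [a for a in ages if a is not None]

# Univariate analysis: central tendency and spread.
summary = {
    "n": len(clean),
    "mean": statistics.mean(clean),
    "median": statistics.median(clean),
    "stdev": round(statistics.stdev(clean), 2),
}
```

Multivariate analysis extends the same idea to relationships between columns (e.g., correlations), which the Visualization Libraries can then plot.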


Use Case ML2: Exploratory Data Analysis (EDA) of “Big Data” from HDFS

Requirements: Pull large volume of data from an HDFS data source, explore it, and perform some basic data cleansing

Solution using xpresso.ai

  • Open a Jupyter Notebook on the Notebook Server of the xpresso.ai instance

  • Use the Big Data versions of the Data Connectivity libraries to fetch data

  • Use the Data Exploration and Visualization Libraries to explore and visualize the data: understand the data and perform univariate and multivariate analysis

  • Use the Data Cleansing library to clean the data

Tip

If you are planning to train an AI/ML model using the data, you may want to perform Target-Variable Analysis as well


Use Case ML3: DL / ML pipeline using sklearn / XGBoost / TensorFlow / Keras

Requirements: Build a DL/ML model from the training data provided, employing one of the four DL/ML libraries mentioned; run training with different training algorithms / hyper-parameters; maintain model versions; compare model versions to determine the best one

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “pipeline_job” which uses xpresso.ai Data Connectivity libraries to fetch data from the data source

  • Create more components of type “pipeline_job” to perform any transformations required (e.g., data cleaning, feature extraction, etc.)

  • Create a final component of type “pipeline_job” to train the model on the data provided

  • Create a pipeline which contains the above components

  • Test your solution on your local computer, using your favourite IDE

  • Build and deploy your solution onto Kubeflow

  • Run experiments on the pipeline using the xpresso.ai Controller. Each run can have different code (e.g., feature extraction) and/or different hyper-parameters and/or different training data. Each successful run automatically creates a model version

  • Get details of each experiment (time taken, metrics, model version, etc.) using the xpresso.ai Controller

  • Compare experiments using the xpresso.ai Controller - determine the commit ID of the best model
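The final comparison step amounts to ranking runs by a chosen metric. The sketch below illustrates the idea with invented run records and field names; in practice the xpresso.ai Controller performs this comparison for you.

```python
# Illustrative records for three experiment runs; the real values come
# from the xpresso.ai Controller, and these field names are assumptions.
runs = [
    {"model_version": "v1", "commit_id": "a1c3", "f1": 0.81},
    {"model_version": "v2", "commit_id": "b2d4", "f1": 0.88},
    {"model_version": "v3", "commit_id": "c3e5", "f1": 0.84},
]

def best_run(runs, metric="f1"):
    """Pick the run with the highest value of the chosen metric."""
    return max(runs, key=lambda r: r[metric])

best = best_run(runs)
```

The commit ID of the winning run is what identifies the model version to deploy.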


Use Case ML4: DL / ML pipeline using SparkML

Requirements: Build a DL/ML model from the training data provided, employing the SparkML library; run training with different training algorithms / hyper-parameters; maintain model versions; compare model versions to determine the best one

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “pipeline_job” which uses xpresso.ai Data Connectivity libraries to fetch data from the data source

  • Create more components of type “pipeline_job” to perform any transformations required (e.g., data cleaning, feature extraction, etc.)

  • Create a final component of type “pipeline_job” to train the model on the data provided

  • Create a pipeline which contains the above components

  • Test your solution on your local computer, using a local Spark cluster

  • Build and deploy your solution onto the xpresso.ai Spark cluster

  • Run experiments on the pipeline using the xpresso.ai Controller. Each run can have different code (e.g., feature extraction) and/or different hyper-parameters and/or different training data. Each successful run automatically creates a model version

  • Get details of each experiment (time taken, metrics, model version, etc.) using the xpresso.ai Controller

  • Compare experiments using the xpresso.ai Controller - determine the commit ID of the best model


Use Case ML5: Compare a “baseline” model with a “challenger” model to determine the best one

Requirements: Build two different training models and compare them

Solution using xpresso.ai

  • Create a new solution

  • Create a pipeline named “baseline” for the baseline model (refer to Use Case ML3)

  • Create a pipeline named “challenger” for the challenger model (refer to Use Case ML3)

  • Run experiments on both pipelines using the xpresso.ai Controller. Each run can have different code (e.g., feature extraction) and/or different hyper-parameters and/or different training data. Each successful run automatically creates a model version

  • Get details of each experiment (time taken, metrics, model version, etc.) using the xpresso.ai Controller

  • Compare experiments using the xpresso.ai Controller - determine the commit ID of the best model


Use Case ML6: High-availability Inference Service to load trained model and make predictions

Requirements: Create an Inference Service to provide predictions from a pre-trained model in response to input data

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “inference_service”

  • The solution will have skeleton code for implementing the inference service - populate the “load_model”, “transform_request”, “predict” and “transform_response” methods in Python, as required

  • Test the service on your local machine, using your favourite IDE

  • Build and deploy the solution onto the xpresso.ai Kubernetes cluster
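A rough sketch of how the four skeleton methods might fit together, assuming a trivial pickled linear model; the real xpresso.ai base class, method signatures, and model format may differ.

```python
import pickle
import tempfile

class InferenceService:
    """Sketch of the four methods named by the skeleton; the actual
    xpresso.ai base class and signatures are assumptions here."""

    def load_model(self, path):
        # Deserialize the trained model; here, a list of linear weights.
        with open(path, "rb") as fh:
            self.weights = pickle.load(fh)

    def transform_request(self, request):
        # Convert the incoming payload into numeric features.
        return [float(x) for x in request["features"]]

    def predict(self, features):
        # A linear model stands in for the real trained model.
        return sum(w * x for w, x in zip(self.weights, features))

    def transform_response(self, prediction):
        # Shape the raw score into the response payload.
        return {"score": prediction}

# Write a toy "trained model" to disk, then serve one request end to end.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as fh:
    pickle.dump([0.5, 2.0], fh)
    model_path = fh.name

svc = InferenceService()
svc.load_model(model_path)
features = svc.transform_request({"features": ["1.0", "3"]})
response = svc.transform_response(svc.predict(features))
```

The four-method split keeps model loading, input parsing, inference, and output shaping independently testable.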


Use Case ML7: A/B Testing on two (or more) trained models via an Inference Service

Requirements: Create an Inference Service which switches between different versions of a trained model (or two or more different trained models)

Solution using xpresso.ai

  • Create a new solution

  • Create two (or more) components of type “inference_service”, one for each model to be tried

  • The solution will have skeleton code for implementing the inference service - populate the “transform_request”, “predict” and “transform_response” methods in Python, as required

  • Test the services on your local machine, using your favourite IDE

  • Create a service mesh which switches requests among the inference services specified

  • Build and deploy the solution onto the xpresso.ai Kubernetes cluster
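The traffic-splitting behaviour of the service mesh can be illustrated in a few lines. The split ratio, function names, and routing logic below are assumptions for illustration only; the real mesh routes requests at the network layer between the deployed inference services.

```python
import random

# Two stand-in model versions; real deployments would be separate
# "inference_service" components behind the mesh.
def model_a(x):
    return x * 2      # baseline model

def model_b(x):
    return x * 2 + 1  # challenger model

def route(request, seed=None):
    """Send ~90% of traffic to the baseline and ~10% to the challenger,
    mimicking the weighted split a service mesh would enforce."""
    rng = random.Random(seed)
    model = model_a if rng.random() < 0.9 else model_b
    return model(request)
```

Comparing the live metrics of the two models under this split is what makes the A/B test.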


Use Case ML8: Version and deploy a model trained outside xpresso.ai via an Inference Service

Requirements: One or more trained models already exist in an enterprise. These need to be stored in the xpresso.ai Model Versioning System, and deployed via the xpresso.ai Inference Service.

Solution using xpresso.ai

  • Create a new solution

  • Create a single component of type “pipeline_job” for each model to be versioned (refer to Tutorial 3). In this component, read the model file(s) and store them in the “output” folder attached to the component.

  • Create a pipeline containing the component created above

  • Build and deploy the solution onto the xpresso.ai Kubernetes cluster

  • Run an experiment on the pipeline - xpresso.ai will automatically version the model file(s) present in the output folder

  • The model is now available for use in xpresso.ai Inference engines, for A/B testing, etc.
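The versioning component boils down to copying the existing model file(s) into the component's output folder, which xpresso.ai then versions on a successful run. A sketch under that assumption, with invented names; the actual output-folder path is supplied by the xpresso.ai component setup, not hard-coded as it is here.

```python
import os
import pickle
import tempfile

# Stand-in for a model trained outside xpresso.ai.
pretrained = {"weights": [0.1, 0.2, 0.3]}

def versioning_job(model, output_dir):
    """Write the existing model into the component's output folder;
    the folder location here is a placeholder for the real mount point."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return path

workdir = tempfile.mkdtemp()
saved = versioning_job(pretrained, os.path.join(workdir, "output"))

# Round-trip check: the stored model deserializes intact.
with open(saved, "rb") as fh:
    restored = pickle.load(fh)
```

Because the files land in the output folder, the experiment run versions them without any training step.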


What do you want to do next?