Use Cases


What kind of solutions can be developed using xpresso.ai?

Data Engineering Use Cases

Use Case DE1: ETL pipeline to populate a Data Warehouse in a “normal” (non-Big-Data) environment

Requirements: Pull data from a remote location. Perform transformations on the data before inserting it into a data warehouse.

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “job” which uses the xpresso.ai Data Connectivity libraries to fetch data from the remote data source, using Python

  • Create more components of type “job” to address other transformations of your data, using Python or Java

  • Test your solution on your local computer, using your favourite IDE

  • Build and deploy your solution onto Kubernetes

Tip

To take advantage of the xpresso.ai features for pausing, restarting, terminating, and monitoring pipelines, use components of type “pipeline_job” and instrument your code according to the Developer Guides. Remember to enable “local execution mode” when testing on your local machine.
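Conceptually, the fetch-transform-load steps above look like the following plain-Python sketch. The actual xpresso.ai Data Connectivity API is not shown; an in-memory SQLite database stands in for the warehouse, and all names and data here are illustrative.

```python
import csv
import io
import sqlite3

# Stand-in for the remote source: in a real component, the xpresso.ai
# Data Connectivity libraries would fetch this data instead.
RAW_CSV = "id,amount\n1,10.5\n2,x\n3,7.25\n"

def extract(raw):
    """Parse the raw CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    """Drop rows whose amount is not numeric and convert types."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["id"]), float(row["amount"])))
        except ValueError:
            continue  # skip malformed rows
    return clean

def load(rows, conn):
    """Insert the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
```

Each of the `extract`, `transform`, and `load` steps would map naturally onto a separate “job” component in the solution.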


Use Case DE2: ETL pipeline for Big Data

Requirements: Pull data from an HDFS source. Perform transformations on the data before inserting it back into a Hive database.

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “job” which uses the xpresso.ai Data Connectivity libraries to fetch data from the HDFS source, using PySpark

  • Create more components of type “job” to address other transformations of your data, using PySpark

  • Test your solution on your local computer, using a local Spark cluster

  • Build and deploy your solution onto the xpresso.ai Spark cluster

Tip

To take advantage of the xpresso.ai features for pausing, restarting, terminating, and monitoring pipelines, use components of type “pipeline_job” and instrument your code according to the Developer Guides. Remember to enable “local execution mode” when testing on your local Spark cluster.


Use Case DE3: High-availability Business Intelligence (BI) web application

Requirements: Create a web application to display information sourced from a Data Warehouse

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “service” to fetch data from the Data Warehouse, using Python or Java

  • Test your solution on your local computer, using your favourite IDE

  • Build and deploy your solution onto Kubernetes
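A minimal sketch of what the “service” component's core logic might look like, using only the standard library: an in-memory SQLite table stands in for the Data Warehouse, and `fetch_report` and `ReportHandler` are illustrative names, not part of the xpresso.ai skeleton.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler

# Stand-in warehouse; a real component would use the xpresso.ai
# Data Connectivity libraries to reach the actual Data Warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (region TEXT, total REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.0)])

def fetch_report():
    """Query the warehouse and shape the result for the web tier."""
    rows = conn.execute("SELECT region, total FROM revenue ORDER BY region")
    return [{"region": r, "total": t} for r, t in rows]

class ReportHandler(BaseHTTPRequestHandler):
    """Minimal read-only endpoint returning the report as JSON."""
    def do_GET(self):
        body = json.dumps(fetch_report()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
```

Deploying the service on Kubernetes (rather than running it standalone) is what provides the high availability.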


Machine Learning Use Cases

Use Case ML1: Exploratory Data Analysis (EDA) of “normal” (i.e., not very large) data from different data sources

Requirements: Pull data from a remote data source, explore it, and perform some basic data cleansing

Solution using xpresso.ai

  • Open a Jupyter Notebook on the Notebook Server of the xpresso.ai instance

  • Use the Data Connectivity libraries to fetch data

  • Use the Data Exploration and Visualization Libraries to explore and visualize the data: understand the data and perform univariate and multivariate analysis

  • Use the Data Cleansing library to clean the data

Tip

If you are planning to train an AI/ML model using the data, you may want to perform Target-Variable Analysis as well
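Conceptually, the explore-and-cleanse steps amount to something like the stdlib-only sketch below. The actual xpresso.ai Data Exploration and Data Cleansing library calls are not shown, and the dataset and cleansing strategy are invented for illustration.

```python
import statistics

# Toy dataset with a missing value; a real notebook would fetch this
# via the xpresso.ai Data Connectivity libraries.
ages = [23, 31, None, 45, 31, 28]

# Basic cleansing: drop missing values (one of many possible strategies).
clean = [a for a in ages if a is not None]

# Univariate analysis: central tendency and spread.
summary = {
    "n": len(clean),
    "mean": statistics.mean(clean),
    "median": statistics.median(clean),
    "stdev": round(statistics.stdev(clean), 2),
}
```

Multivariate analysis extends the same idea to relationships between columns (e.g., correlations), which the Visualization Libraries can then plot.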


Use Case ML2: Exploratory Data Analysis (EDA) of “Big Data” from HDFS

Requirements: Pull large volume of data from an HDFS data source, explore it, and perform some basic data cleansing

Solution using xpresso.ai

  • Open a Jupyter Notebook on the Notebook Server of the xpresso.ai instance

  • Use the Big Data versions of the Data Connectivity libraries to fetch data

  • Use the Data Exploration and Visualization Libraries to explore and visualize the data: understand the data and perform univariate and multivariate analysis

  • Use the Data Cleansing library to clean the data

Tip

If you are planning to train an AI/ML model using the data, you may want to perform Target-Variable Analysis as well


Use Case ML3: DL / ML pipeline using sklearn / XGBoost / TensorFlow / Keras

Requirements: Build a DL/ML model from the training data provided, employing one of the four DL/ML libraries mentioned; run training with different training algorithms / hyper-parameters; maintain model versions; compare model versions to determine the best one

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “pipeline_job” which uses xpresso.ai Data Connectivity libraries to fetch data from the data source

  • Create more components of type “pipeline_job” to perform any transformations required (e.g., data cleaning, feature extraction, etc.)

  • Create a final component of type “pipeline_job” to train the model on the data provided

  • Create a pipeline which contains the above components

  • Test your solution on your local computer, using your favourite IDE

  • Build and deploy your solution onto Kubeflow

  • Run experiments on the pipeline using the xpresso.ai Controller. Each run can have different code (e.g., feature extraction) and/or different hyper-parameters and/or different training data. Each successful run automatically creates a model version

  • Get details of each experiment (time taken, metrics, model version, etc.) using the xpresso.ai Controller

  • Compare experiments using the xpresso.ai Controller - determine the commit ID of the best model
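The final comparison step amounts to ranking runs by a chosen metric. The sketch below illustrates the idea with invented run records and field names; in practice the xpresso.ai Controller performs this comparison for you.

```python
# Illustrative records for three experiment runs; the real values come
# from the xpresso.ai Controller, and these field names are assumptions.
runs = [
    {"model_version": "v1", "commit_id": "a1c3", "f1": 0.81},
    {"model_version": "v2", "commit_id": "b2d4", "f1": 0.88},
    {"model_version": "v3", "commit_id": "c3e5", "f1": 0.84},
]

def best_run(runs, metric="f1"):
    """Pick the run with the highest value of the chosen metric."""
    return max(runs, key=lambda r: r[metric])

best = best_run(runs)
```

The commit ID of the winning run is what identifies the model version to deploy.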


Use Case ML4: DL / ML pipeline using SparkML

Requirements: Build a DL/ML model from the training data provided, employing the SparkML library; run training with different training algorithms / hyper-parameters; maintain model versions; compare model versions to determine the best one

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “pipeline_job” which uses xpresso.ai Data Connectivity libraries to fetch data from the data source

  • Create more components of type “pipeline_job” to perform any transformations required (e.g., data cleaning, feature extraction, etc.)

  • Create a final component of type “pipeline_job” to train the model on the data provided

  • Create a pipeline which contains the above components

  • Test your solution on your local computer, using a local Spark cluster

  • Build and deploy your solution onto the xpresso.ai Spark cluster

  • Run experiments on the pipeline using the xpresso.ai Controller. Each run can have different code (e.g., feature extraction) and/or different hyper-parameters and/or different training data. Each successful run automatically creates a model version

  • Get details of each experiment (time taken, metrics, model version, etc.) using the xpresso.ai Controller

  • Compare experiments using the xpresso.ai Controller - determine the commit ID of the best model


Use Case ML5: Compare a “baseline” model with a “challenger” model to determine the best one

Requirements: Build two different training models and compare them

Solution using xpresso.ai

  • Create a new solution

  • Create a pipeline named “baseline” for the baseline model (refer to Use Case ML3)

  • Create a pipeline named “challenger” for the challenger model (refer to Use Case ML3)

  • Run experiments on both pipelines using the xpresso.ai Controller. Each run can have different code (e.g., feature extraction) and/or different hyper-parameters and/or different training data. Each successful run automatically creates a model version

  • Get details of each experiment (time taken, metrics, model version, etc.) using the xpresso.ai Controller

  • Compare experiments using the xpresso.ai Controller - determine the commit ID of the best model


Use Case ML6: High-availability Inference Service to load trained model and make predictions

Requirements: Create an Inference Service to provide predictions from a pre-trained model in response to input data

Solution using xpresso.ai

  • Create a new solution

  • Create a component of type “inference_service”

  • The solution will have skeleton code for implementing the inference service - populate the “load_model”, “transform_request”, “predict” and “transform_response” methods in Python, as required

  • Test the service on your local machine, using your favourite IDE

  • Build and deploy the solution onto the xpresso.ai Kubernetes cluster
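A rough sketch of how the four skeleton methods might fit together, assuming a trivial pickled linear model; the real xpresso.ai base class, method signatures, and model format may differ.

```python
import pickle
import tempfile

class InferenceService:
    """Sketch of the four methods named by the skeleton; the actual
    xpresso.ai base class and signatures are assumptions here."""

    def load_model(self, path):
        # Deserialize the trained model; here, a list of linear weights.
        with open(path, "rb") as fh:
            self.weights = pickle.load(fh)

    def transform_request(self, request):
        # Convert the incoming payload into numeric features.
        return [float(x) for x in request["features"]]

    def predict(self, features):
        # A linear model stands in for the real trained model.
        return sum(w * x for w, x in zip(self.weights, features))

    def transform_response(self, prediction):
        # Shape the raw score into the response payload.
        return {"score": prediction}

# Write a toy "trained model" to disk, then serve one request end to end.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as fh:
    pickle.dump([0.5, 2.0], fh)
    model_path = fh.name

svc = InferenceService()
svc.load_model(model_path)
features = svc.transform_request({"features": ["1.0", "3"]})
response = svc.transform_response(svc.predict(features))
```

The four-method split keeps model loading, input parsing, inference, and output shaping independently testable.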


Use Case ML7: A/B Testing on two (or more) trained models via an Inference Service

Requirements: Create an Inference Service which switches between different versions of a trained model (or two or more different trained models)

Solution using xpresso.ai

  • Create a new solution

  • Create two (or more) components of type “inference_service”, one for each model to be tried

  • The solution will have skeleton code for implementing the inference service - populate the “transform_request”, “predict” and “transform_response” methods in Python, as required

  • Test the services on your local machine, using your favourite IDE

  • Create a service mesh which switches requests among the inference services specified

  • Build and deploy the solution onto the xpresso.ai Kubernetes cluster
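The traffic-splitting behaviour of the service mesh can be illustrated in a few lines. The split ratio, function names, and routing logic below are assumptions for illustration only; the real mesh routes requests at the network layer between the deployed inference services.

```python
import random

# Two stand-in model versions; real deployments would be separate
# "inference_service" components behind the mesh.
def model_a(x):
    return x * 2      # baseline model

def model_b(x):
    return x * 2 + 1  # challenger model

def route(request, seed=None):
    """Send ~90% of traffic to the baseline and ~10% to the challenger,
    mimicking the weighted split a service mesh would enforce."""
    rng = random.Random(seed)
    model = model_a if rng.random() < 0.9 else model_b
    return model(request)
```

Comparing the live metrics of the two models under this split is what makes the A/B test.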


Use Case ML8: Version and deploy a model trained outside xpresso.ai via an Inference Service

Requirements: One or more trained models already exist in an enterprise. These need to be stored in the xpresso.ai Model Versioning System, and deployed via the xpresso.ai Inference Service.

Solution using xpresso.ai

  • Create a new solution

  • Create a single component of type “pipeline_job” for each model to be versioned (refer to Tutorial 3). In this component, read the model file(s) and store them in the “output” folder attached to the component.

  • Create a pipeline containing the component created above

  • Build and deploy the solution onto the xpresso.ai Kubernetes cluster

  • Run an experiment on the pipeline - xpresso.ai will automatically version the model file(s) present in the output folder

  • The model is now available for use in xpresso.ai Inference engines, for A/B testing, etc.
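The versioning component boils down to copying the existing model file(s) into the component's output folder, which xpresso.ai then versions on a successful run. A sketch under that assumption, with invented names; the actual output-folder path is supplied by the xpresso.ai component setup, not hard-coded as it is here.

```python
import os
import pickle
import tempfile

# Stand-in for a model trained outside xpresso.ai.
pretrained = {"weights": [0.1, 0.2, 0.3]}

def versioning_job(model, output_dir):
    """Write the existing model into the component's output folder;
    the folder location here is a placeholder for the real mount point."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return path

workdir = tempfile.mkdtemp()
saved = versioning_job(pretrained, os.path.join(workdir, "output"))

# Round-trip check: the stored model deserializes intact.
with open(saved, "rb") as fh:
    restored = pickle.load(fh)
```

Because the files land in the output folder, the experiment run versions them without any training step.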


What do you want to do next?