Concepts


Generic Concepts


  1. Workspace - refers to an xpresso.ai instance. An enterprise may have several workspaces installed. For example, Abzooba has four active workspaces:

    1. Dev: for the use of the xpresso.ai development team

    2. QA: for the use of the xpresso.ai QA team

    3. Sandbox: to be used by individual developers for experimentation, and

    4. Production: for deployment of customer solutions, POCs and demos.

2. Role - every xpresso.ai user can have one or more roles. Currently supported roles include:

- DEV (Developer): can modify, build and deploy solutions, create data repository branches, push data into and pull data from repositories

- PM (Project Manager): can create solutions in addition to tasks that a Developer can perform

- ADMIN (Administrator): can manage users and clusters in addition to tasks that a PM can perform

- SU (Superuser): can manage xpresso.ai components, in addition to tasks that an Admin can perform

3. Solution - an xpresso.ai solution is the same as any software solution - a set of software components that need to be developed, built and deployed. A solution may have multiple developers assigned to it.

4. Environment - refers to the target for deploying a solution. xpresso.ai supports the following environments for each solution (in increasing “order”): “DEV” (Development), “INT” (Integration), “QA”, “UAT” (User Acceptance Testing) and “PROD” (Production). When a solution is created, one or more of these environments can be specified as deployment targets. Solution builds can be deployed to any of these environments, subject to the constraint that the version of the build requested for deployment to a specific environment must already have been deployed to all available “lower order” environments. For example, if a solution has “DEV”, “QA” and “PROD” environments defined, a version of the build cannot be deployed to “QA” without having been deployed to “DEV” earlier, and cannot be deployed to “PROD” without having been deployed to both “DEV” and “QA” earlier (see the sketch below). Environments can be created on different types of target clusters. For example, a solution may have DEV and QA environments defined on both Kubernetes as well as Spark clusters.
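To make the “lower order” rule concrete, here is a minimal sketch of the check in Python. The function and argument names are hypothetical and illustrative only; this is not xpresso.ai code:

```python
# Illustrative only: function and argument names are hypothetical.
ENV_ORDER = ["DEV", "INT", "QA", "UAT", "PROD"]

def can_deploy(target_env, deployed_envs, solution_envs):
    """Check whether a build version may be deployed to target_env.

    deployed_envs -- environments this build version has already reached
    solution_envs -- environments defined for the solution, e.g. {"DEV", "QA", "PROD"}
    """
    if target_env not in solution_envs:
        return False
    lower = [e for e in ENV_ORDER
             if e in solution_envs and ENV_ORDER.index(e) < ENV_ORDER.index(target_env)]
    # every lower-order environment defined for the solution must already hold this build
    return all(e in deployed_envs for e in lower)

solution_envs = {"DEV", "QA", "PROD"}
print(can_deploy("QA", {"DEV"}, solution_envs))    # True
print(can_deploy("PROD", {"DEV"}, solution_envs))  # False - QA has been skipped
```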

5. Cluster - a cluster refers to a Kubernetes / Spark cluster. Each environment of a solution is allocated to a specific cluster for deployment.

6. Marketplace - the xpresso.ai Marketplace is a collection of reusable components. Each component is extensively documented within xpresso.ai, with details of its objective, usage, sample parameters, Docker image reference, and a link to the team that developed the component. These components can be added to a solution just like any other component (custom built by the developer). However, they do not need to be built. They can be added to a solution deployment by specifying the version of the Docker image for the component. The xpresso.ai Controller pulls this Docker image from the Docker repository and adds it to the deployment, just like any other component in the solution. At present, the Marketplace includes only components developed by the xpresso.ai team. In the future, developers will be able to add their own components to the Marketplace (subject to checks by the xpresso.ai team), so that these components become available across the organization. Examples of Marketplace components include a component to fetch data from the Data Versioning System, a component to load a trained model from the Model Versioning System, a component to clean data with various options, etc. Components available within the Marketplace are listed here.

7. Shared Storage - when a solution is created, it is allocated some shared storage within the xpresso.ai instance. This storage could be on NFS (for normal solutions), or on HDFS (for solutions with Spark-based components). Refer here for details.


Programming Concepts

  1. Component - A component is a piece of software that achieves a certain well-defined functionality and can be run on its own (usually as a Docker image). Components combine to form a solution. Components in xpresso.ai can be of the following six types, and any solution can be considered a combination of these types. For example, a solution may include 3 jobs, a service and 2 databases. A component can be of four different flavors - “python”, “java”, “sql” or “pyspark”. However, there are some restrictions, as indicated below

    1. job - a job is a piece of software that runs, accomplishes its purpose, and then exits - examples could include a job that extracts features from data, prior to training, or the training process itself. Jobs can be of “python”, “java” or “pyspark” flavors

    2. service - a service is a software component that is permanently alive. It listens for requests, returns responses, and then returns to the “wait” state. Examples could include the Inference Service of an Analytics solution. Services can be of “python” or “java” flavors

    3. database - a database is a special kind of service component that handles requests against a database. Databases can be of “sql” flavor only

    4. library - a library is a component required for one or more of the other components to be able to run. For example, utility code required by one or more components could be combined together into a library. Libraries can be of “python” flavor only

    5. inference_service - an Inference Service is a special kind of service component that enables deployment of trained Machine Learning models. It receives inference requests, to be handled by one or more trained ML models. An Inference Service typically loads a trained model into memory and passes inference requests to the model. The model returns a prediction for the request, which the service then passes back to the requester. When an inference_service component is created, the developer can take advantage of certain extended functionality provided by xpresso.ai for inference services, viz., loading the trained model, transforming the input request / output response, making a prediction, etc. This extended functionality is provided by certain base classes in xpresso.ai. When the developer includes a component of type “inference_service” in her solution, the Controller automatically creates a class for the component, which extends the xpresso.ai base classes, so that the extended functionality becomes available to the developer (a minimal sketch of such a class appears after this list). **This class is not created for ordinary service components.** Inference Services can be of “python” flavor only

    6. pipeline_job - A pipeline job is a special kind of job component that is meant to be added to a pipeline (see below). When a pipeline_job component is created, the developer can take advantage of certain extended functionality provided by xpresso.ai for pipeline jobs, viz., monitoring, pausing, restarting, terminating and storing the results produced by the component. This extended functionality is provided by certain base classes in xpresso.ai. When the developer includes a component of type “pipeline_job” in her solution, the Controller automatically creates a class for the component, which extends the xpresso.ai base classes, so that the extended functionality becomes available to the developer. **This class is not created for ordinary job components.** Pipeline Jobs can be of “python” flavor only.
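As an illustration of the inference_service pattern described above, here is a minimal sketch in Python. The base class and hook method names (AbstractInferenceService, load_model, transform_input, predict, transform_output) are hypothetical stand-ins, not the actual xpresso.ai base classes:

```python
# Illustrative sketch only - base class and method names are hypothetical,
# not the real xpresso.ai inference-service base classes.
import pickle


class AbstractInferenceService:
    """Hypothetical base class providing the extended inference functionality."""
    def load_model(self, model_path): ...
    def transform_input(self, request): ...
    def predict(self, features): ...
    def transform_output(self, prediction): ...


class ChurnInferenceService(AbstractInferenceService):
    """Example component of type "inference_service" (python flavor)."""

    def load_model(self, model_path):
        # e.g. load a trained model retrieved from the Model Versioning System
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    def transform_input(self, request):
        # convert the incoming request into the feature vector the model expects
        return [request["tenure"], request["monthly_charges"]]

    def predict(self, features):
        return self.model.predict([features])[0]

    def transform_output(self, prediction):
        # wrap the raw prediction in a response payload
        return {"churn": bool(prediction)}
```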

  2. Pipeline - a pipeline is a collection of jobs (i.e., components of type “job” or “pipeline_job”) run in a particular sequence. Typically, pipelines are used to define Training Pipelines for Analytics solutions, which could include components such as “Fetch Data”, “Prepare Data”, “Extract Features”, “Train Model” and “Validate Model”. Similarly, a pipeline can be used to define an ETL Pipeline of a BI solution, with components such as “Extract Data”, “Transform Data” and “Load Data”. There is a fundamental difference in pipelines between Big Data (Spark) and non Big Data environments. For non Big Data environments, a pipeline is just a virtual container of components, and hence has no specific code associated with it. xpresso.ai takes care of running the components in the sequence specified by the developer. However, for a Spark pipeline, the developer has to write the code to run the components in sequence (see the sketch below).
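For the Spark case, the driver code might look something like the following sketch. The job functions and the pipeline structure are placeholders assumed for illustration; they are not generated by xpresso.ai:

```python
# Sketch of a "pyspark" flavor pipeline driver: on Big Data environments the
# developer sequences the pipeline jobs explicitly. All function names are
# hypothetical placeholders for the solution's own pipeline jobs.
from pyspark.sql import SparkSession


def fetch_data(spark): ...
def prepare_data(spark, raw): ...
def extract_features(spark, prepared): ...
def train_model(spark, features): ...


def run_pipeline():
    spark = SparkSession.builder.appName("training_pipeline").getOrCreate()
    raw = fetch_data(spark)                       # "Fetch Data"
    prepared = prepare_data(spark, raw)           # "Prepare Data"
    features = extract_features(spark, prepared)  # "Extract Features"
    train_model(spark, features)                  # "Train Model"
    spark.stop()


if __name__ == "__main__":
    run_pipeline()
```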

  3. Build - The xpresso.ai Build process builds the specified components of a solution (according to the build scripts included in the code), via a Jenkins pipeline. The output of the Build process is one Docker image per component. There are some differences between the build processes for pipelines on Big Data (Spark) and non Big Data environments.

  • On Big Data environments, since a pipeline has code associated with it, a pipeline must be built just like any component.

  • On non Big Data environments, since a pipeline has no code associated with it, a pipeline need not be built.

  4. Deployment - the deployment process depends on the target environment (an illustrative deployment request is sketched after the bullets below).

  • Non Spark components are deployed to Kubernetes clusters. In this case, the docker images for all the components of the solution are sent to the allocated Kubernetes cluster for immediate execution. The cluster master node creates and runs “pods” for each component. The number of “pods” required for each component can be specified by the developer as part of the deployment instructions, based on the anticipated load on the component. Components with higher expected load should have more “pods” specified.

  • Non Spark pipelines are deployed to Kubeflow. The Docker images for all components in the pipeline are sent to Kubeflow, but the pipeline is not executed immediately. It is executed only when the developer runs an “experiment”.

  • Spark components are deployed to Spark clusters. The source code (for Python components) or class files (for Java components) are sent to the allocated master node for deployment. The master node then distributes the code across the worker nodes, and runs the job immediately.

  • Spark pipelines are deployed to Spark clusters in a similar manner to components, but executed only when the developer runs an experiment.
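Purely as an illustration of the information a deployment carries (build version per component, target environment, and replica counts for Kubernetes), here is a hypothetical request. The field names and values are assumptions, not the actual xpresso.ai deployment schema:

```python
# Hypothetical deployment request - field names and values are illustrative only.
deployment_request = {
    "solution": "churn_prediction",
    "target_environment": "QA",
    "components": [
        # higher expected load -> more Kubernetes "pods" (replicas)
        {"name": "inference_service", "build_version": "1.2.0", "pods": 3},
        {"name": "feature_extraction_job", "build_version": "1.2.0", "pods": 1},
        # Marketplace component: deployed from its pre-built Docker image
        {"name": "data_fetcher",
         "docker_image": "registry.example.com/marketplace/data-fetcher:0.9.1",
         "pods": 1},
    ],
}
```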

  5. Experiment - a single run of a pipeline is referred to as an experiment. Developers can run multiple experiments on their pipelines. Each experiment can report back status during its run to the Controller, which stores the status. Each experiment can also report its final results to the Controller, which stores these results in a versioning system. An experiment can be in the IDLE, RUNNING, PAUSED, RESTARTED, COMPLETED or TERMINATED state. Depending on its state, an experiment can be terminated, paused or restarted as well (a sketch of these transitions follows). Results of multiple experiments can be compared with each other. The primary purpose of experiments is to train various learning models, and then compare them to decide which one is best suited for the problem at hand.
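Here is a minimal sketch of the experiment lifecycle, using the states listed above. The transition table is an assumption inferred from the text, not the authoritative xpresso.ai logic:

```python
# Experiment states from the text; the allowed transitions are an assumption.
ALLOWED_TRANSITIONS = {
    "IDLE":       {"RUNNING"},
    "RUNNING":    {"PAUSED", "COMPLETED", "TERMINATED"},
    "PAUSED":     {"RESTARTED", "TERMINATED"},
    "RESTARTED":  {"RUNNING"},
    "COMPLETED":  set(),
    "TERMINATED": set(),
}


def change_state(current, new):
    """Move an experiment to a new state, rejecting invalid transitions."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move experiment from {current} to {new}")
    return new


state = "IDLE"
state = change_state(state, "RUNNING")    # experiment starts
state = change_state(state, "PAUSED")     # paused by the developer
state = change_state(state, "RESTARTED")  # restarted; will move back to RUNNING
```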


Summary

  • A solution consists of components

  • Components can be of type job, service, database, library, pipeline_job or inference_service

  • Developers can define components, and code them from scratch. Alternatively, they can pick components from the xpresso.ai Marketplace.

  • A Pipeline is a collection of components of type job or pipeline_job

  • Components need to be built before they can be deployed

  • Building a solution results in a Docker Image for each component, each build resulting in a different image version

  • The developer can select a subset of components for building - all solution components need not be built simultaneously

  • Components selected from the xpresso.ai Marketplace do not need to be built - the Docker images for such components are pre-built and available for deployment.

  • Pipelines also do not need to be built separately - however, the constituent components of the Pipeline need to be built

  • Deploying a solution involves specifying the build version of each component being deployed, and the target environment for deployment

  • The developer can select a subset of components or collections (Pipelines / Service Meshes) for deployment - all components / collections need not be deployed simultaneously.

  • Post-deployment behaviour depends on the type of component or collection being deployed.

    • Components of type job or pipeline_job start execution immediately on deployment

    • Components of type service, inference_service or database are ready for requests immediately on deployment

    • Components of type library cannot be executed, but are required by other components

    • Pipeline collections do not execute immediately on deployment. Developers must start experiments on deployed pipelines to execute them, providing the appropriate input parameters each time (multiple experiments can be run on the same pipeline simultaneously)

    • Service Mesh collections are ready for requests immediately on deployment

  • xpresso.ai tracks the progress of each experiment, and allows users to compare experiments with each other. This forms the basis of model versioning


Example

[Image: example solution structure, described below]

  • The solution has 6 components - 2 jobs, 1 pipeline_job, 1 service, 1 library and 1 inference_service

  • The job “Job 2” and the pipeline_job “Pipeline Job 1” have been combined to form a pipeline called “Pipeline 1”

  • The service “Service 1” and the inference_service “Inference Service 1” have been combined to form the Service Mesh “Service Mesh 1”
