Concepts

Generic Concepts

  1. Instance - refers to an xpresso.ai installation. An enterprise may have several instances installed, e.g., Dev, Staging and Production. xpresso.ai supports the promotion of instances from one installation to another

  2. Role - every xpresso.ai user can have one or more roles. The instance administrator can define new roles as well as assign actions to each role. Some pre-defined roles include:

    - DEV (Developer): can modify, build and deploy solutions, create data repository branches, push data into and pull data from repositories

    - PM (Project Manager): can create solutions in addition to tasks that a Developer can perform

    - ADMIN (Administrator): can manage users and clusters in addition to tasks that a PM can perform

    - SU(Superuser): can manage xpresso.ai components, in addition to tasks that an Admin can perform

  3. Solution - an xpresso.ai solution is the same as any software solution - a set of software components that need to be developed, built and deployed. *A solution may have multiple developers assigned to it. *

  4. Cluster - a cluster refers to a Kubernetes / Spark cluster. Each environment of a solution is allocated to a specific cluster for deployment.

  5. Component Library - the xpresso.ai Component Library is a collection of reusable components. Each component is extensively documented within xpresso.ai, with details of its objective, usage, sample parameters, Docker image reference, and links to the team which developed the component. These components can be added just like any other component (custom built by the developer) into a solution. Developers can select one of two versions of library components in their solutions - source code, or binaries. If they select the source code version, xpresso.ai auto-generates source code for their solution component by extending the library component source code. Developers can populate this code, and build it like any custom component. If they select the binary version, no source code is generated. The binaries (Docker images) of the components are used directly during solution deployment, and hence no build is required. Developers can add their own components to the Component Library (subject to checks by the xpresso.ai team), so that these components are then available across the organization. Examples of Component Library components can include a component to fetch data from the Data Versioning system, a component to load a trained model from the Model Versioning System, a component to clean data with various options, etc. Components available within the Component Library are listed here

  6. Shared Storage- when a solution is created, it is allocated some shared storage within the xpresso instance. This storage could be on NFS (for normal solutions), or on HDFS (for solutions with Spark-based components). Refer here for details

Programming Concepts

  1. Component - A component is a piece of software that achieves a certain well-defined functionality and can be run on its own (usually as a Docker image). Components combine to form a solution. Components in xpresso.ai can be of the following six types. All solutions can be considered to be combinations of these four types of components. For example, a solution may include 3 jobs, a service and 2 databases. A component can be of three different flavors - “python”, “java”, “sql” or “pyspark”. However, there are some restrictions, as indicated below

    • job - a job is a piece of software that runs, accomplishes its purpose, and then exits - examples could include a job that extracts features from data, prior to training, or the training process itself. Jobs can be of “python”, “java” or “pyspark” flavors

    • service - a service is a software component that is permanently alive. It listens for requests, returns responses, and then returns to the “wait” state. Examples could include the Inference Service of an Analytics solution. Services can be of “python” or “java” flavors

    • database - a database is a special kind of service component that handles requests against a database. Databases can be of “sql” flavor only

    • library - a library is a component required for one or more of the other components to be able to run. For example, utility code required by one or more components could be combined together into a library. Libraries can be of “python” flavor only

    • inference_service - an Inference Service is a special kind of service component that enables deployment of trained Machine Learning models. It receives inference requests, to be handled by one or more trained ML models. An Inference Service typically loads a trained model into memory and passes on inference requests to the model. The model returns a prediction for the request, which is then passed back by the service to the requester. When an inference service component is created, the developer can take advantage of certain extended functionality provided by xpresso.ai for inference services, viz., loading the trained model, transforming the input request / output response, making a prediction, etc. This extended functionality is provided by certain base classes in xpresso.ai. When the developer includes a component of type “inference_service” in her solution, the Controller automatically creates a class for the component, which extend the xpresso.ai base classes, so that the extended functionality becomes available to the developer. This class is not created for ordinary service components. Inference Services can be of “python” flavor only.

    • pipeline_job - A pipeline job is a special kind of job component that is meant to be added to a pipeline (see below). When a pipeline job component is created, the developer can take advantage of certain extended functionality provided by xpresso.ai for pipeline jobs, viz., monitoring, pausing, restarting, terminating and storing the results produced by the component. This extended functionality is provided by certain base classes in xpresso.ai. When the developer includes a component of type “pipeline_job” in her solution, the Controller automatically creates a class for the component, which extend the xpresso.ai base classes, so that the extended functionality becomes available to the developer. This class is not created for ordinary job components. Pipeline Jobs can be of “python” flavor only.

  2. Pipeline - a pipeline is a collection of jobs (i.e., components of type ”job” or “pipeline_job”) run in a particular sequence. Typically, pipelines are used to define Training Pipelines for Analytics solutions, which could include components such as “Fetch Data”, “Prepare Data”, “Extract Features”, “Train Model” and “Validate Model”. Similarly, a pipeline can be used to define an ETL Pipeline of a BI solution, with components such as “Extract Data”, “Transform Data” and “Load Data”. There is a fundamental difference in pipelines between Big Data (Spark) and non Big Data environments. For non Big Data environments, a pipeline is just a virtual container of components, and hence has no specific code associated with it. xpresso.ai takes care of running the components in the sequence specified by the developer. However, for a Spark pipeline, the developer has to write the code to run the components in sequence.

  3. Build - The xpresso.ai Build process builds the specified components of a solution (according to the build scripts included in the code), via a Jenkins pipeline. The output of the Build process is one Docker image per component. There are some differences between the build processes for pipelines on Big Data (Spark) and non Big Data environments.

    • On Big Data (Spark) clusters, since a pipeline has code associated with it, a pipeline must be built just like any component.

    • On non Big Data (Kubeflow) clusters, since a pipeline has no code associated, a pipeline need not be built.

  4. Deployment - the deployment process depends on the target environment.

    • Non Spark components are deployed to Kubernetes clusters. In this case, the docker images for all the components of the solution are sent to the allocated Kubernetes cluster for immediate execution. The cluster master node creates and runs “pods” for each component. The number of “pods” required for each component can be specified by the developer as part of the deployment instructions, based on the anticipated load on the component. Components with higher expected load should have more “pods” specified.

    • Non Spark pipelines are deployed to Kubeflow. The Docker images for all components in the pipeline are sent to Kubeflow, but the pipeline is not executed immediately. It is executed only when the developer runs an “experiment”

    • Spark components are deployed to Spark clusters, The source code (for Python components) or class files (for Java components) are sent to the allocated master node for deployment. The master node then distributes the code across the worker nodes, and runs the job immediately

    • Spark pipelines are deployed to Spark clusters in a similar manner to components, but executed only when the developer runs an experiment.

  5. Experiment - a single run of a pipeline is referred to as an experiment. Developers can run multiple experiments on their pipelines. Each experiment can report back status during its run to the Controller, which stores the status. Each experiment can also report its final results to the controller, which stores these results in a versioning system. An experiment can be in IDLE, RUNNING, PAUSED, RESTARTED, COMPLETED or TERMINATED states. Depending on its state, an experiment can be terminated, paused or restarted as well. Results of multiple experiments can be compared with each other. The primary purpose of experiments is to train various learning models, and then compare them to decide which one is best suited for the problem at hand.

Summary

  • A solution consists of components

  • Components can be of type job, service, database, library, pipeline_job or inference_service

  • Developers can define components, and code them from scratch. Alternatively, they can pick components from the xpresso.ai Component Library.

  • A Pipeline is a collection of components of type job or pipeline_job

  • Components need to be built before they can be deployed

  • Building a solution results in a Docker Image for each component, each build resulting in a different image version

  • The developer can select a subset of components for building- all solution components need not be built simultaneously

  • Components selected from the xpresso.ai Component Library do not need to be built - the Docker images for such components are pre-built and available for deployment.

  • Pipelines also do not need to be built separately - however, the constituent components of the Pipeline need to be built

  • Deploying a solution involves specifying the build version of each component being deployed, and the target environment for deployment

  • The developer can select a subset of components or collections (Pipelines / Service Meshes) for deployment - all components / collections need not be deployed simultaneously.

  • Post-deployment behaviour depends on the type of component or collection being deployed.

    • Components of type job or pipeline_job start execution immediately on deployment

    • Components of type serviceinference_service or database are ready for requests immediately on deployment

    • Component of type library cannot be executed, but are required by other components

    • Pipeline collections do not execute immediately on deployment. Developers must start experiments on deployed pipelines to execute them, providing the appropriate input parameters each time (multiple experiments can be run on the same pipeline simultaneously)

    • Service Mesh collections are ready for requests immediately on deployment

  • xpresso.ai tracks the progress of each experiment, and allows users to compare experiments with each other. This forms the basis of model versioning