Spark ML Pipeline Component Base Classes


There are three base classes that developers need to know about.

1. XprPipeline Class

The XprPipeline class represents an xpresso.ai Spark ML pipeline. It provides the following methods:

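+-------------+--------------------------+----------------------------------------------------+
| Name        | Description              | Parameters                                         |
+=============+==========================+====================================================+
| constructor | initializes the pipeline | name (String) - pipeline name                      |
|             |                          | spark (SparkSession) - Spark session within        |
|             |                          |   which to run the pipeline                        |
|             |                          | run_id - experiment run ID for the pipeline        |
|             |                          | stages - pipeline stages (components), each an     |
|             |                          |   Estimator or Transformer (see below)             |
+-------------+--------------------------+----------------------------------------------------+
| fit         | runs the pipeline        | dataset - dataset object on which to run the       |
|             |                          |   pipeline                                         |
+-------------+--------------------------+----------------------------------------------------+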
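
The call shape implied by the constructor and fit parameters can be sketched as follows. This is only an illustration under stated assumptions: the xpresso.ai import path, the argument order, and the return value of fit are not given here, so the block defines a stand-in XprPipeline with the same four constructor parameters and passes everything by keyword. The stage class UppercaseStage, the placeholder spark=None, the run ID value, and the list-of-strings dataset are all hypothetical, chosen so the sketch runs without Spark installed.

```python
# Stand-in for the real XprPipeline class. Assumption: the real class ships
# with xpresso.ai and accepts the parameters documented for its constructor.
class XprPipeline:
    def __init__(self, name, spark, run_id, stages):
        self.name = name        # pipeline name
        self.spark = spark      # a SparkSession in the real library
        self.run_id = run_id    # experiment run ID
        self.stages = stages    # list of Estimators / Transformers

    def fit(self, dataset):
        # Assumption: stages are applied in order. What the real fit()
        # returns is not documented here; the stand-in returns the
        # transformed data so there is something to inspect.
        for stage in self.stages:
            dataset = stage.transform(dataset)
        return dataset


# Hypothetical transformer-like stage, used only for illustration.
class UppercaseStage:
    def transform(self, dataset):
        return [row.upper() for row in dataset]


pipeline = XprPipeline(
    name="demo_pipeline",
    spark=None,                 # a real SparkSession would go here
    run_id="run-001",           # hypothetical run ID
    stages=[UppercaseStage()],
)
result = pipeline.fit(["a", "b"])
print(result)
```

With the real library, spark would be an active SparkSession and dataset a Spark dataset object rather than a Python list.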


2. AbstractSparkPipelineEstimator Class

The AbstractSparkPipelineEstimator class represents an Estimator in the pipeline. It is a thin wrapper around the more general AbstractPipelineComponent class. Estimators created by developers should extend both a Spark ML Estimator and AbstractSparkPipelineEstimator.
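
The dual-inheritance pattern for estimators might look like the sketch below. To keep it runnable without Spark or xpresso.ai installed, both base classes are stand-ins: Estimator mimics pyspark.ml.Estimator, whose public fit() delegates to a _fit() hook, and AbstractSparkPipelineEstimator is reduced to a hypothetical constructor that stores a name. The real wrapper's constructor arguments, the order of the base classes, and the MeanEstimator and MeanModel classes are assumptions made for illustration.

```python
# Stand-in for pyspark.ml.Estimator: public fit() delegates to _fit().
class Estimator:
    def fit(self, dataset):
        return self._fit(dataset)

    def _fit(self, dataset):
        raise NotImplementedError


# Stand-in for the xpresso.ai wrapper. Its real members are not documented
# here; the `name` attribute is a hypothetical placeholder.
class AbstractSparkPipelineEstimator:
    def __init__(self, name):
        self.name = name


class MeanModel:
    """Hypothetical fitted model: subtracts the learned mean."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, dataset):
        return [x - self.mean for x in dataset]


# A developer-defined estimator extends both base classes.
class MeanEstimator(Estimator, AbstractSparkPipelineEstimator):
    def __init__(self, name):
        AbstractSparkPipelineEstimator.__init__(self, name)

    def _fit(self, dataset):
        # Learn the mean of the data and return a fitted model.
        return MeanModel(sum(dataset) / len(dataset))


estimator = MeanEstimator(name="mean_estimator")
model = estimator.fit([1.0, 2.0, 3.0])
centered = model.transform([2.0, 4.0])
print(model.mean, centered)
```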

3. AbstractSparkPipelineTransformer Class

The AbstractSparkPipelineTransformer class represents a Transformer in the pipeline. It is a thin wrapper around the more general AbstractPipelineComponent class. Transformers created by developers should extend both a Spark ML Transformer and AbstractSparkPipelineTransformer.
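
A custom transformer could follow the same shape, sketched here with stand-in base classes so that it runs without Spark or xpresso.ai installed. Transformer mimics pyspark.ml.Transformer, whose public transform() delegates to a _transform() hook. The AbstractSparkPipelineTransformer stand-in, its name argument, the base-class order, and the ScaleTransformer class are hypothetical and only illustrate the pattern.

```python
# Stand-in for pyspark.ml.Transformer: transform() delegates to _transform().
class Transformer:
    def transform(self, dataset):
        return self._transform(dataset)

    def _transform(self, dataset):
        raise NotImplementedError


# Stand-in for the xpresso.ai wrapper. Its real members are not documented
# here; the `name` attribute is a hypothetical placeholder.
class AbstractSparkPipelineTransformer:
    def __init__(self, name):
        self.name = name


# A developer-defined transformer extends both base classes.
class ScaleTransformer(Transformer, AbstractSparkPipelineTransformer):
    def __init__(self, name, factor):
        AbstractSparkPipelineTransformer.__init__(self, name)
        self.factor = factor

    def _transform(self, dataset):
        # Multiply every value by the configured factor.
        return [x * self.factor for x in dataset]


scaler = ScaleTransformer(name="scaler", factor=10)
scaled = scaler.transform([1, 2, 3])
print(scaled)
```

An instance such as scaler is the kind of object that would be passed in the stages list of the pipeline constructor.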