PySpark Pipelines
Build Guide
Every PySpark pipeline within an xpresso.ai solution is built according to the Jenkins Build Pipeline defined for the solution. The Jenkins Build Pipeline, in turn, runs its stages in the order specified in the Jenkinsfile configuration file, located in the xprbuild folder of each PySpark pipeline.
Stages in Jenkins Build Pipeline for PySpark Pipeline
| S. No. | Stage | Description | Steps |
|---|---|---|---|
| 1 | Checkout | Checks out the source code from the code repository and cleans the target folder | |
| 2 | Prepare | Prepares the build environment | Calls make prepare using the Makefile located at <component root>/xprbuild; this calls <component root>/xprbuild/system/linux/pre_build.sh |
| 3 | Build | Builds the Docker image for the component | Calls make build using the Makefile located at <component root>/xprbuild; this calls <component root>/xprbuild/docker/build.sh |
| 4 | Test | Tests the new Docker image | Calls make unittest using the Makefile located at <component root>/xprbuild; this calls <component root>/xprbuild/docker/test.sh |
| 5 | Docker Push | Pushes the new Docker image to the xpresso.ai Docker registry | Calls make dockerpush using the Makefile located at <component root>/xprbuild; this pushes the new Docker image to the registry |
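
To rehearse the stage order listed above outside Jenkins, the make targets can be driven from a small script. The following is a minimal sketch, not the actual Jenkinsfile logic: the names STAGE_TARGETS, make_commands and run_stages are hypothetical, and the use of make -C to point at the xprbuild folder is an assumption of this sketch. The Checkout stage is left out because the table lists no make step for it.

```python
import subprocess
from pathlib import Path

# Stage name -> make target, in the order the stages are listed above.
# Checkout is omitted: no make step is listed for it.
STAGE_TARGETS = [
    ("Prepare", "prepare"),
    ("Build", "build"),
    ("Test", "unittest"),
    ("Docker Push", "dockerpush"),
]


def make_commands(component_root):
    """Return one (stage, command) pair per stage, in pipeline order."""
    # Assumption: the Makefile is in <component root>/xprbuild, as the Steps column states.
    makefile_dir = Path(component_root) / "xprbuild"
    return [
        (stage, ["make", "-C", str(makefile_dir), target])
        for stage, target in STAGE_TARGETS
    ]


def run_stages(component_root, dry_run=True):
    """Print each stage's command; execute it too when dry_run is False."""
    for stage, command in make_commands(component_root):
        print(f"[{stage}] {' '.join(command)}")
        if not dry_run:
            # check=True raises CalledProcessError, so later stages do not run
            # after a failed one.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run_stages(".")
```

Calling run_stages(".", dry_run=False) from the component root would execute the targets for real, which requires make and Docker to be available on the machine.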
Repository Folder Structure for PySpark pipelines
The folder structure of a PySpark pipeline repository is described in detail below:
| Folder | File | Description | Developer Tips |
|---|---|---|---|
| / | CHANGELOG.md | Stores a log of changes to the component | Document changes to the component in this file |
| / | Makefile | Makes the solution (see above for details) | Changes to this file are usually not required. However, it is a good idea to review the actions performed by the various make rules, especially clobber, prepare, build and dockerpush |
| / | README.md | Describes the pipeline | Write a brief description of the pipeline and the source files it requires in this file |
| / | VERSION | Stores the pipeline version number | Write the pipeline version number here |
| /app | __init__.py | Dummy source code | |
| /app | .gitignore | Dummy .gitignore file | Populate this as needed to exclude files from Git tracking |
| /app | app.py | Dummy source code | app.py is the default entry point for the PySpark pipeline |
| /requirements | requirements.txt | Contains the list of libraries that must be installed for the pipeline to function properly | Libraries are installed as part of the Build stage of the Jenkins pipeline (see above) |
| /xprbuild | Jenkinsfile | Stores the actions performed by the Jenkins pipeline for the component | See above for details. Review this file, but do not change it. If required, change the scripts called by the pipeline instead |
| /xprbuild/docker | Dockerfile | Stores the commands processed when building the Docker image for the component | Default actions: call <component root>/xprbuild/system/linux/pre_build.sh; call <component root>/xprbuild/system/linux/build.sh; call <component root>/xprbuild/system/linux/post_build.sh; call <component root>/xprbuild/system/linux/run.sh. Change this file as per the component requirements. See the Docker documentation for details |
| /xprbuild/docker | build.sh | Called during the Build stage of the Jenkins Build Pipeline | Builds the Docker image as per the instructions in the Dockerfile by default. Change as per pipeline requirements |
| /xprbuild/docker | pre-build.sh | Unused | |
| /xprbuild/docker | test.sh | Called during the Test stage of the Jenkins Build Pipeline | Executes pytest on the new Docker image by default. Change as per pipeline requirements |
| /xprbuild/system | Makefile | Unused | |
| /xprbuild/system/linux | build.sh | Called during the Build stage of the Jenkins Build Pipeline | Installs the requirements listed in <pipeline root>/requirements/requirements.txt by default. Change as per pipeline requirements |
| /xprbuild/system/linux | post-build.sh | Called during the Build stage of the Jenkins Build Pipeline | Does nothing by default. Change as per pipeline requirements |
| /xprbuild/system/linux | pre-build.sh | Called during the Prepare and Build stages of the Jenkins Build Pipeline | Installs Python and pytest by default. Change as per pipeline requirements |
| /xprbuild/system/linux | run.sh | Called during the Build stage of the Jenkins Build Pipeline | Runs the code in <pipeline root>/app/app.py by default. Change as per pipeline requirements |
| /xprbuild/system/linux | spark-submit.sh | Called when the Spark pipeline needs to be submitted to the cluster, i.e. at deployment | Submits the Spark pipeline to the cluster, where it starts running from app/main.py |
| /xprbuild/system/linux | test.sh | Unused | |
| /xprbuild/system/windows | Makefile | Unused | May be required in future to support Windows deployment |
| /xprbuild/system/windows | build.bat | Unused | May be required in future to support Windows deployment |
| /xprbuild/system/windows | post-build.bat | Unused | May be required in future to support Windows deployment |
| /xprbuild/system/windows | pre-build.bat | Unused | May be required in future to support Windows deployment |
| /xprbuild/system/windows | run.bat | Unused | May be required in future to support Windows deployment |
| /xprbuild/system/windows | test.bat | Unused | May be required in future to support Windows deployment |
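
Before triggering a build, it can be useful to confirm that a checkout still contains the files the build and deployment steps rely on. The following is a minimal sketch under stated assumptions: EXPECTED_FILES and missing_files are hypothetical names, the file names are taken from the File column above, and files described as unused or as documentation are deliberately left out.

```python
from pathlib import Path

# Files that the folder structure above describes as used by the build,
# by the default entry point, or by deployment.
# Files marked "Unused" (for example everything under xprbuild/system/windows)
# are intentionally not checked.
EXPECTED_FILES = [
    "Makefile",
    "VERSION",
    "app/app.py",
    "requirements/requirements.txt",
    "xprbuild/Jenkinsfile",
    "xprbuild/docker/Dockerfile",
    "xprbuild/docker/build.sh",
    "xprbuild/docker/test.sh",
    "xprbuild/system/linux/build.sh",
    "xprbuild/system/linux/pre-build.sh",
    "xprbuild/system/linux/post-build.sh",
    "xprbuild/system/linux/run.sh",
    "xprbuild/system/linux/spark-submit.sh",
]


def missing_files(pipeline_root):
    """Return the expected files that are absent under pipeline_root."""
    root = Path(pipeline_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the pipeline root; prints nothing when the layout is complete.
    for name in missing_files("."):
        print(f"missing: {name}")
```

The list is kept as plain relative paths so that it can be adjusted when a pipeline adds or renames scripts.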