PySpark Components
Build Guide
Every component within an xpresso.ai solution is built according to the Jenkins Build Pipeline defined for the solution. The pipeline, in turn, follows the order of stages specified in the Jenkinsfile located in the xprbuild folder of each component.
Stages in Jenkins Build Pipeline for PySpark Components
S. No. | Stage | Description | Steps |
1 | Checkout | Checks out source code from the code repository and cleans the target folder | |
2 | Prepare | Prepares the build environment | Calls make prepare using the Makefile located at <component root>/xprbuild; this calls <component root>/xprbuild/system/linux/pre_build.sh |
3 | Build | Builds the Docker image for the component | Calls make build using the Makefile located at <component root>/xprbuild; this calls <component root>/xprbuild/docker/build.sh |
4 | Test | Tests the new Docker image | Calls make unittest using the Makefile located at <component root>/xprbuild; this calls <component root>/xprbuild/docker/test.sh |
5 | Docker Push | Pushes the new Docker image into the xpresso.ai Docker registry | Calls make dockerpush using the Makefile located at <component root>/xprbuild; this pushes the new Docker image into the registry |
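To reproduce the stage order outside Jenkins, for example when debugging a failing stage locally, the make targets from the table can be invoked in sequence. The following is a minimal sketch, not part of the xpresso.ai tooling: the names stage_commands and run_stages and the my_component path are hypothetical, and the Makefile directory is passed in as a parameter rather than assumed.

```python
import subprocess
from pathlib import Path

# Stage name -> make target, in pipeline order (targets taken from the table above).
# Checkout has no make step listed in the table, so it is not included here.
STAGE_TARGETS = [
    ("Prepare", "prepare"),
    ("Build", "build"),
    ("Test", "unittest"),
    ("Docker Push", "dockerpush"),
]


def stage_commands(makefile_dir):
    """Return (stage, command) pairs for each stage, in pipeline order."""
    return [
        (stage, ["make", "-C", str(makefile_dir), target])
        for stage, target in STAGE_TARGETS
    ]


def run_stages(makefile_dir, dry_run=True):
    """Run each stage in order and stop at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    for stage, command in stage_commands(makefile_dir):
        print(f"[{stage}] {' '.join(command)}")
        if not dry_run:
            # check=True raises CalledProcessError if make exits with an error
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical component location; replace with a real checkout.
    run_stages(Path("my_component") / "xprbuild")
```

Running with dry_run=True first is a cheap way to confirm which make rules would be triggered before building or pushing an image.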
Repository Folder Structure for PySpark Components
The folder structure for any PySpark job component is described in detail below:
Folder | File | Description | Developer Tips |
/ | CHANGELOG.md | Stores a log of changes to the component | Document changes to the component in this file |
/ | Makefile | Makes the solution (see above for details) | Changes to this file are usually not required. However, it is a good idea to review the actions performed by the various make rules, especially clobber, prepare, build and dockerpush |
/ | README.md | Describes the component | Write a brief description of the component and the source files it requires in this file |
/ | VERSION | Stores the component version number | Write the component version number here |
/app | __init__.py | Dummy source code | Store all source code for the component in this folder (with sub-folders as required) |
/app | app.py | Dummy source code | app.py is the default entry point for the component |
/config | dev.json | Dummy configuration file (dev environment) | Store configuration files in this folder. It is best practice to keep a separate configuration file for each environment |
/config | prod.json | Dummy configuration file (prod environment) | Store configuration files in this folder. It is best practice to keep a separate configuration file for each environment |
/config | stage.json | Dummy configuration file (stage environment) | Store configuration files in this folder. It is best practice to keep a separate configuration file for each environment |
/data | sample_empty_data.txt | Dummy data file | Store all the data files required for the component in the /data folder (with sub-folders as required) |
/requirements | requirements.txt | Lists the libraries that must be installed for the component to function properly | Libraries are installed as part of the Build stage of the Jenkins pipeline (see above) |
/scripts | sample.sh | Dummy script | Store any scripts required by the component in this folder (with sub-folders as required) |
/tests | __init__.py | Dummy test file | Store unit test code in this folder (and sub-folders as required). These tests are run automatically using pytest as part of the Test stage (make unittest) of the Jenkins pipeline (see above) |
/tests/app | __init__.py | Dummy test file | |
/tests/app | test_app.py | Dummy test file | |
/xprbuild | Jenkinsfile | Stores the actions performed by the Jenkins pipeline for the component | See above for details. Review this file, but do not change it. If required, change the scripts called by the pipeline instead |
/xprbuild/docker | Dockerfile | Stores the commands processed when building the Docker image for the component | Default actions: call <component root>/xprbuild/system/linux/pre_build.sh; call <component root>/xprbuild/system/linux/build.sh; call <component root>/xprbuild/system/linux/post_build.sh; call <component root>/xprbuild/system/linux/run.sh. Change this file as per the component requirements. See the Docker documentation for details |
/xprbuild/docker | build.sh | Called during the Build stage of the Jenkins Build Pipeline | By default, builds the Docker image as per the instructions in the Dockerfile. Change as per component requirements |
/xprbuild/docker | pre-build.sh | Unused | |
/xprbuild/docker | test.sh | Called during the Test stage of the Jenkins Build Pipeline | By default, executes pytest on the new Docker image. Change as per component requirements |
/xprbuild/system | Makefile | Unused | |
/xprbuild/system/linux | build.sh | Called during the Build stage of the Jenkins Build Pipeline | By default, installs the requirements listed in <component root>/requirements/requirements.txt. Change as per component requirements |
/xprbuild/system/linux | post-build.sh | Called during the Build stage of the Jenkins Build Pipeline | Does nothing by default. Change as per component requirements |
/xprbuild/system/linux | pre-build.sh | Called during the Prepare and Build stages of the Jenkins Build Pipeline | By default, installs Python and pytest. Change as per component requirements |
/xprbuild/system/linux | run.sh | Called during the Build stage of the Jenkins Build Pipeline | By default, runs the code in <component root>/app/app.py. Change as per component requirements |
/xprbuild/system/linux | spark-submit.sh | Called when the Spark job needs to be submitted to the cluster, i.e. at deployment | Submits the Spark job to the cluster; the job starts running from app/main.py |
/xprbuild/system/linux | test.sh | Unused | |
/xprbuild/system/windows | Makefile | Unused | May be required in the future to support Windows deployment |
/xprbuild/system/windows | build.bat | Unused | May be required in the future to support Windows deployment |
/xprbuild/system/windows | post-build.bat | Unused | May be required in the future to support Windows deployment |
/xprbuild/system/windows | pre-build.bat | Unused | May be required in the future to support Windows deployment |
/xprbuild/system/windows | run.bat | Unused | May be required in the future to support Windows deployment |
/xprbuild/system/windows | test.bat | Unused | May be required in the future to support Windows deployment |
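As an illustration of the per-environment configuration files in /config, the sketch below shows one way component code might select dev.json, stage.json or prod.json at run time. It is an assumption rather than the behaviour of the generated skeleton: the function name load_config, the ENVIRONMENTS tuple and the way the environment name reaches the component are all hypothetical.

```python
import json
from pathlib import Path

# Environment names match the sample files dev.json, stage.json and prod.json.
ENVIRONMENTS = ("dev", "stage", "prod")


def load_config(environment, config_dir="config"):
    """Load <config_dir>/<environment>.json and return the parsed content."""
    if environment not in ENVIRONMENTS:
        raise ValueError(
            f"unknown environment {environment!r}; expected one of {ENVIRONMENTS}"
        )
    config_path = Path(config_dir) / f"{environment}.json"
    with config_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

A component entry point could then call load_config("dev") during local development and switch to "prod" at deployment, keeping environment-specific values out of the source code.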