DataVersioning
The data_versioning component provides commands to push data to, and pull data from, the xpresso.ai Data Repository.
Name | data_versioning |
Purpose | To push and pull data using the xpresso Data Versioning Libraries |
Usage Scenarios | |
Created By | xpresso.ai Team |
Support e-mail | |
Binary / Source / Both Versions | Binary |
Docker Image Reference | |
Type of component | pipeline_job |
Usage Instructions | |
Result | For pull_dataset, pulls the data from the repository into out-path. For push_dataset, pushes the data from in-path into the repository. |
Example |
Assume that a dataset has to be fetched from the data repository. Create a component of type ‘pipeline_job’, and deploy it using the Custom Docker image specified above. Create a pipeline using the component. To pull the dataset, the following parameters can be specified when an experiment is run on the pipeline:
Run Name: <provide a unique run name>
Pipeline Version: <select the version of the pipeline you want to run>
repo-name: <name of data repository> (usually, the same as the solution name)
branch-name: <name of branch from which to fetch data>
commit-id: <commit ID of data to be fetched>
pull-input-path: <specify the path within the commit, if any>
pull-output-path: <mount path on the shared drive where results are to be stored, e.g., ‘/data’>
The pulled dataset will be stored here |
Deploy Solution Arguments:
Using pull_dataset
Field | Parameter key (refer run-parameters below) | Description | Mandatory? | Default Value | Dynamic arg required? | Comments |
-component-name | component_name | The component name in the solution | Yes | data_fetch | Yes | |
-command | command | Data versioning operation | Yes | None | Yes | Specify the value as pull_dataset |
-repo-name | repo_name | Name of the data versioning repository (usually, the same as the solution name) | Yes | None | Yes | |
-branch-name | branch_name | Name of branch within repository | Yes | None | Yes | |
-branch-type | branch_type | Value of the branch type | Yes | model | Yes | Can be ‘data’ for data repository operations or ‘model’ for model repository operations |
-commit-id | commit_id | Value of commit_id returned after pushing the data | Yes | Latest commit ID | Yes | |
-dv-commit-id | dv_commit_id | Value of commit_id returned by Data Versioning system | No | None | Yes | This is the commit_id returned after push_dataset before xpresso version 2.1.1. Will be deprecated in next marketplace component release |
-pull-input-path | pull_input_path | Path of the file on data versioning system | No | /dataset | Yes | This is returned as output of push_dataset. Helpful in fetching only required files rather than whole dataset |
-pull-output-path | pull_output_path | Path on the container to save the fetched data | No | /data/pull_data | Yes | This parameter can be used to save the files at required location and use it in other components. |
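The pull_dataset arguments can be sketched in Python as a helper that merges the documented defaults with user-supplied values and produces a flag/value list. This is only an illustration: the helper name `build_pull_args`, the flat `-flag value` layout, and the choice to omit `-commit-id` when no value is given (so that the documented "Latest commit ID" default applies) are assumptions, not part of the component.

```python
# Hypothetical sketch: assemble pull_dataset arguments from the table above.
# Defaults are copied from the table; None means no default is documented.
PULL_DEFAULTS = {
    "component_name": "data_fetch",
    "command": "pull_dataset",
    "repo_name": None,
    "branch_name": None,
    "branch_type": "model",
    "commit_id": None,       # assumption: omitted means "latest commit ID"
    "dv_commit_id": None,    # optional, pre-2.1.1 commit IDs only
    "pull_input_path": "/dataset",
    "pull_output_path": "/data/pull_data",
}


def build_pull_args(**overrides):
    """Return a list such as ['-repo-name', 'my_solution', ...]."""
    unknown = sorted(set(overrides) - set(PULL_DEFAULTS))
    if unknown:
        raise ValueError(f"unknown parameter keys: {unknown}")

    params = {**PULL_DEFAULTS, **overrides}

    # repo_name and branch_name are mandatory and have no default.
    missing = [k for k in ("repo_name", "branch_name") if not params[k]]
    if missing:
        raise ValueError(f"missing mandatory parameters: {missing}")

    if params["branch_type"] not in ("data", "model"):
        raise ValueError("branch_type must be 'data' or 'model'")

    args = []
    for key, value in params.items():
        if value is None:
            continue  # leave undocumented or unset values out entirely
        # Parameter key 'repo_name' corresponds to field '-repo-name'.
        args += ["-" + key.replace("_", "-"), str(value)]
    return args


if __name__ == "__main__":
    print(build_pull_args(repo_name="my_solution",
                          branch_name="master",
                          branch_type="data"))
```

The values `my_solution` and `master` are placeholders; substitute the repository and branch names used in your own solution.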
Using push_dataset
Field | Parameter key (refer run-parameters below) | Description | Mandatory? | Default Value | Dynamic arg required? | Comments |
-component-name | component_name | The component name in the solution | Yes | data_fetch | Yes | |
-command | command | Data versioning operation | Yes | None | Yes | Specify the value as push_dataset |
-push-input-path | push_input_path | Path of the file to be pushed | Yes | /data | Yes | |
-repo-name | repo_name | Name of the data versioning repository (usually, the same as the solution name) | Yes | None | Yes | |
-branch-name | branch_name | Name of branch within repository | Yes | None | Yes | |
-branch-type | branch_type | Value of the branch type | Yes | model | Yes | Can be ‘data’ for data repository operations or ‘model’ for model repository operations |
-dataset-name | dataset_name | Name of the dataset on data versioning system | Yes | None | Yes | |
-description | description | Description of the dataset | Yes | None | Yes | Whitespaces are prohibited |
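Before running push_dataset, it can help to check the parameters against the constraints listed in the table. The following is a minimal sketch, not part of the component: the function name `validate_push_params` is hypothetical, and pre-filling `command` with `push_dataset` is a convenience assumed here (the table documents no default for it).

```python
# Hypothetical pre-flight check for push_dataset parameters.
PUSH_MANDATORY = (
    "component_name", "command", "push_input_path", "repo_name",
    "branch_name", "branch_type", "dataset_name", "description",
)

# Defaults from the table, plus 'command' pre-filled for convenience.
PUSH_DEFAULTS = {
    "component_name": "data_fetch",
    "command": "push_dataset",
    "push_input_path": "/data",
    "branch_type": "model",
}


def validate_push_params(params):
    """Merge defaults into params and return (merged, list_of_errors)."""
    merged = {**PUSH_DEFAULTS, **params}
    errors = []

    for key in PUSH_MANDATORY:
        if not merged.get(key):
            errors.append(f"missing mandatory parameter: {key}")

    # The table states that the description must not contain whitespace.
    description = merged.get("description") or ""
    if any(ch.isspace() for ch in description):
        errors.append("description must not contain whitespace")

    if merged.get("branch_type") not in ("data", "model"):
        errors.append("branch_type must be 'data' or 'model'")

    return merged, errors


if __name__ == "__main__":
    _, problems = validate_push_params({
        "repo_name": "my_solution",      # placeholder values
        "branch_name": "master",
        "dataset_name": "sales",
        "description": "initial upload",  # contains a space, so it is flagged
    })
    print(problems)
```

Collecting all problems in a list, rather than raising on the first one, lets every issue be corrected in a single pass before the experiment is run.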
For a detailed reference on how the data versioning parameters are used, refer to the Data Versioning Library documentation.
Dynamic-args:
Specify each dynamic argument immediately after its static argument and check the “Dynamic” checkbox. The value of the dynamic argument should be a placeholder string. This string appears on the Run Experiment form, where the actual run-time value for the static argument is filled in.
E.g., if the static argument is -out-path, its dynamic argument could be out_path. This out_path then appears as an input field on the Run Experiment form. The value entered in this field can be a string path, which is the value expected by the -out-path argument.
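The relationship between a static argument and its placeholder can be sketched as follows. The platform accepts any placeholder string; the convention shown here (drop the leading dash, replace hyphens with underscores) simply mirrors the Field and Parameter key columns in the tables, and both function names are hypothetical.

```python
# Hypothetical sketch of pairing static arguments with run-time form values.

def dynamic_arg_name(static_arg):
    """Derive a placeholder such as 'out_path' from '-out-path'."""
    return static_arg.lstrip("-").replace("-", "_")


def resolve_run_values(static_args, form_values):
    """Map each static argument to the value entered for its placeholder."""
    resolved = {}
    for static_arg in static_args:
        placeholder = dynamic_arg_name(static_arg)
        if placeholder not in form_values:
            raise KeyError(f"no run-time value supplied for {placeholder}")
        resolved[static_arg] = form_values[placeholder]
    return resolved


if __name__ == "__main__":
    # 'out_path' is the field shown on the form; '/data' is what the user types.
    print(resolve_run_values(["-out-path"], {"out_path": "/data"}))
```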
Run-parameters (file or commit ID):
When loading parameters from a file or a data versioning repository, use the parameter keys listed in the tables above. For more details, refer to the Guide For Dynamically Loading Run Parameters From File Or Data Versioning Repository.
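As a rough illustration of using the parameter keys, the sketch below parses a set of run parameters and rejects any key that does not appear in the tables. The JSON format is an assumption made for this example only; the actual file format is defined in the guide referenced above, and `load_run_parameters` is a hypothetical helper.

```python
import json

# Parameter keys collected from the pull_dataset and push_dataset tables.
KNOWN_KEYS = {
    "component_name", "command", "repo_name", "branch_name", "branch_type",
    "commit_id", "dv_commit_id", "pull_input_path", "pull_output_path",
    "push_input_path", "dataset_name", "description",
}


def load_run_parameters(text):
    """Parse run parameters (assumed to be JSON) and check their keys."""
    params = json.loads(text)
    if not isinstance(params, dict):
        raise ValueError("run parameters must be a key/value mapping")

    unknown = sorted(set(params) - KNOWN_KEYS)
    if unknown:
        raise ValueError(f"unrecognised parameter keys: {unknown}")
    return params


if __name__ == "__main__":
    sample = '{"command": "pull_dataset", "repo_name": "my_solution", "branch_name": "master"}'
    print(load_run_parameters(sample))
```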