DataExplorer
The data_explorer component performs exploration on the data.
Prerequisite: A Dataset object saved to a path (in-path) or pushed to a data versioning repository.
Name | data_explorer |
Purpose | To perform univariate and bivariate analysis of data |
Usage Scenarios | It can be used to perform Exploratory Data Analysis (refer to the Data Exploration Library) |
Created By | xpresso.ai Team |
Support e-mail | |
Binary / Source / Both Versions | Binary |
Docker Image Reference | |
Type of component | pipeline_job |
Usage Instructions | |
Result | Stores the explored dataset files and exploration Excel files in the location specified by out-path. Note: To save data to NFS, specify an out-path within mount_path. |
Example |
Assume that a dataset has to be fetched from the data repository and explored.
Create a component of type ‘pipeline_job’, and deploy it using the Custom Docker image specified above.
Create a pipeline using the component.
To explore the dataset, the following parameters can be specified when an experiment is run on the pipeline:
Run Name: <provide a unique run name>
Pipeline Version: <select the version of the pipeline you want to run>
bins: 5
validity-threshold: 95
repo-name: <name of data repository> (usually, the same as the solution name)
branch-name: <name of branch from which to fetch data for exploration>
commit-id: <commit ID of data to be fetched for exploration>
out-path: <path in NFS mount where you want to store the data, e.g., ‘/data’>
The explored dataset and exploration results will be stored in out-path. |
Deploy Solution Arguments:
Using data from mount path
Field | Parameter key (refer to run-parameters below) | Description | Mandatory? | Dynamic arg required? | Comments |
-component-name | component_name | The component name in the solution | Yes | Yes | |
-explorer-output-path | explorer_output_path | The path where exploration results are saved | Yes | Yes | Explored dataset and exploration results are saved here |
-explorer-input-path | explorer_input_path | Path of the file to load data for exploration | Yes | Yes | |
-validity-threshold | validity_threshold | Indicates the minimum percentage of numeric values allowed in the column | No | Yes | Applicable only to structured datasets (refer to Data Exploration Library) |
-bins | bins | Indicates the number of bins to be considered for the numeric probability distribution | No | Yes | Applicable only to structured datasets (refer to Data Exploration Library) |
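As an illustration of the two optional arguments above, the following Python sketch shows one plausible reading of validity-threshold and bins. It is not the Data Exploration Library's implementation: the function names `numeric_share`, `is_numeric_column`, and `probability_bins` are hypothetical, and the interpretation (a column counts as numeric when at least validity-threshold percent of its values parse as numbers, and the distribution is an equal-width histogram) is an assumption made for this example.

```python
def numeric_share(values):
    """Return the percentage of values that can be parsed as floats."""
    if not values:
        return 0.0
    numeric = 0
    for value in values:
        try:
            float(value)
            numeric += 1
        except (TypeError, ValueError):
            pass  # not a number, so it does not count towards the share
    return 100.0 * numeric / len(values)


def is_numeric_column(values, validity_threshold=95):
    # Assumed reading of validity-threshold: the column is treated as numeric
    # only if at least this percentage of its values are numeric.
    return numeric_share(values) >= validity_threshold


def probability_bins(numbers, bins=5):
    """Equal-width histogram over numbers, normalised so the shares sum to 1."""
    low, high = min(numbers), max(numbers)
    width = (high - low) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for x in numbers:
        index = min(int((x - low) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    return [count / len(numbers) for count in counts]
```

Under this reading, a column such as `["1", "2.5", "x", "4"]` is 75% numeric, so it would fail the threshold of 95 used in the example above.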
Fetching data from the data versioning system
Field | Parameter key (refer to run-parameters below) | Description | Mandatory? | Dynamic arg required? | Comments |
-component-name | component_name | The component name in the solution | Yes | Yes | |
-explorer-output-path | explorer_output_path | The path where exploration results are saved | Yes | Yes | Explored dataset and exploration results are saved here |
-repo-name | repo_name | Name of the data versioning repository (usually, the same as the solution name) | Yes | Yes | |
-branch-name | branch_name | Name of branch from which to fetch data for exploration | Yes | Yes | |
-commit-id | commit_id | Commit ID of data to be fetched for exploration | Yes | Yes | |
-validity-threshold | validity_threshold | Indicates the minimum percentage of numeric values allowed in the column | No | Yes | Applicable only to structured datasets (refer to Data Exploration Library) |
-bins | bins | Indicates the number of bins to be considered for the numeric probability distribution | No | Yes | Applicable only to structured datasets (refer to Data Exploration Library) |
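The two argument sets above differ only in where the data comes from. The Python sketch below shows how a component could accept both sets and decide which source applies. It assumes the arguments arrive as command-line flags spelled exactly as in the Field column; `build_parser` and `data_source` are hypothetical helpers written for this example, not part of the xpresso.ai component. No defaults are set for validity-threshold and bins because this page does not state them.

```python
import argparse


def build_parser():
    """Declare the flags listed in the two argument tables."""
    parser = argparse.ArgumentParser(prog="data_explorer")
    # Mandatory in both modes.
    parser.add_argument("-component-name", dest="component_name", required=True)
    parser.add_argument("-explorer-output-path", dest="explorer_output_path", required=True)
    # Mode 1: data read from the mount path.
    parser.add_argument("-explorer-input-path", dest="explorer_input_path")
    # Mode 2: data fetched from the data versioning system.
    parser.add_argument("-repo-name", dest="repo_name")
    parser.add_argument("-branch-name", dest="branch_name")
    parser.add_argument("-commit-id", dest="commit_id")
    # Optional in both modes (structured datasets only).
    parser.add_argument("-validity-threshold", dest="validity_threshold", type=float)
    parser.add_argument("-bins", dest="bins", type=int)
    return parser


def data_source(args):
    """Return 'mount_path' or 'versioning' depending on which argument set is complete."""
    versioning = [args.repo_name, args.branch_name, args.commit_id]
    if args.explorer_input_path and not any(versioning):
        return "mount_path"
    if all(versioning) and not args.explorer_input_path:
        return "versioning"
    raise ValueError(
        "Provide either -explorer-input-path, or all of "
        "-repo-name, -branch-name and -commit-id"
    )
```

For example, parsing `-component-name data_explorer -explorer-output-path /data/out -explorer-input-path /data/in -bins 5` selects the mount-path mode, while supplying the repository, branch, and commit ID instead selects the versioning mode.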
Dynamic-args:
Specify each dynamic argument right after its static argument and check the “Dynamic” checkbox. The value of the dynamic argument should be a placeholder string. This string appears on the Run Experiment form, where the actual run-time value for the static argument is filled in.
For example, if the static argument is -out-path, then its dynamic argument could be out_path. out_path then appears as an input field in the Run Experiment form. The value entered in this field is a path string, which is the value expected by the -out-path argument.
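The relationship between a static argument, its placeholder, and the run-time value can be pictured with a short Python sketch. The platform performs this substitution itself; `resolve_dynamic_args` is a hypothetical helper that only illustrates the mapping, using the -out-path and out_path names from the example above.

```python
def resolve_dynamic_args(static_to_placeholder, run_form_values):
    """Pair each static argument with the run-time value entered for its placeholder."""
    resolved = []
    for static_arg, placeholder in static_to_placeholder.items():
        if placeholder not in run_form_values:
            raise KeyError(f"No run-time value supplied for '{placeholder}'")
        resolved.extend([static_arg, str(run_form_values[placeholder])])
    return resolved


# The static argument -out-path is declared with the placeholder out_path;
# the Run Experiment form then supplies the actual path for that placeholder.
arguments = resolve_dynamic_args({"-out-path": "out_path"}, {"out_path": "/data"})
```

Here `arguments` ends up as `["-out-path", "/data"]`, which is the static argument followed by the value typed into the form.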
Run-parameters (file or commit ID):
When loading parameters from a file or a data versioning repository, use the parameter keys listed in the tables above. For more details, refer to the Guide For Dynamically Loading Run Parameters From File Or Data Versioning Repository.
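As a rough illustration of how the parameter keys are used, the Python sketch below validates a set of run parameters supplied as JSON text. The JSON format is an assumption made for this example; the actual file format is defined in the guide referenced above. `KNOWN_KEYS` and `parse_run_parameters` are hypothetical names, and the key list is copied from the Parameter key column of the tables.

```python
import json

# Parameter keys from the "Parameter key" column of the argument tables.
KNOWN_KEYS = {
    "component_name",
    "explorer_output_path",
    "explorer_input_path",
    "repo_name",
    "branch_name",
    "commit_id",
    "validity_threshold",
    "bins",
}


def parse_run_parameters(text):
    """Parse JSON text (assumed format) and reject keys that are not parameter keys."""
    params = json.loads(text)
    if not isinstance(params, dict):
        raise ValueError("Run parameters must be a JSON object of key/value pairs")
    unknown = sorted(set(params) - KNOWN_KEYS)
    if unknown:
        raise ValueError(f"Unknown parameter keys: {unknown}")
    return params
```

A mistyped key such as `explorer-output-path` (hyphens instead of underscores) would be rejected by this check, which is the kind of mistake the key column is meant to prevent.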