DataExplorer


The data_explorer component performs exploratory analysis on a dataset.

Prerequisite: a Dataset object saved to a path (in-path) or pushed to a data versioning repository.

Name

data_explorer

Purpose

To perform univariate and bivariate analysis of data

Usage Scenarios

It can be used to perform Exploratory Data Analysis (refer to the Data Exploration Library).

Created By

xpresso.ai Team

Support e-mail

support@xpresso.ai

Binary / Source / Both Versions

Binary

Docker Image Reference

  • For non-Abzooba instances: dockerregistry.xpresso.ai/library/data_explorer:2.2

  • For Abzooba sandbox instance: dockerregistrysb.xpresso.ai/library/data_explorer:2.2

  • For Abzooba QA instance: xpresso.ai/library/data_explorer:2.2

  • For Abzooba PROD instance: dockerregistryprod.xpresso.ai/library/data_explorer:2.2

Type of component

pipeline_job

Usage Instructions

When deploying the component:
  1. Specify the Mount Path (a shared directory between components in a pipeline, used for reading/writing data)

  2. Specify the Docker image referenced above in the ‘Custom Docker Image’ textbox

  3. Specify arguments as described in the ‘Deploy Solution Arguments’ section below

Result

Stores the explored dataset files and exploration Excel files in the location specified by out-path. Note: to save data to NFS, specify an out-path within the mount path.
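The NFS note above can be checked programmatically. The sketch below is illustrative only; the mount path and out-path values are hypothetical examples, and the helper function is not part of the component.

```python
import os

# Hypothetical values for illustration; the actual mount path is whatever
# was specified when the component was deployed.
mount_path = "/data"
out_path = "/data/exploration_results"

def is_within_mount(path, mount):
    # Normalise both paths, then check the prefix relationship.
    path = os.path.normpath(path)
    mount = os.path.normpath(mount)
    return path == mount or path.startswith(mount + os.sep)

print(is_within_mount(out_path, mount_path))  # True: results will reach NFS
```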

Example

Assume that a dataset has to be fetched from the data repository and explored. Create a component of type ‘pipeline_job’ and deploy it using the Custom Docker image specified above. Create a pipeline using the component. To explore the dataset, the following parameters can be specified when an experiment is run on the pipeline:

Run Name: <provide a unique run name>
Pipeline Version: <select the version of the pipeline you want to run>
bins: 5
validity-threshold: 95
repo-name: <name of data repository> (usually, the same as the solution name)
branch-name: <name of branch from which to fetch data for exploration>
commit-id: <commit ID of data to be fetched for exploration>
out-path: <path in the NFS mount where you want to store the data, e.g., ‘/data’>

The explored dataset and exploration results will be stored at out-path.
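The parameters above can be gathered into a single structure before a run. This is only a sketch: the key names mirror the run parameters listed in the example, and all values (solution name, branch, and so on) are hypothetical placeholders.

```python
# Hypothetical run parameters for the example above; values are placeholders.
run_parameters = {
    "run_name": "explore-run-001",   # unique run name (example value)
    "pipeline_version": "1",         # version of the pipeline to run
    "bins": 5,
    "validity_threshold": 95,
    "repo_name": "my_solution",      # usually the same as the solution name
    "branch_name": "master",
    "commit_id": "<commit-id>",      # commit ID left as a placeholder
    "out_path": "/data",             # path within the NFS mount
}

print(sorted(run_parameters))
```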


Deploy Solution Arguments:

  • Using data from mount path

| Field | Parameter key (refer run-parameters below) | Description | Mandatory? | Dynamic arg required? | Comments |
|---|---|---|---|---|---|
| -component-name | component_name | The component name in the solution | Yes | Yes | |
| -explorer-output-path | explorer_output_path | The path where exploration results are saved | Yes | Yes | Explored dataset and exploration results are saved here |
| -explorer-input-path | explorer_input_path | Path of the file from which data is loaded for exploration | Yes | Yes | |
| -validity-threshold | validity_threshold | Minimum percentage of numeric values allowed in a column | No | Yes | Applicable only to structured datasets (refer to the Data Exploration Library) |
| -bins | bins | Number of bins for the numeric probability distribution | No | Yes | Applicable only to structured datasets (refer to the Data Exploration Library) |
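The static arguments in the table above are ultimately passed to the component as a flag/value sequence. The sketch below shows one way such a sequence might be assembled; the helper and all values are illustrative, not part of the component.

```python
# Build a flat ["-flag", "value", ...] list from a mapping of static
# arguments (names taken from the table above) to example values.
def build_args(params):
    args = []
    for flag, value in params.items():
        args.append(flag)
        args.append(str(value))
    return args

mount_path_params = {
    "-component-name": "data_explorer",
    "-explorer-output-path": "/data/exploration_results",
    "-explorer-input-path": "/data/input_dataset.csv",
    "-validity-threshold": 95,   # optional
    "-bins": 5,                  # optional
}

print(" ".join(build_args(mount_path_params)))
```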

  • Fetching data from the data versioning system

| Field | Parameter key (refer run-parameters below) | Description | Mandatory? | Dynamic arg required? | Comments |
|---|---|---|---|---|---|
| -component-name | component_name | The component name in the solution | Yes | Yes | |
| -explorer-output-path | explorer_output_path | The path where exploration results are saved | Yes | Yes | Explored dataset and exploration results are saved here |
| -repo-name | repo_name | Name of the data versioning repository (usually, the same as the solution name) | Yes | Yes | |
| -branch-name | branch_name | Name of branch from which to fetch data for exploration | Yes | Yes | |
| -commit-id | commit_id | Commit ID of data to be fetched for exploration | Yes | Yes | |
| -validity-threshold | validity_threshold | Minimum percentage of numeric values allowed in a column | No | Yes | Applicable only to structured datasets (refer to the Data Exploration Library) |
| -bins | bins | Number of bins for the numeric probability distribution | No | Yes | Applicable only to structured datasets (refer to the Data Exploration Library) |
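Five of the parameter keys in the table above are mandatory in data-versioning mode. A small pre-flight check like the following can catch a missing run parameter before an experiment is launched; the helper itself is illustrative and not part of the component.

```python
# Mandatory parameter keys for data-versioning mode, per the table above.
MANDATORY_KEYS = {
    "component_name",
    "explorer_output_path",
    "repo_name",
    "branch_name",
    "commit_id",
}

def missing_keys(run_params):
    # Return any mandatory keys absent from the supplied parameters.
    return sorted(MANDATORY_KEYS - set(run_params))

params = {
    "component_name": "data_explorer",
    "explorer_output_path": "/data/exploration_results",
    "repo_name": "my_solution",
    "branch_name": "master",
    # commit_id deliberately omitted to show the check firing
}
print(missing_keys(params))  # ['commit_id']
```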

Dynamic-args:

Specify a dynamic argument right after its static argument and check the “Dynamic” checkbox. The value of the dynamic arg should be a placeholder string; this string appears as an input field on the Run Experiment form, where the actual run-time value for the static argument must be filled in.

For example, if the static argument is -out-path, its dynamic arg could be out_path. out_path is then reflected as an input field on the Run Experiment form, and the value entered there (a string-valued path) becomes the value of the -out-path argument.
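The static-to-dynamic relationship described above can be sketched as a simple substitution. The placeholder names and form values below are hypothetical examples.

```python
# Mapping of static arguments to their dynamic-arg placeholder strings
# (the placeholders shown on the Run Experiment form).
static_to_dynamic = {
    "-out-path": "out_path",
    "-bins": "bins",
}

# Values entered on the Run Experiment form at run time (example values).
form_values = {"out_path": "/data", "bins": "5"}

# Resolve each static argument to the value supplied for its placeholder.
resolved = {static: form_values[dyn] for static, dyn in static_to_dynamic.items()}
print(resolved)  # {'-out-path': '/data', '-bins': '5'}
```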

Run-parameters (file or commit ID):

When loading parameters from a file or a data versioning repository, use the keys listed in the tables above. For more details, refer to the Guide For Dynamically Loading Run Parameters From File Or Data Versioning Repository.
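As a rough sketch of what such a parameter file might contain: the keys below come from the tables above, but the JSON format and all values are assumptions; consult the linked guide for the format actually expected.

```python
import io
import json

# A guessed run-parameters file using the parameter keys from the tables
# above. The JSON format is an assumption for illustration only.
raw = """
{
  "component_name": "data_explorer",
  "explorer_output_path": "/data/exploration_results",
  "repo_name": "my_solution",
  "branch_name": "master",
  "commit_id": "<commit-id>",
  "validity_threshold": 95,
  "bins": 5
}
"""

params = json.load(io.StringIO(raw))
print(params["bins"])  # 5
```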