DataConnector
The data_connector component fetches data from different sources into a dataset and saves it to a specified output path.
| Field | Value |
| --- | --- |
| Name | data_connector |
| Purpose | To fetch data from different sources using xpresso data source connectivity |
| Usage Scenarios | |
| Created By | xpresso.ai Team |
| Support e-mail | |
| Binary / Source / Both Versions | Binary |
| Docker Image Reference | |
| Type of component | pipeline_job |
| Usage Instructions | |
| Result | Loads data into a dataset |

Example: Assume that a dataset named 'test.csv' has to be loaded from NFS. Create a component of type 'pipeline_job' and deploy it using the custom Docker image specified above, then create a pipeline using the component. To fetch the dataset, the following parameters must be specified when an experiment is run on the pipeline:

- Run Name: <provide a unique run name>
- Pipeline Version: <select the version of the pipeline you want to run>
- dataset-type: 'structured'
- data-config-type: 'FS'
- data-config-path: <path of the file you want to connect to>
- out-path: <path in the NFS mount where you want to store the data, e.g., '/data'>

The fetched dataset will be stored at the specified out-path.
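The example above can be sketched as a plain dict of run parameters. This is illustrative only; the file path is an assumed placeholder, and the keys simply mirror the parameter names listed in the example.

```python
# Sketch of the run parameters from the worked example above.
# "/path/to/test.csv" is an assumed placeholder path, not a real location.
example_run = {
    "dataset-type": "structured",
    "data-config-type": "FS",
    "data-config-path": "/path/to/test.csv",  # assumed location of test.csv on NFS
    "out-path": "/data",                      # where the fetched dataset is stored
}
print(example_run["out-path"])  # /data
```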
Deploy Solution Arguments:
| Field | Parameter key (refer run-parameters below) | Description | Mandatory? | Dynamic arg required? | Comments |
| --- | --- | --- | --- | --- | --- |
| -component-name | component_name | The component name in the solution | Yes | Yes | |
| -connector-output-path | connector_output_path | The path where data is saved | Yes | Yes | Absolute path for the output directory, starting from the mount path |
| -data-config | data_config | Data configuration to load data from | Yes (refer the 'Alternative arguments' section below) | Yes | Takes the dict, as a string, required by the xpresso.ai Data Connectivity Library for data connection configuration. Whitespace is prohibited in this string, e.g. '{"type":"FS","path":"/path/to/data"}'. When providing the input in a JSON file, escape the double quotes with a backslash, e.g. "{\"type\":\"FS\",\"path\":\"/path/to/data\"}" |
| -dataset-type | dataset_type | Dataset type | Yes | Yes | Only structured and unstructured datasets are supported (case-insensitive) |
| -dataset-name | dataset_name | Name of the dataset to be stored | No | Yes | |
| -project-name | project_name | Name of the solution | Yes | Yes | |
| -description | description | Description of the dataset | No | Yes | Whitespace is prohibited in this string |
| -created-by | created_by | Creator of the dataset | No | Yes | |
| -file-name | file_name | The filename for the CSV file to be stored | No | Yes | The dataset.data attribute can optionally be stored as a CSV file in the mount path for future use. Note: only CSV output is supported |
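Because whitespace is prohibited in the -data-config string and the same value must be quote-escaped when embedded in a JSON file, it is easy to get the formatting wrong by hand. The sketch below shows one way to produce both forms with the standard library; `build_data_config` is a hypothetical helper, not part of the xpresso.ai tooling.

```python
import json

def build_data_config(config: dict) -> str:
    """Hypothetical helper: serialize a data-connection dict into the
    whitespace-free string expected by the -data-config argument."""
    # separators=(",", ":") removes the spaces json.dumps inserts by default,
    # satisfying the "whitespace is prohibited" rule for this argument.
    return json.dumps(config, separators=(",", ":"))

config_str = build_data_config({"type": "FS", "path": "/path/to/data"})
print(config_str)          # {"type":"FS","path":"/path/to/data"}

# When the same value is embedded inside a JSON parameters file, the inner
# double quotes must be escaped; serializing the string again does exactly that.
print(json.dumps(config_str))  # "{\"type\":\"FS\",\"path\":\"/path/to/data\"}"
```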
Alternative arguments for -data-config:
Specific combinations of these arguments become mandatory when the -data-config argument from the table above is not used. Refer to 'Define Connection Parameters' for a better understanding of valid argument combinations.
| Field | Parameter key (refer run-parameters below) | Description | Dynamic arg required? (refer section below if Yes) | Comments |
| --- | --- | --- | --- | --- |
| -data-config-type | data_config_type | Type of data source. Specify FS or DB | Yes | |
| -data-config-data-source | data_config_data_source | Special argument for a local/BigQuery connection. Specify the value as 'Local'. Not supported for 'BigQuery' | Yes | Refers to data saved in the mount path if the value is 'Local' |
| -data-config-path | data_config_path | Path of the file to load data from | Yes | |
| -data-config-dsn | data_config_dsn | Data Source Name | Yes | |
| -data-config-table | data_config_table | Name of the table | Yes | |
| -data-config-columns | data_config_columns | List of column names in a table that need to be fetched. Use '*' to specify all columns | Yes | |
| -data-config-options | data_config_options | Extra keyword arguments to be specified as key-value pairs for finer control over file imports | Yes | Supported for the structured dataset-type only |
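As an illustration of the argument combinations, the sketch below shows one plausible set of parameters for a filesystem source and one for a database source. The key names come from the parameter-key column above; all values (DSN, table, path) are made-up placeholders.

```python
# Hypothetical filesystem (FS) combination: type plus the file path.
fs_params = {
    "data_config_type": "FS",
    "data_config_path": "/data/test.csv",  # assumed example path
}

# Hypothetical database (DB) combination: type, DSN, table, and columns.
db_params = {
    "data_config_type": "DB",
    "data_config_dsn": "my_dsn",       # assumed DSN name, for illustration
    "data_config_table": "customers",  # assumed table name
    "data_config_columns": "*",        # '*' fetches all columns
}

print(sorted(db_params))  # ['data_config_columns', 'data_config_dsn', 'data_config_table', 'data_config_type']
```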
Dynamic-args:
Specify a dynamic argument right after its static argument and check the "Dynamic" check box. The value of this dynamic arg should be a placeholder string. This string will appear on the Run Experiment form, where an actual run-time value for its static argument should be filled in.
For example, if the static argument is -data-config, then its dynamic arg could be data_config. This data_config will appear as an input field on the Run Experiment form. The value of this input field can be {"type":"FS","path":"/path/to/data"}, which is the expected value for the -data-config arg.
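Conceptually, the platform maps each static argument to a placeholder, then substitutes the value typed into that placeholder's form field at run time. A minimal sketch of that substitution, assuming nothing about the actual platform internals:

```python
# Sketch (assumption, not the actual platform code): each static argument
# names a placeholder, and the Run Experiment form supplies a value per placeholder.
static_args = {"-data-config": "data_config"}  # static arg -> placeholder string
form_values = {"data_config": '{"type":"FS","path":"/path/to/data"}'}

# Resolve each static argument to the value typed into its placeholder field.
resolved = {arg: form_values[placeholder] for arg, placeholder in static_args.items()}
print(resolved["-data-config"])  # {"type":"FS","path":"/path/to/data"}
```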
Run-parameters (file or commit ID):
When loading parameters from a file or a data versioning repository, use the parameter keys listed in the tables above. For more details, refer to the Guide For Dynamically Loading Run Parameters From File Or Data Versioning Repository.
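A run-parameters file might look like the sketch below, built from the parameter keys in the tables above. All values are assumed placeholders; the exact file schema expected by the platform is not specified here.

```python
import json
import os
import tempfile

# Hypothetical run-parameters file using the parameter keys from the tables
# above; every value is a placeholder, not a real path or solution name.
run_params = {
    "component_name": "data_connector",
    "connector_output_path": "/data/output",        # assumed output directory
    "dataset_type": "structured",
    "data_config": '{"type":"FS","path":"/path/to/data"}',
    "project_name": "my_solution",                  # assumed solution name
}

path = os.path.join(tempfile.gettempdir(), "run_params.json")
with open(path, "w") as f:
    json.dump(run_params, f, indent=2)
```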