Define connection parameters¶
Earlier, we described connection parameters to be certain key-value pairs defined in a dictionary object. In this wiki we will learn how multiple combinations of such key-value pairs link a developer to different data sources.
We are aware that sub-class of Dataset object is named as per the data they carry. Like StructuredDataset object has data of Structured kind. Similarly, there’s a difference between a set of connection parameters to a data source for every Dataset object. For the ease of understanding, let us call connection parameters as config, henceforth, which is a dictionary object.
Below are the various tables that list key-values in config definition for a data source against a Dataset object. Refer Supported data sources here.
StructuredDataset object
Connection parameters for supported file systems:
Populates class variable data as a Pandas DataFrame object of file specified in value of “path“ key.
Key |
Type |
M andatory? |
Value |
De scription |
---|---|---|---|---|
type |
str |
Yes |
Should be set to “FS” |
Type of data source |
data_source |
str |
No |
Should be set as “local” |
Special key to connect to local file system only. |
path |
str |
Yes |
Any existing path to file on Al luxio/local file system. For supported extensions, refer the Pandas section in options table at the end of this page. |
Path of file to be imported. Make sure the target data source is mounted. Contact xpresso.ai ad ministrator if there is an issue. |
options |
dict |
No |
Extra keyword arguments to be specified as key-value pair, for better importing through file system. |
Refer the Pandas section in options table at the end of this page. |
Connection parameters for supported database systems:
Populates class variable data as a Pandas DataFrame object of table specified in value of “table“ key.
Key |
Type |
** Mandatory?** |
Value |
D escription |
---|---|---|---|---|
type |
str |
Yes |
Should be set to “DB” |
Type of data source |
DSN |
str |
Yes |
Data source name |
Should refer to a Data Source registered with the Presto server. Contact the xpresso.ai System Ad ministrators to register a data source. |
table |
str |
Yes |
Name of table to import data. |
Should refer to an existing table. |
columns |
str/list of str |
Yes |
com ma-separated list of columns to be imported (“*” will import all columns) |
UnstructuredDataset object
Connection parameters for supported file system:
Populates class variable data as a Pandas DataFrame object with information regarding files/directories specified in value of “path“ key.
Key |
Type |
M andatory? |
Value |
De scription |
---|---|---|---|---|
type |
str |
Yes |
Should be set to “FS” |
Type of data source |
data_source |
str |
No |
Should be set as “local” |
Special key to connect to local file system only. |
path |
str |
Yes |
Any existing path to a directory on Al luxio/local file system. |
Path of folder can be given for importing multiple files. Make sure the target data source is mounted. Contact xpresso.ai ad ministrator if there is an issue. |
DistributedStructuredDataset object
Connection parameters for HDFS:
Populates class variable data as a Koalas DataFrame object of file specified in value of “path“ key.
Limitation: For all Big Data solutions, xpresso.ai Data connectivity module supports importing data only through a registered HDFS. Please contact xpresso.ai administrator to get your HDFS mounted.
Key |
Type |
M andatory? |
Value |
De scription |
---|---|---|---|---|
type |
String |
Yes |
Should be set to “FS” |
Type of data source |
d ataset_type |
String |
Yes |
Should be set to “d istributed“ |
Special key to state the type of dataset as distributed |
path |
String |
Yes |
Path of file to be imported. For supported extensions, refer the Koalas section in options table at the end of this page. |
Path of file to be imported. Make sure the HDFS has the file present. Contact xpresso.ai ad ministrator if there is an issue. |
options |
dict |
No |
Extra keyword arguments to be specified as key-value pair, for better importing through file system. |
Refer the Koalas section in options table at the end of this page. |
Following is a sample config. Please define all of the above key-value pairs according to this sample config:
from xpresso.ai.core.data.automl.structured_dataset import StructuredDataset
dataset = StructuredDataset()
config = { "type": "FS", "options": {"sep": "|"}, "path": "/path/to/your/data/file.txt" }
dataset.import_dataset(config)
# Print the Pandas DataFrame
dataset.data.head()
Sample output - DataFrame representation
Options table for supported file extensions:
The Data Connectivity library can be used to create custom connectors for different data sources, e.g., Google BigQuery
What do you want to do next?