Data Management Tasks


Data science solutions typically involve many data management tasks, such as fetching data from remote sources, analyzing it, and cleaning it. These activities may happen once at the beginning of the solution development process, or may have to be programmed as recurring tasks in the solution.

xpresso.ai has Python libraries as well as off-the-shelf Component Library components to support a number of data management tasks, such as:

  • Data Connectivity - pull data from a variety of data sources, such as databases and local / remote / cloud file systems. Use the Data Connectivity component from the xpresso.ai Component Library for this, or use the xpresso.ai Data Connectivity library to create your own custom data connectivity component

  • Data Exploration - univariate and bivariate exploration of data. Use the Data Exploration component from the xpresso.ai Component Library for this, or use the xpresso.ai Data Exploration library to create your own custom data exploration component

  • Data Visualization - graphs and reports detailing exploration results. Use the Data Visualization component from the xpresso.ai Component Library for this, or use the xpresso.ai Data Visualization library to create your own custom data visualization component

  • Data Versioning - push data into and pull data out of xpresso.ai’s data repository. Use the Data Versioning component from the xpresso.ai Component Library for this, or use the xpresso.ai Data Versioning library to create your own custom data versioning component
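To make the terms "univariate" and "bivariate" exploration concrete, here is a minimal sketch that uses only the Python standard library. It does not use the xpresso.ai Data Exploration library, whose API is not shown in this document. The `records` sample data and the `pearson` helper are hypothetical and exist purely for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean, median, stdev

# Hypothetical records standing in for data already pulled from a source.
records = [
    {"age": 23, "income": 32000, "segment": "a"},
    {"age": 35, "income": 54000, "segment": "b"},
    {"age": 41, "income": 61000, "segment": "b"},
    {"age": 29, "income": 45000, "segment": "a"},
    {"age": 52, "income": 80000, "segment": "c"},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


ages = [r["age"] for r in records]
incomes = [r["income"] for r in records]

# Univariate exploration: describe each attribute on its own.
age_summary = {"mean": mean(ages), "median": median(ages), "stdev": stdev(ages)}
segment_counts = Counter(r["segment"] for r in records)

# Bivariate exploration: describe how pairs of attributes relate.
age_income_corr = pearson(ages, incomes)
incomes_by_segment = defaultdict(list)
for r in records:
    incomes_by_segment[r["segment"]].append(r["income"])
mean_income_by_segment = {s: mean(v) for s, v in incomes_by_segment.items()}

print(age_summary)
print(segment_counts)
print(f"Pearson correlation (age, income): {age_income_corr:.3f}")
print(mean_income_by_segment)
```

A numeric summary and a category count are typical univariate outputs; a correlation and a per-group mean are typical bivariate ones.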

Each of these tasks can be performed by using xpresso.ai Data Management libraries, or by using xpresso.ai Component Library components. The choice of libraries or off-the-shelf components depends on the exact use case, as summarized in the table below:

Description: I want to pull data from a data source, and do some basic exploration and visualization

One-time / recurring: One-time

Suggestion:

  1. Use xpresso.ai libraries for data connectivity, exploration and visualization within a Jupyter notebook

  2. Create an xpresso.ai solution, and store the final data on the shared drive for the solution - other solution components can then access the data from the shared drive by mounting the drive and reading the data from it

  3. Alternatively (and preferably), use the xpresso.ai data versioning libraries to push the data into the solution data repository

Description: I want to pull data from a data source, and do some customized exploration and visualization

One-time / recurring: One-time

Suggestion:

  1. Use xpresso.ai libraries for data connectivity within a Jupyter notebook to pull the data

  2. Write custom code within Jupyter for exploration and visualization

  3. Create an xpresso.ai solution, and store the final data on the shared drive for the solution - other solution components can then access the data from the shared drive by mounting the drive and reading the data from it

  4. Alternatively (and preferably), use the xpresso.ai data versioning libraries to push the data into the solution data repository

Description: I want to pull data from a data source, and do some basic exploration and visualization

One-time / recurring: Recurring

Suggestion:

  1. Use xpresso.ai libraries for data connectivity, exploration and visualization within a Jupyter notebook

  2. Create an xpresso.ai solution, and store the final data on the shared drive for the solution - other solution components can then access the data from the shared drive by mounting the drive and reading the data from it

  3. Alternatively (and preferably), use the xpresso.ai data versioning libraries to push the data into the solution data repository

  4. Use the xpresso.ai Plugin for Jupyter (XPJ) to import your notebook directly into xpresso.ai. Create a pipeline using the components so imported

  5. Alternatively (and preferably), use a component from the xpresso.ai Component Library to create the pipeline

  6. Build, deploy and run the pipeline as required

Description: I want to pull data from a data source, and do some customized exploration and visualization

One-time / recurring: Recurring

Suggestion:

  1. Use xpresso.ai libraries for data connectivity within a Jupyter notebook to pull the data

  2. Write custom code within Jupyter for exploration and visualization

  3. Create an xpresso.ai solution, and store the final data on the shared drive for the solution - other solution components can then access the data from the shared drive by mounting the drive and reading the data from it

  4. Alternatively (and preferably), use the xpresso.ai data versioning libraries to push the data into the solution data repository

  5. Use the xpresso.ai Plugin for Jupyter (XPJ) to import your notebook directly into xpresso.ai. Create a pipeline using the components so imported

  6. Alternatively, create the appropriate components using the xpresso.ai Control Center, and follow the coding process to populate the components with code. You should be able to use components from the xpresso.ai Component Library for data connectivity and versioning without having to write custom code for these activities

  7. Build, deploy and run the pipeline as required
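The suggestions above share a common sequence: pull the data, prepare it, and store the result where other solution components can read it. The sketch below illustrates that sequence with the Python standard library only. It is not the xpresso.ai API: the in-memory SQLite database stands in for a real data source, the temporary directory stands in for the mounted shared drive, and the `readings` table and all function names are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def pull_rows(conn):
    """Fetch all rows from the (hypothetical) `readings` table."""
    cur = conn.execute("SELECT sensor, value FROM readings ORDER BY sensor")
    return cur.fetchall()


def clean_rows(rows):
    """Drop rows whose value is missing."""
    return [(sensor, value) for sensor, value in rows if value is not None]


def store_rows(rows, target):
    """Write the cleaned rows as CSV so other components can read them."""
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sensor", "value"])
        writer.writerows(rows)


def run_once(conn, target):
    """One pull -> clean -> store run; returns the number of rows written."""
    rows = clean_rows(pull_rows(conn))
    store_rows(rows, target)
    return len(rows)


# Stand-in data source: an in-memory SQLite database with sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 1.5), ("s2", None), ("s3", 4.0)],
)

# Stand-in for the solution's mounted shared drive (an assumption).
shared_drive = Path(tempfile.mkdtemp())
output_file = shared_drive / "data" / "readings_clean.csv"

written = run_once(conn, output_file)
print(f"Wrote {written} rows to {output_file}")
```

For a one-time task, code like this could run once in a notebook. For a recurring task, a function such as `run_once` is the kind of unit that would become the body of a pipeline component, so that each pipeline run repeats the same steps.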
