DataOps

1. How do I connect to different data sources in xpresso.ai?

Use the xpresso.ai Data Connectivity component from the xpresso Component Library, or use the xpresso.ai Data Connectivity library to build your own custom Data Connectivity component. The detailed documentation can be found here:

2. What are the different databases supported within xpresso.ai?

At its core, xpresso uses PrestoDB, a distributed SQL query engine designed to query large datasets spread across one or more heterogeneous data sources. Both the xpresso.ai Data Connectivity library and the Data Connectivity component support data ingestion from RDBMS and NoSQL databases such as MySQL, Microsoft SQL Server, MongoDB, and Cassandra.

3. How can I version my data?

There are three methods for versioning your data:

4. How can I version my data without moving my data out?

This feature is currently not supported.

5. How much disk space do I get for each solution?

xpresso.ai provides two shared file systems for each solution:

  • The Network File System (NFS) for standard solutions

  • The Hadoop Distributed File System (HDFS) for Big Data solutions

The xpresso.ai Control Center lets developers explore the shared disk space for a solution and upload and download files. You specify the disk space when you define the solution, subject to a cap that is configurable based on the infrastructure limits.

6. Does xpresso support SSD?

Yes, SSDs are supported and can be used instead of NFS. This can be configured at installation time.

7. How is data governance handled in xpresso.ai?

Data governance is currently not supported out of the box. In a coming release, users will be able to add an approval workflow for data-related operations within xpresso and define custom governance models.

8. How does xpresso help with data exploration?

There are three methods for basic exploration of your data:

9. How is data wrangling supported in xpresso.ai?

We do not have explicit functionality for data wrangling included in xpresso. This can be achieved by defining a custom component in xpresso and coding it appropriately.

10. What are some of the data preparation features available in xpresso.ai?

The xpresso.ai Python libraries enable basic data cleansing, such as imputing null values.
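
To illustrate what imputing null values involves, here is a minimal sketch in plain Python. It does not use the xpresso.ai libraries, whose API is not shown in this FAQ; the helper name impute_mean and the sample data are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the non-null entries.

    Hypothetical helper for illustration; not part of the xpresso.ai libraries.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to average over, so return the input unchanged.
        return list(values)
    fill = mean(present)
    return [fill if v is None else v for v in values]


ages = [30, None, 50, None, 40]
imputed = impute_mean(ages)
```

Mean imputation is only one strategy; a median or a constant fill may suit skewed or categorical columns better.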

11. How does xpresso ensure data security?

xpresso can encrypt data both at rest and in motion.

12. How do I access the shared space within my code?

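You can browse the shared space for a project by clicking on either “NFS Explorer” or “HDFS Explorer” under the “Data Ops” tab. Within your code, use the mounted folder paths: the component root folder is mounted at “/component”, and the pipeline root folder is mounted at “/pipeline”.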

13. Can I use the shared space to communicate between my different components?

Yes, each solution gets its own folder in the shared disk space. This folder contains subfolders called “components” and “pipelines”, which, in turn, contain folders for each component or pipeline in the solution. You can use this shared space within your code by using the mounted folder path. The component root folder is mounted at “/component”, while the pipeline root folder is mounted at “/pipeline”.
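
As a rough sketch of how two components might hand data to each other through the mounted folder, consider the following. The “/component” path comes from the answer above; the SHARED_ROOT environment variable, the function names publish and consume, and the JSON file format are assumptions made for illustration and are not part of xpresso.ai.

```python
import json
import os
from pathlib import Path

# "/component" is the mount point named in this FAQ. The SHARED_ROOT
# environment variable is an assumption so the sketch can run elsewhere.
SHARED_ROOT = Path(os.environ.get("SHARED_ROOT", "/component"))


def publish(name, payload, root=SHARED_ROOT):
    """Write a JSON hand-off file for another component to pick up."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    target = root / name
    tmp = root / (name + ".tmp")
    tmp.write_text(json.dumps(payload))
    # Rename only after the write finishes, so a reader is unlikely
    # to see a half-written file.
    os.replace(tmp, target)
    return target


def consume(name, root=SHARED_ROOT):
    """Read a hand-off file written by publish()."""
    return json.loads((Path(root) / name).read_text())
```

A producing component would call publish("stats.json", {...}) and a downstream component would call consume("stats.json"); both names are placeholders.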

14. How can I do data transformation within xpresso.ai?

We do not have explicit functionality for data transformation included in xpresso. This can be achieved by defining a custom component in xpresso and coding it appropriately.

15. Does xpresso support data profiling?

We do not have explicit functionality for data profiling included in xpresso. This can be achieved by defining a custom component in xpresso and coding it appropriately.
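
The code inside such a custom profiling component could look something like the sketch below. It uses only the Python standard library; the function name profile, the treatment of empty strings as nulls, and the sample data are assumptions, and the way the code is wired into an xpresso component is not shown.

```python
import csv
import io
from statistics import mean


def profile(rows):
    """Return per-column counts, null counts, distinct counts, and numeric stats."""
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        raw = [r[col] for r in rows]
        # Assumption: empty strings and None both count as nulls.
        present = [v for v in raw if v not in ("", None)]
        stats = {
            "count": len(raw),
            "nulls": len(raw) - len(present),
            "distinct": len(set(present)),
        }
        try:
            nums = [float(v) for v in present]
        except ValueError:
            nums = []  # non-numeric column: skip numeric statistics
        if nums:
            stats.update(min=min(nums), max=max(nums), mean=mean(nums))
        report[col] = stats
    return report


SAMPLE = "city,temp\nPune,31\nOslo,\nPune,29\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile(rows)
```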

16. Can I create an ETL pipeline within xpresso?

You can build ETL pipelines by creating custom components (or selecting a pre-built component from the xpresso Component Library) for each stage and writing code for each. You can execute the pipeline just like any other pipeline. We have a sample solution that demonstrates how you can build an ETL pipeline. You can find the details here: link.
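
To make the stage-per-component idea concrete, the sketch below writes each ETL stage as a plain Python function and chains them. This is not the sample solution mentioned above. In xpresso each function body would live in its own component; the function names, the in-memory SQLite target, and the sample data are all assumptions.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract stage: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform stage: drop rows without an amount and convert types."""
    return [
        (r["order_id"], round(float(r["amount"]), 2))
        for r in rows
        if r["amount"].strip()
    ]


def load(records, conn):
    """Load stage: insert the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
    conn.commit()
    return len(records)


RAW = "order_id,amount\nA1,10.50\nA2,\nA3,4.25\n"
conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```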

17. How do I do data validation in xpresso.ai?

We do not have explicit functionality for data validation included in xpresso. This can be achieved by defining a custom component in xpresso and coding it appropriately.
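
As one possible shape for the code in a custom validation component, the sketch below checks each row against a list of rules and collects the violations. The function name validate, the rule format, and the sample rows are hypothetical, and nothing here is an xpresso.ai API.

```python
def validate(rows, rules):
    """Return a list of (row_index, column, message) for every rule violation."""
    errors = []
    for i, row in enumerate(rows):
        for column, check, message in rules:
            if not check(row.get(column)):
                errors.append((i, column, message))
    return errors


# Each rule is (column name, predicate on the value, message on failure).
RULES = [
    ("id", lambda v: v is not None, "id is required"),
    ("age", lambda v: isinstance(v, int) and 0 <= v <= 120,
     "age must be an integer in 0..120"),
]

rows = [{"id": 1, "age": 34}, {"id": None, "age": 150}]
errors = validate(rows, RULES)
```

A component built this way could stop the pipeline when the error list is not empty, or write the list to the shared space for review.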