Data Visualization Library

Dataset visualization is provided through the Visualization class. Create an instance with the get_visualizer function, passing a Dataset object on which exploration has already been performed (see the EDA Exploration Documentation) and, optionally, the visualization library to use. The instance can then be used to render visualizations.

1. get_visualizer

This method creates a visualization object for the visualization library to be used.

Sample code to instantiate a Visualization object

from xpresso.ai.core.data.visualization import Visualization
# create visualization object using dataset object created earlier
visualize = Visualization.get_visualizer(diabetic_dataset, visualization_library = "seaborn")
# If no visualization library is provided, the default "seaborn" is used
visualize = Visualization.get_visualizer(diabetic_dataset)

Once the Visualization class has been instantiated, various methods of the class can be used for visualization. These are detailed below:

2. render_univariate

This method renders visualizations for all univariate analyses performed on the dataset. The sample below calls it with the attr_name parameter.

Sample code to render univariate visualization

# render_univariate can be called on the visualization object created earlier
visualize.render_univariate(attr_name="medical_specialty")

3. render_multivariate

This method renders visualizations for all multivariate analyses performed on the dataset. The sample below calls it without arguments.

Sample code to render multivariate visualization

# render_multivariate can be called on the visualization object created earlier
visualize.render_multivariate()

4. render_all

This method renders visualizations for both the univariate and multivariate analyses performed on the dataset. The sample below passes the report and target parameters.

Sample code to render all visualizations

# render_all can be called on the visualization object created earlier
visualize.render_all(report=True, target="readmitted")
# report=True generates a report and saves it in the ./report folder

5. scatter

This method creates a scatter plot of the specified attributes in the dataset. The sample below passes the attribute_x and attribute_y parameters.

Sample code to render scatter plots

# scatter can be called on the visualization object created earlier
visualize.scatter(attribute_x="encounter_id", attribute_y="patient_nbr")

Target variable plots

  1. Box plot (multiclass target variable vs numeric attribute, and numeric target variable vs categorical attribute): Each box represents the numeric distribution for one class on the x-axis (i.e. the target variable or the categorical attribute)

  2. Density plot (multiclass target variable vs numeric attribute): Kernel density plot of the numeric data for each category of the target variable

  3. Bar plot with line plot (multiclass target variable vs categorical attribute): Bar plot of the frequency distribution of the categorical attribute, with line plots showing the percentage of each target variable category

  4. Mosaic plot (multiclass target variable vs categorical attribute): Each block represents the frequency of a category on the x-axis against a category on the y-axis (i.e. width is proportional to the category frequency on the x-axis, height is proportional to the category frequency on the y-axis)

  5. Scatter plot (numeric target variable vs numeric attribute): Scatter plot with a fitted linear regression line
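
To make the two categorical-vs-categorical plots (the bar plot with line plot and the mosaic plot) more concrete, the following is a minimal sketch of the quantities they display, using only the Python standard library. It is not the library's implementation: the helper names bar_and_line and mosaic_tiles, the "gender" attribute, and the sample values are all hypothetical. The sketch also assumes the usual mosaic construction, in which a block's height is the share of the target category within its x-axis category.

```python
from collections import Counter


def bar_and_line(attribute, target):
    """For each attribute value: its frequency (the bar) and the
    percentage of each target category within it (the line points)."""
    pair_counts = Counter(zip(attribute, target))
    value_counts = Counter(attribute)
    classes = sorted(set(target))
    return {
        value: {
            "count": n,
            "percent": {c: 100.0 * pair_counts[(value, c)] / n for c in classes},
        }
        for value, n in value_counts.items()
    }


def mosaic_tiles(attribute, target):
    """For each (attribute value, target category) pair seen in the data:
    (width, height) of its block. Width is the share of rows with that
    attribute value; height is the share of the target category among
    those rows (assumed convention, see the note above)."""
    pair_counts = Counter(zip(attribute, target))
    value_counts = Counter(attribute)
    total = len(attribute)
    return {
        (value, cls): (value_counts[value] / total, count / value_counts[value])
        for (value, cls), count in pair_counts.items()
    }


# Made-up sample data, for illustration only (hypothetical attribute values)
gender = ["F", "F", "F", "M", "M", "F", "M", "F"]
readmitted = ["NO", "YES", "NO", "NO", "YES", "NO", "NO", "YES"]

summary = bar_and_line(gender, readmitted)
tiles = mosaic_tiles(gender, readmitted)

for value, stats in summary.items():
    print(value, stats["count"], stats["percent"])
for key, (width, height) in tiles.items():
    print(key, round(width, 3), round(height, 3))
```

Under this convention each block's area (width times height) equals the fraction of rows in that cell, so the block areas sum to 1.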