Inference Service lifecycle


The Inference Service lifecycle is as follows. The developer must implement a component of type infer_service, whose implementation is a class extending the xpresso.ai AbstractInferenceService class.
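As a sketch of what such a component looks like, the skeleton below lists the hook methods that the lifecycle described in this section expects the subclass to provide. The real xpresso.ai import path and base-class signatures may differ; a stub base class stands in here so the snippet is self-contained, and MyInferenceService is a hypothetical name.

```python
class AbstractInferenceService:      # stub for illustration only, not the real class
    def __init__(self):
        self.model = None


class MyInferenceService(AbstractInferenceService):
    """Hypothetical infer_service component: the methods the lifecycle expects."""

    def get_credentials(self):
        # Return credentials for the model versioning system (Pachyderm).
        return {}

    def load_model(self, path):
        # Load the model files found at `path` into the model member variable.
        self.model = object()        # placeholder for a real deserialized model

    def transform_input(self, request):
        # Feature extraction, data cleaning, etc. on the incoming request.
        return request

    def predict(self, request):
        # Produce a prediction from the (transformed) request.
        return request

    def transform_output(self, prediction):
        # Post-process the prediction before returning it to the client.
        return prediction
```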

  1. Initialization - recall that any xpresso.ai service component runs from within a Docker image. The entrypoint is a main method, which must create an instance of the subclass and call its load method (if the subclass does not override load, the AbstractInferenceService load method is used). This loads the required model from the versioning system.

  • The superclass load method first obtains credentials for the model versioning system (Pachyderm) by calling the get_credentials method, which must be implemented by the subclass. With these credentials, it fetches the model from the versioning system and stores it in the local file system.

  • Once the model files have been stored on the local file system, the superclass calls the load_model method with the path of the model files as a parameter. The subclass must implement this method to load the model from the files in that path into the model member variable.

  2. After the model has been loaded as described above, the main method must call the instance's run_api method. The superclass implementation creates the API and listens for requests.

  1. Any request received by the API goes through the following steps:

  • Input Transformations - first, the transform_input method is called to apply any transformations to the request (e.g., feature extraction, data cleaning, etc.). This method should be implemented by the subclass; the default implementation in the superclass does nothing.

  • Prediction - next, the predict method is called to make a prediction based on the request. This method should be implemented by the subclass; the default implementation in the superclass does nothing.

  • Output Transformations - finally, the transform_output method is called to apply any transformations to the predicted output before sending it back to the requesting client. This method should be implemented by the subclass; the default implementation in the superclass does nothing.
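Putting the lifecycle together, the sketch below mocks the control flow described above: load obtaining credentials, fetching the model, and delegating to load_model, followed by the per-request transform_input → predict → transform_output pipeline. This is not the real xpresso.ai AbstractInferenceService; the base class, the _fetch_model helper, and the EchoInferenceService subclass are illustrative stand-ins (the real superclass serves requests over HTTP via run_api rather than through a handle_request call).

```python
class AbstractInferenceService:
    """Mock base class mirroring the hook methods named in the text."""

    def __init__(self):
        self.model = None

    def get_credentials(self):
        raise NotImplementedError            # must be supplied by the subclass

    def load(self):
        creds = self.get_credentials()       # credentials for the versioning system
        path = self._fetch_model(creds)      # fetch model files to local disk
        self.load_model(path)                # subclass loads model into self.model

    def _fetch_model(self, creds):
        # Stand-in for the real versioning-system download.
        return "/tmp/model"

    def load_model(self, path):
        raise NotImplementedError            # must be supplied by the subclass

    # Default hooks: the superclass versions do nothing.
    def transform_input(self, request):
        return request

    def predict(self, request):
        return request

    def transform_output(self, prediction):
        return prediction

    def handle_request(self, request):
        # The per-request pipeline that run_api applies to each incoming call.
        return self.transform_output(self.predict(self.transform_input(request)))


class EchoInferenceService(AbstractInferenceService):
    """Hypothetical subclass implementing the required hooks."""

    def get_credentials(self):
        return {"user": "demo"}              # placeholder credentials

    def load_model(self, path):
        # Pretend the "model" simply upper-cases its input.
        self.model = str.upper

    def transform_input(self, request):
        return request.strip()               # input transformation: clean the request

    def predict(self, request):
        return self.model(request)           # prediction via the loaded "model"

    def transform_output(self, prediction):
        return {"prediction": prediction}    # output transformation: wrap the result


service = EchoInferenceService()
service.load()                               # step 1: initialization
print(service.handle_request("  hello "))    # step 2 pipeline → {'prediction': 'HELLO'}
```

Note how the subclass never orchestrates the flow itself: it only fills in the hooks, while the superclass decides when each one runs, which is the template-method structure the lifecycle describes.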