Invoking a Model Deployment

Invoking a model deployment means passing feature vectors or data samples to its inference endpoint; the model then returns predictions for those samples.

After a model deployment is in an active lifecycleState, its inference endpoint can receive requests from clients. The service supports the following endpoints:
Response Types

Response Type   Endpoint                      Description
Single          /predict                      Returns a single response.
Streaming       /predictWithResponseStream    Streams partial results in real time as the model generates them.

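For example, you can call the single-response /predict endpoint by signing an HTTPS request with the OCI Python SDK. The following is a minimal sketch that assumes API key authentication from the default ~/.oci/config profile; the endpoint URL and the payload shape are placeholders that you replace with the values shown for your model deployment and the input format your score.py expects.

```python
import oci
import requests

# Load the default profile from ~/.oci/config and build a request signer.
config = oci.config.from_file()
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
    pass_phrase=config.get("pass_phrase"),
)

# Placeholder values: copy the real endpoint from the Invoking Your Model panel,
# and shape the payload to match what your score.py predict() expects.
endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<model-deployment-ocid>/predict"
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}

response = requests.post(endpoint, json=payload, auth=signer)
response.raise_for_status()
print(response.json())
```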
From the model deployment details page, select Invoking Your Model to open a panel with two main categories: Non-Streaming and Streaming.

Each category displays the following details:

  • The model HTTP endpoint. For a private model deployment, the HTTP endpoint contains the private FQDN that was set when the private endpoint was created. For more information, see Creating a Private Endpoint.
  • Sample code to invoke the model endpoint using the OCI CLI. You can also use the OCI Python and Java SDKs to invoke the model with the provided code samples.

Use the sample code to invoke a model deployment.
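For streaming output, the call is the same except that the request targets the /predictWithResponseStream endpoint and the response is read incrementally as the model produces it. The following is a minimal sketch, again assuming API key authentication, with a placeholder endpoint and payload.

```python
import oci
import requests

config = oci.config.from_file()
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
    pass_phrase=config.get("pass_phrase"),
)

# Placeholder endpoint; note the /predictWithResponseStream suffix for streaming.
endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<model-deployment-ocid>/predictWithResponseStream"
payload = {"prompt": "Summarize the quarterly report."}  # payload shape depends on your model

# stream=True keeps the connection open so partial results can be printed as they arrive.
with requests.post(endpoint, json=payload, auth=signer, stream=True) as response:
    response.raise_for_status()
    for chunk in response.iter_lines():
        if chunk:
            print(chunk.decode("utf-8"))
```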

Invoking a model deployment calls the inference endpoint of the model deployment URI. The endpoint takes sample data as input, which is processed by the predict() function in the score.py file of the model artifact. The sample data is typically in JSON format, though other formats are possible. Processing can transform the sample data before passing it to the model's inference method, and the model's predictions can be post-processed before they're returned to the client.
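To make that round trip concrete, the following is an illustrative score.py sketch. The load_model() and predict() functions are the artifact contract that the service calls; the joblib serialization and the {"data": ...} payload shape are assumptions for this example, not requirements.

```python
# score.py (illustrative sketch; real artifacts vary by framework and serialization format)
import os

import joblib  # assumption: the model was serialized with joblib

MODEL_FILE = "model.joblib"


def load_model(model_dir=os.path.dirname(os.path.realpath(__file__))):
    """Deserialize the model stored alongside this file in the model artifact."""
    return joblib.load(os.path.join(model_dir, MODEL_FILE))


def predict(data, model=load_model()):
    """Transform the incoming payload, run inference, and shape the response."""
    # `data` arrives as the parsed JSON body of the /predict request.
    features = data["data"] if isinstance(data, dict) else data
    predictions = model.predict(features)
    return {"prediction": predictions.tolist()}
```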