The /predict endpoint of a model deployment lets clients submit input
data and receive the complete prediction result in a single response. This endpoint is suitable
for scenarios where the entire prediction output is required immediately.
To avoid these errors, consider editing the model deployment to increase
the provisioned load balancer bandwidth.
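If you prefer to script the change rather than edit the deployment in the Console, the update operation in the OCI Python SDK can set the bandwidth. The following is a minimal sketch, not a verbatim reference: the OCIDs are placeholders, the 20 Mbps value is illustrative, and the nested update model classes should be verified against the oci.data_science.models reference for your SDK version.
import oci
from oci.data_science.models import (
    UpdateModelDeploymentDetails,
    UpdateSingleModelDeploymentConfigurationDetails,
    UpdateModelConfigurationDetails,
)

config = oci.config.from_file("~/.oci/config")
data_science = oci.data_science.DataScienceClient(config)

# Raise the provisioned load balancer bandwidth (in Mbps) for the deployment.
update_details = UpdateModelDeploymentDetails(
    model_deployment_configuration_details=UpdateSingleModelDeploymentConfigurationDetails(
        model_configuration_details=UpdateModelConfigurationDetails(
            model_id="<your-model-ocid>",  # placeholder OCID of the deployed model
            bandwidth_mbps=20)))  # illustrative target bandwidth

data_science.update_model_deployment(
    model_deployment_id="<your-model-deployment-ocid>",  # placeholder OCID
    update_model_deployment_details=update_details)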
Tenancy request-rate limit exceeded
The maximum number of requests per second per tenancy is set to 150.
If you're consistently receiving error messages after increasing the LB
bandwidth, use the OCI
Console to submit a support ticket for the
tenancy. Include the following details in the ticket.
Describe the issue, include the error message that occurred, and indicate
the new requests-per-second rate needed for the tenancy.
Indicate that it's a minor loss of service.
Indicate Analytics & AI and Data Science.
Indicate that the issue is creating and managing models.
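While you wait for a limit increase, throttled calls can also be retried on the client side. This is a minimal sketch that assumes the service responds with HTTP 429 when the request rate is exceeded; the retry count and delays are arbitrary starting points:
import time
import requests

def predict_with_retry(endpoint, payload, auth, max_retries=5):
    # Retry the prediction request on HTTP 429 (rate limited), backing off exponentially.
    delay = 1
    for _ in range(max_retries):
        response = requests.post(endpoint, json=payload, auth=auth)
        if response.status_code != 429:
            return response
        time.sleep(delay)
        delay *= 2  # wait 1s, 2s, 4s, ... between attempts
    return response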
This example code is a reference to help you invoke your model deployment:
import requests
import oci
from oci.signer import Signer
import json

# Model deployment endpoint. Here we assume that the notebook session is in the same region as the model deployment.
# Alternatively, you can go to the details page of your model deployment in the OCI Console.
# Under "Invoke Your Model", you find the HTTP endpoint of your model.
endpoint = <your-model-deployment-uri>

# Your payload:
input_data = <your-json-payload-str>

# Set to True to authenticate with a resource principal (for example, from a notebook session),
# or False to authenticate with an API key from your OCI config file.
using_rps = True

if using_rps:  # using resource principal:
    auth = oci.auth.signers.get_resource_principals_signer()
else:  # using config + key:
    # Replace with the location of your OCI config file.
    config = oci.config.from_file("~/.oci/config")
    auth = Signer(
        tenancy=config['tenancy'],
        user=config['user'],
        fingerprint=config['fingerprint'],
        private_key_file_location=config['key_file'],
        pass_phrase=config['pass_phrase'])

# POST request to the model endpoint:
response = requests.post(endpoint, json=input_data, auth=auth)

# Check the response status. A successful call returns HTTP 200.
assert response.status_code == 200, "Request made to the model predict endpoint was unsuccessful"

# Print the model predictions, assuming the model returns a JSON object.
print(json.loads(response.content))
Invoking with the OCI CLI
You can invoke a model deployment from the OCI CLI.
The CLI is included in the OCI Cloud Shell environment and is preauthenticated. This example
invokes a model deployment with the CLI:
oci raw-request --http-method POST --target-uri <model-deployment-url>/predict --request-body '{"data": "data"}'
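For larger payloads, the request body can live in a file instead of being passed inline. The OCI CLI generally accepts file:// references for complex parameter values, so a variation along these lines should work (payload.json is an example file name):
oci raw-request --http-method POST --target-uri <model-deployment-url>/predict --request-body file://payload.json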
You can also invoke the model deployment by using the model deployment operation in the
CLI: