Predict Endpoint

The /predict endpoint in model deployment lets clients submit input data and receive the complete prediction results in a single response. This endpoint is suitable for scenarios where the entire prediction output is required immediately.

The API responses are listed below, with the HTTP status code, error code, description, and whether the request should be retried.

200 (error code: none)
Success. Example response:

{
  "data": {
    "prediction": [
      "virginica"
    ]
  },
  "headers": {
    "content-length": "28",
    "content-type": "application/json",
    "opc-request-id": "<unique-request-id>"
  },
  "status": "200 OK"
}

Retry: not applicable.

404 (error code: NotAuthorizedOrNotFound)
Model deployment not found or authorization failed.
Retry: No.

405 (error code: MethodNotAllowed)
Method not allowed.
Retry: No.

411 (error code: LengthRequired)
The Content-Length header is missing.
Retry: No.

413 (error code: PayloadTooLarge)
The request payload exceeds the 10 MB limit.
Retry: No.

429 (error code: TooManyRequests)
Too many requests, for one of two reasons:

  • Load balancer bandwidth limit exceeded. Consider editing the model deployment to increase the provisioned load balancer bandwidth and avoid these errors.
  • Tenancy request-rate limit exceeded. The maximum number of requests per second per tenancy is set to 150.

If you consistently receive this error after increasing the load balancer bandwidth, use the OCI Console to submit a support ticket for the tenancy. Include the following details in the ticket:

  • Describe the issue, include the error message that occurred, and indicate the new requests per second needed for the tenancy.
  • Indicate that it's a minor loss of service.
  • Indicate Analytics & AI and Data Science.
  • Indicate that the issue is creating and managing models.

Retry: Yes, with backoff.

500 (error code: InternalServerError)
Internal server error, caused by one of the following:

  • Service timeout. The /predict endpoint has a 60 second timeout that can't be changed.
  • The score.py file raises an exception (see the sketch after this table).

Retry: Yes, with backoff.

503 (error code: ServiceUnavailable)
Model server unavailable.
Retry: Yes, with backoff.
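If 500 errors come from exceptions raised in score.py, adding defensive handling inside its predict() function makes the failure easier to diagnose in the predict logs. The following is only a minimal sketch, assuming the standard load_model()/predict() contract of score.py and a joblib-serialized model artifact; the file name and payload shape are illustrative assumptions.

# score.py (sketch): file names, payload shape, and serialization format are assumptions.
import os

import joblib  # assumption: the model artifact was serialized with joblib


def load_model(model_file_name="model.joblib"):
    # Load the serialized model from the model artifact directory.
    model_dir = os.path.dirname(os.path.realpath(__file__))
    return joblib.load(os.path.join(model_dir, model_file_name))


def predict(data, model=load_model()):
    # Return predictions for the incoming /predict payload.
    try:
        # Assumption: the payload looks like {"data": [[...feature values...]]}.
        features = data["data"] if isinstance(data, dict) else data
        return {"prediction": model.predict(features).tolist()}
    except Exception as exc:
        # An unhandled exception here is returned to the client as a 500
        # InternalServerError; re-raising with context makes the cause easier
        # to find in the model deployment predict logs.
        raise RuntimeError(f"predict() failed: {exc}") from exc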

Invoking with the OCI Python SDK

This example code is a reference to help you invoke your model deployment:
import json

import requests
import oci
from oci.signer import Signer

# Model deployment endpoint. This example assumes that the notebook session runs in the
# same region as the model deployment. Alternatively, you can find the HTTP endpoint of
# your model under "Invoke Your Model" on the model deployment details page in the OCI Console.
endpoint = "<your-model-deployment-uri>"

# Your payload, as a JSON-serializable object (for example, a dict):
input_data = <your-json-payload>

# Set to True to authenticate with a resource principal (for example, from a notebook
# session), or False to authenticate with an OCI config file and API key.
using_rps = True

if using_rps:  # using resource principal:
    auth = oci.auth.signers.get_resource_principals_signer()
else:  # using config + key:
    config = oci.config.from_file("~/.oci/config")  # replace with the location of your OCI config file
    auth = Signer(
        tenancy=config['tenancy'],
        user=config['user'],
        fingerprint=config['fingerprint'],
        private_key_file_location=config['key_file'],
        pass_phrase=config['pass_phrase'])

# POST request to the model endpoint:
response = requests.post(endpoint, json=input_data, auth=auth)

# Check the response status. Success is an HTTP 200 status code.
assert response.status_code == 200, "Request made to the model predict endpoint was unsuccessful"

# Print the model predictions, assuming the model returns a JSON object.
print(json.loads(response.content))
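
The error table above marks 429, 500, and 503 responses as retryable with backoff. The following is a minimal sketch of that pattern, reusing the endpoint, input_data, and auth variables from the example above; the attempt count and delays are illustrative assumptions.

import time

RETRYABLE_STATUSES = {429, 500, 503}

def predict_with_backoff(max_attempts=5, base_delay_seconds=1.0):
    # POST to the /predict endpoint, retrying retryable statuses with exponential backoff.
    for attempt in range(max_attempts):
        response = requests.post(endpoint, json=input_data, auth=auth)
        if response.status_code not in RETRYABLE_STATUSES:
            return response
        # Wait 1s, 2s, 4s, ... before the next attempt.
        time.sleep(base_delay_seconds * (2 ** attempt))
    # All attempts exhausted; return the last response for the caller to inspect.
    return response

response = predict_with_backoff()
assert response.status_code == 200, "Request made to the model predict endpoint was unsuccessful"
print(json.loads(response.content))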

Invoking with the OCI CLI

You can also invoke a model deployment by using the OCI CLI.

The CLI is included in the OCI Cloud Shell environment and is preauthenticated. This example invokes a model deployment with the CLI:

oci raw-request --http-method POST --target-uri <model-deployment-url>/predict --request-body '{"data": "data"}'
You can also use the model deployment operation in the CLI for invocation:
# Enable a realm-specific endpoint: https://docs.oracle.com/iaas/tools/oci-cli/3.56.0/oci_cli_docs/oci.html#cmdoption-realm-specific-endpoint
export OCI_REALM_SPECIFIC_SERVICE_ENDPOINT_TEMPLATE_ENABLED=true
 
oci model-deployment inference-result predict --model-deployment-id <model-deployment-ocid> --request-body '{"data": "data"}'