The /predict endpoint of a model deployment lets clients submit input
data and receive the complete prediction result in a single response. This endpoint is suitable
for scenarios where the entire prediction output is required immediately.
To avoid these errors, consider editing the model deployment to increase
the provisioned load balancer bandwidth.
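If you prefer to script the change rather than edit the deployment in the Console, the update operation in the OCI Python SDK can set the bandwidth. The following is a minimal sketch, not a verbatim reference: the OCIDs are placeholders, the 20 Mbps value is illustrative, and the nested update model classes should be verified against the oci.data_science.models reference for your SDK version.
import oci
from oci.data_science.models import (
    UpdateModelDeploymentDetails,
    UpdateSingleModelDeploymentConfigurationDetails,
    UpdateModelConfigurationDetails,
)

config = oci.config.from_file("~/.oci/config")
data_science = oci.data_science.DataScienceClient(config)

# Raise the provisioned load balancer bandwidth (in Mbps) for the deployment.
update_details = UpdateModelDeploymentDetails(
    model_deployment_configuration_details=UpdateSingleModelDeploymentConfigurationDetails(
        model_configuration_details=UpdateModelConfigurationDetails(
            model_id="<your-model-ocid>",  # placeholder OCID of the deployed model
            bandwidth_mbps=20)))  # illustrative target bandwidth

data_science.update_model_deployment(
    model_deployment_id="<your-model-deployment-ocid>",  # placeholder OCID
    update_model_deployment_details=update_details)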
Tenancy request-rate limit exceeded
The maximum number of requests per second per tenancy is set to 150.
If you're consistently receiving error messages after increasing the LB
bandwidth, use the OCI
Console to submit a support ticket for the
tenancy. Include the following details in the ticket.
Describe the issue, include the error message that occurred, and indicate
the new requests-per-second rate needed for the tenancy.
Indicate that it's a minor loss of service.
Indicate Analytics & AI and Data Science.
Indicate that the issue is creating and managing models.
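While you wait for a limit increase, throttled calls can also be retried on the client side. This is a minimal sketch that assumes the service responds with HTTP 429 when the request rate is exceeded; the retry count and delays are arbitrary starting points:
import time
import requests

def predict_with_retry(endpoint, payload, auth, max_retries=5):
    # Retry the prediction request on HTTP 429 (rate limited), backing off exponentially.
    delay = 1
    for _ in range(max_retries):
        response = requests.post(endpoint, json=payload, auth=auth)
        if response.status_code != 429:
            return response
        time.sleep(delay)
        delay *= 2  # wait 1s, 2s, 4s, ... between attempts
    return response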
This example code is a reference to help you invoke your model deployment:
import requests
import oci
from oci.signer import Signer
import json

# Model deployment endpoint. Here we assume that the notebook session is in the same region as the model deployment.
# Alternatively, you can go to the details page of your model deployment in the OCI Console.
# Under "Invoke Your Model", you find the HTTP endpoint of your model.
endpoint = <your-model-deployment-uri>

# Your payload:
input_data = <your-json-payload-str>

# Set to True to authenticate with a resource principal (for example, from a notebook session),
# or False to authenticate with an API key from your OCI config file.
using_rps = True

if using_rps:  # using resource principal:
    auth = oci.auth.signers.get_resource_principals_signer()
else:  # using config + key:
    # Replace with the location of your OCI config file.
    config = oci.config.from_file("~/.oci/config")
    auth = Signer(
        tenancy=config['tenancy'],
        user=config['user'],
        fingerprint=config['fingerprint'],
        private_key_file_location=config['key_file'],
        pass_phrase=config['pass_phrase'])

# POST request to the model endpoint:
response = requests.post(endpoint, json=input_data, auth=auth)

# Check the response status. A successful call returns HTTP 200.
assert response.status_code == 200, "Request made to the model predict endpoint was unsuccessful"

# Print the model predictions, assuming the model returns a JSON object.
print(json.loads(response.content))
Invoking with the OCI CLI
You can invoke a model deployment from the OCI CLI.
The CLI is included in the OCI Cloud Shell environment and is preauthenticated. This example
invokes a model deployment with the CLI:
oci raw-request --http-method POST --target-uri <model-deployment-url>/predict --request-body '{"data": "data"}'
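For larger payloads, the request body can live in a file instead of being passed inline. The OCI CLI generally accepts file:// references for complex parameter values, so a variation along these lines should work (payload.json is an example file name):
oci raw-request --http-method POST --target-uri <model-deployment-url>/predict --request-body file://payload.json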
You can also invoke the model deployment by using the model deployment operation in the
CLI: