After creating a new model deployment or updating an existing one, you might see a failure. These steps show how to debug the issue:
1. On your project's home page, select Model Deployments.
2. Select the model deployment name, or select the Actions menu for the model deployment and select View Details.
3. Next, check the work requests. Under Resources, select Work Request. The work requests appear at the bottom of the page.
4. On the Work Requests Information page, select Log Messages.
5. If any failures occur in the creation steps, under Resources, select Error Messages.
6. If the work request shows success, then review the OCI predict logs to identify any errors. Logs are attached to the model deployment when it's created.
7. If logs are attached, select the predict log name to see the log.
8. Select Explore with Log Search.
9. Change the filter time to increase the period.
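You can also inspect the same work requests and their error messages programmatically. The following is a minimal sketch using the OCI Python SDK, assuming a local OCI config file; the compartment OCID is a placeholder you must replace.

```python
# Minimal sketch: list Data Science work requests and surface error messages
# for any that failed. Assumes a local OCI config file (~/.oci/config);
# COMPARTMENT_OCID is a placeholder.
import oci

config = oci.config.from_file()
ds = oci.data_science.DataScienceClient(config)

COMPARTMENT_OCID = "ocid1.compartment.oc1..example"  # placeholder

for wr in ds.list_work_requests(compartment_id=COMPARTMENT_OCID).data:
    print(wr.id, wr.operation_type, wr.status)
    if wr.status == "FAILED":
        # Pull the error messages attached to the failed work request.
        for err in ds.list_work_request_errors(work_request_id=wr.id).data:
            print("  error:", err.message)
```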
Conda Environment Path isn't Accessible 🔗
Ensure that the conda environment path is valid, and that you have configured the appropriate policy for a published conda environment. The conda environment path must remain valid and accessible throughout the lifecycle of the model deployment to ensure availability and proper functioning of the deployed model.
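One quick way to confirm accessibility is to check the published environment's object in Object Storage. Below is a minimal sketch using the OCI Python SDK; the namespace, bucket, and object name are placeholders derived from a conda environment path of the form oci://<bucket>@<namespace>/<object>.

```python
# Minimal sketch: confirm that the published conda environment object is
# reachable. NAMESPACE, BUCKET, and OBJECT_NAME are placeholders taken from
# a path of the form oci://<bucket>@<namespace>/<object>.
import oci

config = oci.config.from_file()
object_storage = oci.object_storage.ObjectStorageClient(config)

NAMESPACE = "my-namespace"                      # placeholder
BUCKET = "published-conda-envs"                 # placeholder
OBJECT_NAME = "gpu/mycondaenv/1.0/mycondaenv"   # placeholder

try:
    object_storage.head_object(
        namespace_name=NAMESPACE,
        bucket_name=BUCKET,
        object_name=OBJECT_NAME,
    )
    print("Conda environment path is accessible.")
except oci.exceptions.ServiceError as e:
    # 404 suggests a wrong path; 401/403 usually points to a missing policy.
    print(f"Cannot access conda environment: HTTP {e.status} ({e.code})")
```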
Error Occurred when Starting the Web Server 🔗
Enable the model deployment predict logs to help you debug the errors. Generally, this error occurs when your code has issues or is missing required dependencies.
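Before redeploying, you can often surface these problems by exercising the model artifact's score.py locally in the target conda environment. The sketch below assumes the standard load_model()/predict() contract; the sample input is a placeholder.

```python
# Minimal sketch: run score.py locally to catch import errors and missing
# dependencies before deploying. Assumes score.py is on the Python path and
# follows the load_model()/predict() contract; SAMPLE_INPUT is a placeholder.
import json

import score  # the score.py from your model artifact

SAMPLE_INPUT = {"data": [[1.0, 2.0, 3.0]]}  # placeholder payload

model = score.load_model()                   # fails fast on missing packages
prediction = score.predict(SAMPLE_INPUT, model)
print(json.dumps(prediction, default=str))
```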
Invoking a Model Deployment Failure 🔗
When a model deployment is in an active lifecycleState, the predict or streaming endpoint can be invoked. The prediction request can fail for many reasons. Use these suggestions to try to resolve the errors:
- Ensure that the input passed in the request is in a valid JSON format and matches the input expected by the model (see the sketch after this list).
- Ensure that the score.py file doesn't contain errors.
- If predictions return different results (success, failure) each time they're called with the same input, the allocated resources might not be enough to serve the model prediction. You can edit the load balancer bandwidth to increase it, and increase the Compute core count to serve more requests in parallel.
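To rule out payload and signing issues, you can invoke the predict endpoint directly with an explicit JSON payload. The following is a minimal sketch using the OCI Python SDK signer and the requests library; the endpoint URI and payload are placeholders.

```python
# Minimal sketch: call the predict endpoint with a signed request and a JSON
# payload. ENDPOINT and PAYLOAD are placeholders; the payload must match the
# input that score.py expects.
import oci
import requests

config = oci.config.from_file()
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
)

ENDPOINT = "https://modeldeployment.<region>.oci.customer-oci.com/<model-deployment-ocid>/predict"  # placeholder
PAYLOAD = {"data": [[1.0, 2.0, 3.0]]}  # placeholder

response = requests.post(ENDPOINT, json=PAYLOAD, auth=signer)
print(response.status_code, response.text)
```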
Too Many Requests (Status 429) 🔗
If you get this error when calling the inference endpoint, your requests are being throttled.
The solution depends on the error type:
Load Balancer bandwidth limit exceeded
Edit the Model Deployment to increase its Load Balancer bandwidth. You can estimate the required bandwidth from the expected number of requests per second and the combined size of the request and response payload per request, as in the following sketch.
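As a rough illustration, the estimate can be computed like this (a minimal sketch; the traffic numbers are placeholders):

```python
# Minimal sketch: rough Load Balancer bandwidth estimate from expected
# traffic. All numbers are placeholders; substitute your own measurements.
peak_requests_per_second = 50        # placeholder: expected peak request rate
request_payload_bytes = 2_000        # placeholder: average request size
response_payload_bytes = 8_000       # placeholder: average response size

bytes_per_second = peak_requests_per_second * (request_payload_bytes + response_payload_bytes)
bandwidth_mbps = bytes_per_second * 8 / 1_000_000  # convert bytes/s to Mbps

print(f"Estimated Load Balancer bandwidth: about {bandwidth_mbps:.0f} Mbps")
```

In practice, you might add some headroom on top of this estimate to absorb traffic spikes.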
Tenancy request-rate limit exceeded
Each prediction endpoint allows a specific number of requests in a certain time
interval (minutes or seconds) per tenant by default. for more information, see, the
Invoking a Model Deployment documentation. Open a support ticket from the
OCI
Console to submit a request for the limit to be
increased.
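While you wait for a limit increase, you can reduce the impact of throttling by retrying with exponential backoff. A minimal sketch, reusing the signed-request setup from the earlier invocation example:

```python
# Minimal sketch: retry a throttled (HTTP 429) prediction call with
# exponential backoff. The endpoint, payload, and signer are assumed to be
# set up as in the earlier invocation sketch.
import time

import requests

def predict_with_backoff(endpoint, payload, signer, max_retries=5):
    delay = 1.0
    response = None
    for _ in range(max_retries):
        response = requests.post(endpoint, json=payload, auth=signer)
        if response.status_code != 429:
            return response
        time.sleep(delay)   # back off before the next attempt
        delay *= 2          # double the wait each time
    return response         # still throttled after max_retries attempts
```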