ML Applications

Troubleshoot ML Applications.

mlapp CLI Fails with Profile 'oc1' not found Error

Using mlapp CLI fails with an error.

You get the following error:
oci.exceptions.ProfileNotFound: Profile 'oc1' not found in config file /Users/SAUVERMA/.oci/config 
For more info about config file and how to get required information, see https://docs.oracle.com/en-us/iaas/Content/API/Concepts/sdkconfig.htm

The OCI config file doesn't contain the profile that's configured in: ml-application/environments/<Your Environment>/env-config.yaml.

See SDK & CLI Configuration File for more information.
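
A quick way to confirm whether a profile is present is to parse the config file directly. The following is a minimal sketch that uses only the Python standard library. The function name profile_exists is hypothetical, and the sketch assumes the config file is in the standard INI format at ~/.oci/config. It doesn't read env-config.yaml, so pass in the profile name that file specifies.

```python
import configparser
from pathlib import Path


def profile_exists(profile: str, config_path: str = "~/.oci/config") -> bool:
    """Return True if the named profile is defined in the OCI config file.

    Hypothetical helper for illustration; not part of the mlapp CLI.
    """
    # Disable interpolation so values containing '%' don't raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    # read() returns the list of files it parsed successfully.
    if not parser.read(Path(config_path).expanduser(), encoding="utf-8"):
        raise FileNotFoundError(f"Cannot read OCI config file: {config_path}")
    # configparser handles [DEFAULT] specially and omits it from sections(),
    # so check whether it has any keys instead.
    if profile == configparser.DEFAULTSECT:
        return bool(parser.defaults())
    return parser.has_section(profile)
```

For example, profile_exists("oc1") returns False when the config file has no [oc1] section, which matches the condition reported by the error above.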

mlapp CLI Fails with status 404 Error

Using mlapp CLI fails with an error.

You get the following error:
raise exceptions.ServiceError(
 oci.exceptions.ServiceError:
  {
   'target_service': 'data_science',
   'status': 404,
   'code': 'NotAuthorizedOrNotFound',
   'opc-request-id': '0A20...BFE',
   'message': 'Authorization failed or requested resource not found.'

Incorrect OCI profile

The OCI profile is incorrect, or it connects to the wrong tenancy.

Check that the contents of ml-application/environments/<Your Environment>/env-config.yaml are correct.

Incorrect OCI compartment

The OCI compartment is incorrect.

Check that the contents of ml-application/environments/<Your Environment>/env-config.yaml are correct.

Incorrect policies

The policies are incorrect.

See Policies to ensure the policies are correct.

mlapp Deployment Fails with No such file or directory Error

Deploying mlapp fails with a No such file or directory error.

This error occurs:
Environment not provided. Trying to find default environment configuration in file /.../ml-application/default_env
Traceback (most recent call last):
  File "/.../ml-application/scripts/ml_app_project.py", line 110, in _init_environment
    with open(default_env_config_path, 'r', encoding='utf-8') as file:
         ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/.../ml-application/default_env'

This is because the default_env file is missing in the ml-application folder.

  1. Copy the ml-application/default_env.example file to the ml-application/default_env file.
  2. Edit the default_env file to contain the name of the default environment, for example, dev.
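
The copy step can also be scripted. The following is a minimal sketch, assuming the project layout described above. The function name ensure_default_env is hypothetical. It creates default_env from the example file only when default_env is missing, and returns the file's content so that you can check that it names the environment you expect.

```python
import shutil
from pathlib import Path


def ensure_default_env(project_dir: str) -> str:
    """Create default_env from default_env.example if needed; return its content.

    Hypothetical helper for illustration; project_dir is the ml-application folder.
    """
    root = Path(project_dir)
    target = root / "default_env"
    if not target.exists():
        # Step 1: copy the example file. Raises FileNotFoundError if the
        # example file is missing too.
        shutil.copyfile(root / "default_env.example", target)
    # Step 2 stays manual: review the returned content and edit the file
    # if it doesn't name the environment you want, for example, dev.
    return target.read_text(encoding="utf-8").strip()
```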

mlapp Deployment Fails with Duplicate values not allowed Error

Deploying mlapp fails with a Duplicate values not allowed error.

This error occurs:
oci.exceptions.ServiceError:
 {'target_service': 'data_science', 'status': 409, 'code': 409, 'opc-request-id':
  'E4BF4B76D66E47B697AEFB6E4620C52A/10FA6F202B8E250D289494D37811DBD1/DCE40088309662E9ECD8D6875BA090F2',
   'message': 'Duplicate values not allowed: {Column: name, Value: STRING:fetalrisk-demo-aierpdev1; Column: tenantId,
    Value: STRING:ocid1.tenancy.oc1..aaaaaaaa2ps3chzjosolav3xdpkconxxsypwmztpxidle5pwqhrynz42fhhq}',
    'operation_name': 'create_ml_application',
    'timestamp': '2025-01-30T15:31:56.869585+00:00',
    'client_version': 'Oracle-PythonSDK/2.137.2+preview.1.92',
    'request_endpoint': 'POST https://datascience.us-ashburn-1.oci.oraclecloud.com/20190101/mlApplications',
    'logging_tips': 'To get more info on the failing request, refer to https://docs.oracle.com/en-us/iaas/tools/python/latest/logging.html 
     for ways to log the request/response details.',
    'troubleshooting_tips': "See https://docs.oracle.com/iaas/Content/API/References/apierrors.htm#apierrors_409__409_409 for more 
     information about resolving this error.
     Also see https://docs.oracle.com/iaas/api/#/en/data-science/20190101/MlApplication/CreateMlApplication for details on this
     operation's requirements.
     If you are unable to resolve this data_science issue, please contact Oracle support and provide them this full error message."}

This is because an ML Application with the same name already exists in the tenancy.

  1. Create a new environment. The environment name is added as a suffix to the full application name, which ensures that the application name is different.
  2. Or, change the name of the application in the ml-application/application-def.yaml file.
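
To see which name is in conflict, you can extract the column and value pairs from the error message. The following is a minimal sketch based only on the message format shown in the error above. The function name duplicate_columns is hypothetical, and the format of the message might differ in other versions of the service.

```python
import re

# Matches pairs such as "Column: name, Value: STRING:fetalrisk-demo-aierpdev1".
# The pattern is inferred from the sample message only; treat it as an assumption.
_PAIR = re.compile(r"Column:\s*(\w+),\s*Value:\s*\w+:([^;}]+)")


def duplicate_columns(message: str) -> dict[str, str]:
    """Return a mapping of column name to the duplicated value in the message."""
    return {column: value.strip() for column, value in _PAIR.findall(message)}
```

With the oci Python SDK, the text to pass in is the message of the ServiceError that was raised. The value returned for the name column is the full application name that already exists.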

mlapp Trigger Command Fails with a 409 Conflict Error

You get a 409 Conflict error when running the mlapp trigger command.

This error occurs because a trigger execution is already in progress. Wait for the current execution to complete before starting another one. This behavior ensures that race conditions are avoided in workflows started by the trigger.

  1. Check the log file used by the job run for further information.
  2. For the instance, select Workflows.
  3. Go to the trigger work request.
  4. The name of the work request log starts with the pipeline run ID (OCID).
  5. Search by the pipeline run ID to find the pipeline run.
  6. In the pipeline run, go to step run and then to the corresponding job run.
  7. From the job run, open the log used by the job run.

Infrastructure Creation Fails

Infrastructure creation fails.

The incorrect profile was used to connect to OCI.

Check that the correct profile is specified. If no profile is specified, the DEFAULT profile is used.

The profile used doesn't have permission to create some or all of the following: network resources (VCN, subnets, and service gateways), log groups and logs, or Data Science projects.

Check the policies specified. For more information, see the Policies section.

Package Upload Fails with a 404 Error

Package upload fails with the error:

Authorization failed or requested resource not found (404 Not Authorized or Not Found)

This error occurs when the referenced resource (for example, a subnet) either doesn't exist or isn't accessible.

Ensure that the resource exists, and verify that you have a policy that grants the ML Application Instance resource principal permission to use it. For example:
allow any-user to use virtual-network-family in compartment <subnet_compartment> 
where ALL { request.principal.type = 'datasciencemlapplicationinstance' }

Package Upload Fails with a 409 Error

Uploading a package gives a 409 - Conflict error.

This error occurs when a prior package upload is still running.

Wait for the package upload to complete and try again.
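
If you script uploads, the wait-and-retry step can be automated. The following is a minimal sketch of a generic retry loop. The function name retry_on_conflict and its parameters are hypothetical. It assumes the wrapped call raises an exception with a status attribute set to 409, as oci.exceptions.ServiceError does in the oci Python SDK. The default delay is an arbitrary choice.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_conflict(
    action: Callable[[], T],
    attempts: int = 5,
    delay_seconds: float = 30.0,
) -> T:
    """Call action(), retrying while it fails with an HTTP 409 conflict.

    Hypothetical helper for illustration; not part of the mlapp CLI.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            # Re-raise anything that isn't a 409, and the final 409 as well.
            if getattr(exc, "status", None) != 409 or attempt == attempts:
                raise
            # A prior operation is still running; wait before trying again.
            time.sleep(delay_seconds)
    raise RuntimeError("unreachable")
```

Pass the upload call as a function that takes no arguments, for example, a lambda that wraps it. Errors other than 409 are raised immediately so that they aren't hidden by the retries.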

Ingestion Job Fails

In ML Applications, the ingestion job fails.

Several root causes are possible and are captured in the ingestion job's run logs.

  1. From the instance in question, open Work Requests.
  2. Select the trigger work request.
  3. Select the work request log. Its name starts with the pipeline run ID (OCID).
  4. Search the log file by the pipeline run ID to find the pipeline run.
  5. In the pipeline run, go to step run and then to the corresponding job run.
  6. From the job run, open the log used by the job run.
  7. Search the log for the cause of the failure.

Implementation Update, or Instance Update or Creation Failures

How to resolve implementation update, instance update, or instance creation failures.

  1. Navigate to ML Applications.
  2. Find the problematic resource (for example, ML Application Implementation or ML Application Instance View).
  3. Go to Work requests on the implementation or instance view details page. The work requests provide error and log messages that can help you find the root cause of the failure.

Package Upload Fails with an error on ContentLength

ML Application package upload fails, and the work request log reports that the content length differs from the body length. For example: ContentLength=4679 with Body length 4804.

The line endings in the package are incorrect. This often happens on Windows, where files use CRLF line endings instead of LF.

  1. Change the line separator in the whole ML Application sample project to LF.
  2. Rebuild the package.
  3. Deploy the package again.
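
Step 1 can be scripted if your editor can't change the line separator for the whole project. The following is a minimal sketch that uses only the Python standard library. The function name convert_to_lf is hypothetical. Skipping files that contain a NUL byte (to leave binary files alone) and skipping the .git folder are assumptions, so review the list of changed files before rebuilding the package.

```python
from pathlib import Path


def convert_to_lf(project_dir: str) -> list[Path]:
    """Rewrite CRLF line endings as LF in text files; return the files changed.

    Hypothetical helper for illustration; not part of the mlapp CLI.
    """
    root = Path(project_dir)
    changed = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside a .git folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        data = path.read_bytes()
        # Treat files with NUL bytes as binary; skip files that are already LF.
        if b"\0" in data or b"\r\n" not in data:
            continue
        path.write_bytes(data.replace(b"\r\n", b"\n"))
        changed.append(path)
    return changed
```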