Providers developing a new ML Application Implementation need to create a new
application package corresponding to the implementation.
The application package provides a standard way to package ML functionality that's
environment-independent and region-independent. This makes it a portable solution that can be
used in any tenancy, region, or environment. Infrastructure dependencies that are specific to a
region or environment (for example, VCN and Log OCIDs) are provided as package arguments
during the upload process.
Packages contain all the implementation details for an ML Application, such as Terraform for
instance components and application components, a descriptor containing implementation version
information, a configuration schema, and more. These packages can be uploaded or deployed to
existing ML Application Implementation resources. When a new version of a package is
uploaded, the ML Applications service automatically creates a new ML Application
Implementation Version and starts an upgrade of all ML Application Instances that use the
package.
The package contains components implementing the ML use case. Two types of components exist:
Application components
These are the resources that need to be created once per ML Application Implementation;
provisioning a new ML Application Implementation involves creating the corresponding
application components. Application components are common to all instances of the ML
Application Implementation and aren't created or re-created when new ML Application
Instances are provisioned.
Instance components
These are the resources that need to be created per ML Application Instance.
Provisioning a new ML Application Instance involves creating the corresponding instance
components. Each instance of the ML Application Implementation has its own set of instance
components.
The Terraform configuration for all application components is present inside the
application_components directory in the application package. Similarly, the
Terraform configuration for all instance components is present in the instance_components
directory.
To make the distinction between application components and instance components clearer,
suppose a provider wants to develop a solution (ML Application Implementation) for an
ML use case that involves the following parts:
Training and deploying a model
Providers write a machine learning algorithm that trains an ML model based on some
training data. Providers use Jobs for training the model,
storing it in the Model Catalog, and then deploying the model.
Data to be used for training
The model is trained on the customer (consumer) data, which resides in an Object Storage bucket in the consumer tenancy. The ML job
loads data from the consumer Object Storage bucket into the provider Object Storage bucket.
In this example, the ML job is an application component and the Terraform configuration for
creating the ML job is part of the application_components directory in the
application package. Because the actual training happens on consumer data, the training is
triggered when a new ML Application Instance of the ML Application Implementation is
provisioned. When a new ML Application Instance is created, a new job run is created and
triggered. The job run loads data from the consumer Object Storage
bucket into the provider Object Storage bucket, trains the
model, stores the model in the model catalog, and then deploys the model. A new job run must
be created for every instance (customer), so the job run is an instance component. The
target Object Storage bucket is also created for each instance, so
it's an instance component too. Similarly, the model deployment is an instance component.
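To make the lifecycle difference concrete, the following is a minimal toy model in Python. It's a sketch under stated assumptions, not an OCI API: the component names (ml_job, target_bucket, job_run, model_deployment) and the Implementation class are hypothetical and only mirror the example above. Application components are recorded once when the implementation is provisioned, while instance components are recorded again for every new instance.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the example above; these names are illustrative
# and don't correspond to any OCI API.
APPLICATION_COMPONENTS = ["ml_job"]
INSTANCE_COMPONENTS = ["target_bucket", "job_run", "model_deployment"]


@dataclass
class Implementation:
    # (owner, component) pairs in the order they were created
    created: list = field(default_factory=list)

    def provision(self) -> None:
        # Application components: created once, when the implementation is provisioned.
        for component in APPLICATION_COMPONENTS:
            self.created.append(("implementation", component))

    def create_instance(self, consumer: str) -> None:
        # Instance components: created again for every new instance (consumer).
        for component in INSTANCE_COMPONENTS:
            self.created.append((consumer, component))

    def count(self, component: str) -> int:
        return sum(1 for _, c in self.created if c == component)


impl = Implementation()
impl.provision()
for consumer in ["consumer-a", "consumer-b", "consumer-c"]:
    impl.create_instance(consumer)

print(impl.count("ml_job"))   # 1: shared by all instances
print(impl.count("job_run"))  # 3: one per consumer
```

With three consumers, the job exists once while the job run, bucket, and model deployment each exist three times.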
The Terraform configuration for both application components and instance components can be
parameterized. All parameters required for provisioning ML Applications and ML
Application Instances can be specified in the descriptor.yaml file. For
example, the Docker image used by the job run could be parameterized, as could the Data Science project under which the job is created.
Parameters that belong to application components and are
required when provisioning new implementations can be specified under
packageArguments in the descriptor.yaml file. In general,
packageArguments can be used for providing environment-specific values such
as infrastructure OCIDs and some environment-specific scaling values.
Similarly, the name of the source Object Storage bucket (in the consumer tenancy) is needed when
creating an ML Application Instance and can differ from instance to instance (consumer
to consumer). This is therefore a parameter whose value the consumer provides during ML
Application Instance creation. All such parameters can be defined under
configurationSchema in the descriptor.yaml file.
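As a sketch of how the two kinds of parameters might be split for the training example, the following Python builds a descriptor as a dictionary and serializes it. The argument and property names (jobImage, projectId, sourceBucketName) and their descriptions are hypothetical; the top-level keys and per-property fields follow the descriptor schema described in the Package Descriptor File section. The output is JSON, which is also valid YAML 1.2, so a YAML parser should accept it as a descriptor.yaml body, though a hand-written YAML file is the more usual form.

```python
import json

# Sketch of descriptor.yaml contents for the training example.
# Argument and property names below are hypothetical.
descriptor = {
    "descriptorSchemaVersion": "1.0",
    "mlApplicationVersion": "1.0",  # placeholder; the service currently ignores it
    "packageVersion": "1.0",
    "packageArguments": {
        # Environment-specific values the provider supplies at upload time.
        "jobImage": {
            "type": "string",
            "description": "Docker image used by the job run",
        },
        "projectId": {
            "type": "ocid",
            "description": "Data Science project under which the job is created",
        },
    },
    "configurationSchema": {
        # Per-instance values the consumer supplies when creating an instance.
        "sourceBucketName": {
            "type": "string",
            "description": "Consumer Object Storage bucket with training data",
            "sampleValue": "training-data",
        },
    },
}

# JSON is a subset of YAML 1.2, so this text is also valid YAML.
text = json.dumps(descriptor, indent=2)
print(text)
```

The design choice to keep is the split itself: anything the provider knows per environment goes under packageArguments, and anything that varies per consumer goes under configurationSchema.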
Thus, the final structure of an application package looks similar to this:
<ml-app-package-name>-<version>.zip
  application_components: the directory with all application component
  definitions.
  instance_components: the directory with all instance component
  definitions.
  descriptor.yaml: the package descriptor file.
  *.trigger.yaml: the trigger definition files.
Some important notes on the Application package structure:
Both Application components and Instance components must be defined in the corresponding
directories.
The application_components and instance_components
directories are optional. An Application package without an application_components or
instance_components directory is valid.
The directories must be named exactly (lowercase) as
application_components and instance_components.
Components whose Terraform config isn't present under the
application_components directory aren't considered application
components.
Components whose Terraform config isn't present under the
instance_components directory aren't considered instance components.
At the moment, not all OCI resources are
supported as application or instance components.
A Data Science Job is the supported application component,
while Data Science model, model deployment, job run, Object Storage bucket, and object are the supported instance
components.
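The structural rules above lend themselves to a pre-upload check. The following Python function is a hypothetical helper, not part of any OCI tool: it inspects a package zip with the standard zipfile module, flags component directories whose names differ only in case from the required lowercase names, and warns about Terraform files outside the component directories (which aren't treated as components). Requiring descriptor.yaml at the archive root is an assumption based on the structure shown earlier.

```python
import io
import zipfile

COMPONENT_DIRS = {"application_components", "instance_components"}


def check_package(data: bytes) -> list[str]:
    """Return sorted problems found in a package zip (empty list if none).

    Hypothetical pre-upload check; assumes descriptor.yaml sits at the root.
    """
    problems = set()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
    if "descriptor.yaml" not in names:
        problems.add("missing descriptor.yaml at package root")
    for name in names:
        top = name.split("/", 1)[0]
        # Directory names must match exactly (lowercase).
        if top.lower() in COMPONENT_DIRS and top not in COMPONENT_DIRS:
            problems.add(f"directory must be lowercase: {top}")
        # Terraform outside the component directories isn't a component.
        elif name.endswith(".tf") and top not in COMPONENT_DIRS:
            problems.add(f"Terraform not treated as a component: {name}")
    return sorted(problems)


# Build a small package in memory to exercise the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("descriptor.yaml", "descriptorSchemaVersion: '1.0'\n")
    zf.writestr("application_components/job.tf", "")
    zf.writestr("Instance_Components/bucket.tf", "")
    zf.writestr("main.tf", "")

print(check_package(buf.getvalue()))
```

Both component directories stay optional here: a package with neither directory passes as long as descriptor.yaml is present.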
The Package Descriptor File section describes the schema of the package descriptor file in more detail.
ML Applications Building Blocks
ML Applications are built by using other OCI
resources. The following table lists allowed resource types:
Allowed Resource Types

Component type: Application components
Allowed OCI resources:
  Data Science: Job, Pipeline, Model
  Data Flow: Data Flow Application
Notes:
  Multitenant components are shared across all ML Application Instances within an
  implementation.
  Data Science:
    Jobs and Pipelines are commonly used as application components,
    defining workflows or tasks performed by the application. When a workflow or
    task is triggered for a customer, a new Pipeline Run or Job Run is created,
    typically with customer-specific parameters provided by the trigger.
    Models are used as application components when a pretrained,
    out-of-the-box model is available for the application to use.
  Data Flow Applications can be used to transform large volumes of data.
  They can be used as steps within a pipeline. When a pipeline containing a Data Flow step
  is run, it automatically creates and manages a new run of the Data Flow Application
  associated with that step. The Data Flow run is treated like any other step in the
  pipeline: when it completes successfully, the pipeline continues its run, starting later
  steps as part of the pipeline's orchestration.

Component type: Instance components
Allowed OCI resources:
  ML Application triggers (not OCI resources)
  Data Science: Model, Model Deployment
  Object Storage: Bucket, Object
  Schedule
Notes:
  Single-tenant resources are created uniquely for each ML Application Instance
  (SaaS customer).
  ML Application triggers aren't OCI resources, but they can be used as
  instance components. Triggers are the entry points for workflows (such as
  training) defined in your applications. They define under which conditions a
  workflow is started and ensure that the workflow is started with the identity of
  the ML Application Instance (datasciencemlappinstance Resource Principal).
  Models are used as instance components when a new model is trained
  specifically for each customer using their data.
  Model Deployments serve as instance components to expose
  customer-specific models as services.
  Buckets function as customer-specific storage for ingested,
  transformed, or processed data.
  Objects are typically used for storing configurations specific to the
  customer.
  Schedules enable periodic execution of workflows based on a defined
  interval. They're linked to ML Application triggers, which they invoke at
  scheduled intervals.
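Because a Data Science Model appears in both rows, the right directory depends on whether the resource is shared or created per customer. The following Python helper is a hypothetical illustration of that decision using the resource lists from the table; the short resource labels are illustrative strings, not official OCI resource type identifiers.

```python
# Resource lists from the table above; labels are illustrative, not OCI type names.
APPLICATION_RESOURCES = {"job", "pipeline", "model", "data_flow_application"}
INSTANCE_RESOURCES = {"trigger", "model", "model_deployment", "bucket", "object", "schedule"}


def component_dir(resource: str, per_instance: bool) -> str:
    """Return the package directory for a resource (hypothetical helper).

    per_instance=True means the resource is created for each ML Application
    Instance; False means it's shared by all instances of the implementation.
    """
    allowed = INSTANCE_RESOURCES if per_instance else APPLICATION_RESOURCES
    if resource not in allowed:
        kind = "instance" if per_instance else "application"
        raise ValueError(f"{resource} isn't an allowed {kind} component")
    return "instance_components" if per_instance else "application_components"


print(component_dir("model", per_instance=False))  # pretrained, shared model
print(component_dir("model", per_instance=True))   # model trained per customer
```

A model deployment, by contrast, is only allowed per instance, so asking for it as a shared component raises an error.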
Note
ML Applications doesn't impose limits on the number of components you can use.
While an application might require one pipeline, one trigger, one model, and one model
deployment, you can build more complex applications with several pipelines, triggers,
models, and model deployments; for example, five pipelines with five triggers, three
models, and three model deployments. ML Applications can also be created without
pipelines or model deployments if they aren't needed.
Package Descriptor File
The following is a schema for the descriptor:
descriptorSchemaVersion
description: The schema version of the package descriptor, which allows the schema to
evolve. It has a major and a minor version (for example, "1.0"): the major version is
increased for backward-incompatible changes and the minor version for
backward-compatible changes.
required: true
type: string
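The major/minor rule above implies a simple compatibility test. The following Python sketch shows one reasonable reading under an explicit assumption: a reader requires the same major version, and accepts a descriptor whose minor version is at most its own, because a newer minor version may add fields an older reader doesn't know. The function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Split a "major.minor" string such as "1.0" into integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def can_read(supported: str, descriptor: str) -> bool:
    """Return True if a reader written for schema `supported` can safely read
    a descriptor that declares `descriptor` as its descriptorSchemaVersion.

    Assumption: the major versions must match (major bumps are backward
    incompatible), and the descriptor's minor must not exceed the reader's.
    """
    s_major, s_minor = parse_version(supported)
    d_major, d_minor = parse_version(descriptor)
    return s_major == d_major and d_minor <= s_minor


print(can_read("1.2", "1.0"))  # True: same major, older minor
print(can_read("1.0", "1.2"))  # False: newer minor than the reader knows
print(can_read("1.2", "2.0"))  # False: major version changed
```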
description
description: The description of the ML Application Implementation packaged as the
specific ML Applications package. This value is shown as the description field of the ML
Application Implementation.
required: false
type: string
mlApplicationVersion
description: The version of the ML Applications contract (that is, the version field of
the ML Applications Version resource) that the package implements.
Note
This is a placeholder that's reserved to be used in the future when the ML
Applications Version resource is introduced. The provided value is
ignored.
required: true
type: string
packageVersion
description: The version of the ML Applications package. This value is shown as the
Package version field of the ML Application Implementation.
required: true
type: string
packageArguments
description: The list of supported arguments. Arguments can be used for providing
environment-specific values such as infrastructure OCIDs and some
environment-specific scaling values.
type: map (the argument name maps to the properties of the argument)
required: false
argument properties:
type
type: enum (string or ocid)
required: true
description: The type of the argument value.
mandatory
type: Boolean (true or false)
required: false (default is true)
description: Whether the specific argument is mandatory or not.
description
type: string
required: true
description: The argument description.
validationRegexp
type: string
required: false
description: The regular expression used for validation of argument
value.
defaultValue
type: string
required: false
description: The value used if the argument or configuration schema
property isn't specified (it can be specified only when
mandatory is false).
configurationSchema
description: The schema of the configuration which the consumer must provide as
metadata of the ML Application Instance. This value is shown as a
configurationSchema field in ML Application Implementation.
type: map (the configuration property name maps to the properties of the
configuration property)
required: false
configuration property properties:
type
type: enum (string or secret)
required: true
description: The type of the configuration value.
mandatory
type: Boolean (true or false)
required: false (default is true)
description: Whether the specific configuration property is mandatory or
not.
description
type: string
required: true
description: The configuration property description.
validationRegexp
type: string
required: false
description: The regular expression used for validation of configuration
value.
sampleValue
type: string
required: true
description: The sample value used for validation of instance
components.
defaultValue
type: string
required: false
description: The value used if the argument or configuration schema property
isn't specified (it can be specified only when mandatory is false).
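The property rules for packageArguments and configurationSchema can be summarized as a small resolution step. The following Python function is a sketch of those documented rules, not the service's actual validator: mandatory defaults to true, defaultValue is accepted only when mandatory is false, and validationRegexp is applied to each value. Treating validationRegexp as a full-match Python regular expression is an assumption; the service's regex dialect and matching mode aren't stated here. The trainingEpochs property in the example schema is hypothetical.

```python
import re


def resolve(schema: dict, provided: dict) -> dict:
    """Resolve values against packageArguments or configurationSchema entries.

    Sketch of the documented rules; validationRegexp is assumed to be a
    full-match Python regex.
    """
    resolved = {}
    for name, props in schema.items():
        mandatory = props.get("mandatory", True)  # default is true
        if "defaultValue" in props and mandatory:
            raise ValueError(f"{name}: defaultValue requires mandatory: false")
        if name in provided:
            value = provided[name]
        elif "defaultValue" in props:
            value = props["defaultValue"]
        elif mandatory:
            raise ValueError(f"{name}: a value is mandatory")
        else:
            continue  # optional and no default: leave unset
        pattern = props.get("validationRegexp")
        if pattern is not None and re.fullmatch(pattern, value) is None:
            raise ValueError(f"{name}: {value!r} doesn't match {pattern!r}")
        resolved[name] = value
    return resolved


# Hypothetical configurationSchema for the training example.
schema = {
    "sourceBucketName": {
        "type": "string",
        "description": "Consumer bucket with training data",
        "validationRegexp": r"[A-Za-z0-9._-]+",
        "sampleValue": "training-data",
    },
    "trainingEpochs": {
        "type": "string",
        "mandatory": False,
        "description": "Number of training epochs",
        "validationRegexp": r"[0-9]+",
        "defaultValue": "10",
    },
}

print(resolve(schema, {"sourceBucketName": "acme-data"}))
# {'sourceBucketName': 'acme-data', 'trainingEpochs': '10'}
```

Omitting sourceBucketName, or passing a value with a space in it, raises ValueError under these rules.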
Mandatory Terraform Attributes
All Terraform definitions of Data Science jobs must ensure that the related job runs are
automatically deleted when the job is deleted.
Failure to correctly specify the delete_related_xxx_runs
attributes blocks the deletion of the ML Application Implementation version. The provider
must then remove the run resources manually to unblock the deletion.
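A lightweight lint can catch a missing cleanup attribute before upload. The following Python function is a naive, hypothetical sketch: it assumes the concrete attribute names delete_related_job_runs (for oci_datascience_job) and delete_related_pipeline_runs (for oci_datascience_pipeline) from the OCI Terraform provider's delete_related_xxx_runs pattern, and it finds resource blocks by brace counting, so it doesn't account for braces inside strings or comments. A real check would use an HCL parser.

```python
import re

# Attribute each resource type should set (assumed from the OCI Terraform
# provider's delete_related_xxx_runs naming pattern).
REQUIRED = {
    "oci_datascience_job": "delete_related_job_runs",
    "oci_datascience_pipeline": "delete_related_pipeline_runs",
}
RESOURCE = re.compile(r'resource\s+"(\w+)"\s+"(\w+)"\s*\{')


def missing_cleanup(tf_source: str) -> list[str]:
    """Return "type.name" for resources whose block lacks `<attr> = true`.

    Naive brace counting: braces inside strings or comments aren't handled.
    """
    problems = []
    for m in RESOURCE.finditer(tf_source):
        rtype, rname = m.groups()
        attr = REQUIRED.get(rtype)
        if attr is None:
            continue
        # Walk forward until the resource block's braces balance.
        depth, i = 1, m.end()
        while depth and i < len(tf_source):
            depth += {"{": 1, "}": -1}.get(tf_source[i], 0)
            i += 1
        body = tf_source[m.end():i]
        if not re.search(rf"^\s*{attr}\s*=\s*true\b", body, re.MULTILINE):
            problems.append(f"{rtype}.{rname}")
    return problems


tf = '''
resource "oci_datascience_job" "train" {
  project_id              = var.project_id
  delete_related_job_runs = true
  job_infrastructure_configuration_details {
    shape_name = "VM.Standard2.1"
  }
}
resource "oci_datascience_job" "score" {
  project_id = var.project_id
}
'''
print(missing_cleanup(tf))  # ['oci_datascience_job.score']
```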
Tenant Isolation and OCI SDK Version
Tenant isolation ensures the segregation of data and workloads for each customer. The ML
Application service propagates the resource principal (identity) of ML Application Instances
to workloads (Pipeline or Job Runs) started by ML Application triggers.
The propagation of the ML Application Instance resource principal requires corresponding
support in the OCI SDKs: