Custom Scaling Metric Type to Configure Autoscaling
Use the custom metric type option to configure autoscaling.
Use the custom scaling metric option to build an MQL query from any of the Model Deployment Metrics emitted by the model deployment resource, and then use that query to configure autoscaling. This approach lets you create more sophisticated queries, such as joining several queries with AND and OR, using different aggregation functions, and incorporating an evaluation window of your choice. This option gives you greater control over the scaling conditions, enabling a more tailored and precise setup.
When formulating an MQL query, include `{resourceId = "MODEL_DEPLOYMENT_OCID"}` in the query, as shown in the provided examples. While processing the request, the service replaces the MODEL_DEPLOYMENT_OCID placeholder with the actual resource OCID, which lets it retrieve the exact set of metrics associated with the resource.
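For example, the following sketch joins two conditions with the && operator over a one-minute evaluation window. The metric choices and thresholds here are illustrative, not documented values; adapt them to your workload:

```
CpuUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 70 && MemoryUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 65
```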
Testing Custom Metric MQL Queries
Test and complete the queries before using them in an autoscaling policy. You can run candidate queries in the Monitoring service's Metrics Explorer in the Console, or from the CLI as in the sketch that follows.
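A minimal CLI sketch for evaluating a candidate query against recent metric data. The `oci_datascience_modeldeployment` namespace is an assumption to verify in your tenancy, and because the MODEL_DEPLOYMENT_OCID substitution only happens inside the autoscaling service, substitute the real OCID yourself when testing:

```sh
# Sketch: evaluate a candidate MQL query over a recent time window.
# Assumption: model deployment metrics live in the
# oci_datascience_modeldeployment namespace; verify in the Console.
# Replace the placeholder OCIDs with your own values.
oci monitoring metric-data summarize-metrics-data \
  --compartment-id <COMPARTMENT_OCID> \
  --namespace oci_datascience_modeldeployment \
  --query-text "PredictRequestCount[1m]{resourceId = '<ACTUAL_MODEL_DEPLOYMENT_OCID>'}.grouping().sum()" \
  --start-time 2024-06-01T00:00:00Z \
  --end-time 2024-06-01T01:00:00Z
```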
Example Queries
These queries are provided for reference and can be customized for a specific use case, or used without modification.
Metric | Query | Explanation |
---|---|---|
PredictRequestCount | For scenarios where minimal or no predict calls are made, incorporate the absent() function into the alarm query. See the example sketch after this table. | Use this metric and queries for scaling in response to predict request volume. If the total count of prediction requests to the specific model deployment exceeds 100 within a one-minute window and this condition persists for the specified pending duration, a scale-out operation is triggered. Similarly, if the cumulative count is less than 5, or if there are no requests at all, and this situation continues for the pending duration, a scale-in operation begins. |
PredictLatency | See the example sketch after this table. | Apply this metric and queries to scale based on predict request latencies. The query evaluates the 99th percentile of PredictLatency for a specific model deployment over a one-minute period. If this 99th-percentile latency exceeds 120 milliseconds and persists for the pending duration, the condition is met and a scale-out operation is triggered. Conversely, if the 99th percentile is less than 20 milliseconds for the pending duration, a scale-in operation starts. |
PredictResponse - Success Rate | See the example sketch after this table. | Use this metric and queries to scale based on the predict response success rate. The MQL query evaluates the percentage of successful PredictResponses among all PredictResponses within a one-minute interval for a specific model deployment. If this percentage is less than 95 and persists for the pending duration, a scale-out operation is triggered. Conversely, if the percentage is more than 95 for the pending duration, a scale-in operation starts. |
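The explanations above, together with the MQL syntax in the JSON example later in this topic, suggest queries along the following lines. These are sketches, not the documented queries: the aggregation calls (grouping(), sum(), percentile(), absent()), the || join in the scale-in query, and especially the success-status dimension in the last pair are assumptions to validate in Metrics Explorer before use.

PredictRequestCount, scale-out then scale-in; the scale-in joins a low-count condition with absent() to cover the no-traffic case:

```
PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum() > 100
PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum() < 5 || PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.absent()
```

PredictLatency, 99th percentile over one minute, scale-out then scale-in:

```
PredictLatency[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().percentile(0.99) > 120
PredictLatency[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().percentile(0.99) < 20
```

PredictResponse success rate, scale-out then scale-in; the status dimension name here is hypothetical, so check the metric's actual dimensions:

```
(PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID", status = "Success"}.grouping().sum() / PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum()) * 100 < 95
(PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID", status = "Success"}.grouping().sum() / PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum()) * 100 > 95
```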
Creating a Model Deployment with Autoscaling Using a Custom Metric
Learn how to create a model deployment with an autoscaling policy using a custom metric.
Use the oci data-science model-deployment create command and required parameters to create a model deployment:
```
oci data-science model-deployment create --required-param-name variable-name ... [OPTIONS]
```

For example, deploy a model:

```
oci data-science model-deployment create \
  --compartment-id <MODEL_DEPLOYMENT_COMPARTMENT_OCID> \
  --model-deployment-configuration-details file://<MODEL_DEPLOYMENT_CONFIGURATION_FILE> \
  --project-id <PROJECT_OCID> \
  --display-name <MODEL_DEPLOYMENT_NAME>
```

Use this model deployment JSON configuration file:

```json
{
  "deploymentType": "SINGLE_MODEL",
  "modelConfigurationDetails": {
    "modelId": "ocid1.datasciencemodel.oc1.iad.amaaaaaav66vvnias2wuzfkwmkkmxficse3pty453vs3xtwlmwvsyrndlx2q",
    "instanceConfiguration": {
      "instanceShapeName": "VM.Standard.E4.Flex",
      "modelDeploymentInstanceShapeConfigDetails": {
        "ocpus": 1,
        "memoryInGBs": 16
      }
    },
    "scalingPolicy": {
      "policyType": "AUTOSCALING",
      "coolDownInSeconds": 650,
      "isEnabled": true,
      "autoScalingPolicies": [
        {
          "autoScalingPolicyType": "THRESHOLD",
          "initialInstanceCount": 1,
          "maximumInstanceCount": 2,
          "minimumInstanceCount": 1,
          "rules": [
            {
              "metricExpressionRuleType": "CUSTOM_EXPRESSION",
              "scaleInConfiguration": {
                "scalingConfigurationType": "QUERY",
                "pendingDuration": "PT5M",
                "instanceCountAdjustment": 1,
                "query": "MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}.grouping().mean() < 10"
              },
              "scaleOutConfiguration": {
                "scalingConfigurationType": "QUERY",
                "pendingDuration": "PT3M",
                "instanceCountAdjustment": 1,
                "query": "MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}.grouping().mean() > 65"
              }
            }
          ]
        }
      ]
    },
    "bandwidthMbps": 10,
    "maximumBandwidthMbps": 20
  }
}
```

In this example, the MemoryUtilization queries drive scaling. Because pendingDuration values are ISO-8601 durations, mean memory utilization above 65 percent sustained for 3 minutes (PT3M) scales out by one instance, and mean utilization below 10 percent sustained for 5 minutes (PT5M) scales in by one instance.
For a complete list of parameters and values for CLI commands, see the CLI Command Reference.
Use the CreateModelDeployment operation to create a model deployment using the custom scaling metric type.
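As a sketch of calling the operation directly, the CLI's raw-request helper can POST a CreateModelDeploymentDetails document to the Data Science endpoint. The region, API version path, and body file name below are assumptions to verify against the API reference:

```sh
# Sketch: invoke CreateModelDeployment over REST via oci raw-request.
# Assumptions to verify: the us-ashburn-1 regional endpoint and the
# 20190101 API version path. The request body is a
# CreateModelDeploymentDetails document containing compartmentId,
# projectId, displayName, and the modelDeploymentConfigurationDetails
# JSON shown earlier in this topic.
oci raw-request \
  --http-method POST \
  --target-uri "https://datascience.us-ashburn-1.oci.oraclecloud.com/20190101/modelDeployments" \
  --request-body file://create_model_deployment.json
```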