Custom Scaling Metric Type to Configure Autoscaling
Use the custom metric type option to configure autoscaling.
Use the custom scaling metric option to build an MQL query from any of the Model Deployment Metrics emitted by the model deployment resource, and then use that query to configure autoscaling. This approach lets you create more sophisticated queries, such as joining several queries with AND and OR, using different aggregation functions, and incorporating an evaluation window of your choice. This option gives you greater control over the scaling conditions, enabling a more tailored and precise setup.
When formulating an MQL query, include `{resourceId = "MODEL_DEPLOYMENT_OCID"}` in the query, as shown in the provided examples. While processing the request, the service replaces the MODEL_DEPLOYMENT_OCID placeholder with the actual resource OCID, which lets it retrieve the exact set of metrics associated with the resource.
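For example, the following sketch joins two conditions with the && operator over a one-minute evaluation window. The metric choices and thresholds here are illustrative, not documented values; adapt them to your workload:

```
CpuUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 70 && MemoryUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 65
```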
Testing Custom Metric MQL Queries
Test and complete the queries before using them in an autoscaling policy. You can run candidate queries in the Monitoring service's Metrics Explorer in the Console, or from the CLI as in the sketch that follows.
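A minimal CLI sketch for evaluating a candidate query against recent metric data. The `oci_datascience_modeldeployment` namespace is an assumption to verify in your tenancy, and because the MODEL_DEPLOYMENT_OCID substitution only happens inside the autoscaling service, substitute the real OCID yourself when testing:

```sh
# Sketch: evaluate a candidate MQL query over a recent time window.
# Assumption: model deployment metrics live in the
# oci_datascience_modeldeployment namespace; verify in the Console.
# Replace the placeholder OCIDs with your own values.
oci monitoring metric-data summarize-metrics-data \
  --compartment-id <COMPARTMENT_OCID> \
  --namespace oci_datascience_modeldeployment \
  --query-text "PredictRequestCount[1m]{resourceId = '<ACTUAL_MODEL_DEPLOYMENT_OCID>'}.grouping().sum()" \
  --start-time 2024-06-01T00:00:00Z \
  --end-time 2024-06-01T01:00:00Z
```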
Example Queries
These queries are provided for reference and can be customized for a specific use case, or used without modification.
Metric | Query | Explanation |
---|---|---|
PredictRequestCount | For scenarios where minimal or no predict calls are made, incorporate the absent() function into the alarm query. See the example sketch after this table. | Use this metric and queries for scaling in response to predict request volume. If the total count of prediction requests to the specific model deployment exceeds 100 within a one-minute window and this condition persists for the specified pending duration, a scale-out operation is triggered. Similarly, if the cumulative count is less than 5, or if there are no requests at all, and this situation continues for the pending duration, a scale-in operation begins. |
PredictLatency | See the example sketch after this table. | Apply this metric and queries to scale based on predict request latencies. The query evaluates the 99th percentile of PredictLatency for a specific model deployment over a one-minute period. If this 99th-percentile latency exceeds 120 milliseconds and persists for the pending duration, the condition is met and a scale-out operation is triggered. Conversely, if the 99th percentile is less than 20 milliseconds for the pending duration, a scale-in operation starts. |
PredictResponse - Success Rate | See the example sketch after this table. | Use this metric and queries to scale based on the predict response success rate. The MQL query evaluates the percentage of successful PredictResponses among all PredictResponses within a one-minute interval for a specific model deployment. If this percentage is less than 95 and persists for the pending duration, a scale-out operation is triggered. Conversely, if the percentage is more than 95 for the pending duration, a scale-in operation starts. |
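The explanations above, together with the MQL syntax in the JSON example later in this topic, suggest queries along the following lines. These are sketches, not the documented queries: the aggregation calls (grouping(), sum(), percentile(), absent()), the || join in the scale-in query, and especially the success-status dimension in the last pair are assumptions to validate in Metrics Explorer before use.

PredictRequestCount, scale-out then scale-in; the scale-in joins a low-count condition with absent() to cover the no-traffic case:

```
PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum() > 100
PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum() < 5 || PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.absent()
```

PredictLatency, 99th percentile over one minute, scale-out then scale-in:

```
PredictLatency[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().percentile(0.99) > 120
PredictLatency[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().percentile(0.99) < 20
```

PredictResponse success rate, scale-out then scale-in; the status dimension name here is hypothetical, so check the metric's actual dimensions:

```
(PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID", status = "Success"}.grouping().sum() / PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum()) * 100 < 95
(PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID", status = "Success"}.grouping().sum() / PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum()) * 100 > 95
```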
Creating a Model Deployment with Autoscaling Using a Custom Metric
Learn how to create a model deployment with an autoscaling policy using a custom metric.
Use the oci data-science model-deployment create command and required parameters to create a model deployment:
```
oci data-science model-deployment create --required-param-name variable-name ... [OPTIONS]
```

For example, deploy a model:

```
oci data-science model-deployment create \
  --compartment-id <MODEL_DEPLOYMENT_COMPARTMENT_OCID> \
  --model-deployment-configuration-details file://<MODEL_DEPLOYMENT_CONFIGURATION_FILE> \
  --project-id <PROJECT_OCID> \
  --display-name <MODEL_DEPLOYMENT_NAME>
```

Use this model deployment JSON configuration file:

```json
{
  "deploymentType": "SINGLE_MODEL",
  "modelConfigurationDetails": {
    "modelId": "ocid1.datasciencemodel.oc1.iad.amaaaaaav66vvnias2wuzfkwmkkmxficse3pty453vs3xtwlmwvsyrndlx2q",
    "instanceConfiguration": {
      "instanceShapeName": "VM.Standard.E4.Flex",
      "modelDeploymentInstanceShapeConfigDetails": {
        "ocpus": 1,
        "memoryInGBs": 16
      }
    },
    "scalingPolicy": {
      "policyType": "AUTOSCALING",
      "coolDownInSeconds": 650,
      "isEnabled": true,
      "autoScalingPolicies": [
        {
          "autoScalingPolicyType": "THRESHOLD",
          "initialInstanceCount": 1,
          "maximumInstanceCount": 2,
          "minimumInstanceCount": 1,
          "rules": [
            {
              "metricExpressionRuleType": "CUSTOM_EXPRESSION",
              "scaleInConfiguration": {
                "scalingConfigurationType": "QUERY",
                "pendingDuration": "PT5M",
                "instanceCountAdjustment": 1,
                "query": "MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}.grouping().mean() < 10"
              },
              "scaleOutConfiguration": {
                "scalingConfigurationType": "QUERY",
                "pendingDuration": "PT3M",
                "instanceCountAdjustment": 1,
                "query": "MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}.grouping().mean() > 65"
              }
            }
          ]
        }
      ]
    },
    "bandwidthMbps": 10,
    "maximumBandwidthMbps": 20
  }
}
```

In this example, the MemoryUtilization queries drive scaling. Because pendingDuration values are ISO-8601 durations, mean memory utilization above 65 percent sustained for 3 minutes (PT3M) scales out by one instance, and mean utilization below 10 percent sustained for 5 minutes (PT5M) scales in by one instance.
For a complete list of parameters and values for CLI commands, see the CLI Command Reference.
Use the CreateModelDeployment operation to create a model deployment using the custom scaling metric type.
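As a sketch of calling the operation directly, the CLI's raw-request helper can POST a CreateModelDeploymentDetails document to the Data Science endpoint. The region, API version path, and body file name below are assumptions to verify against the API reference:

```sh
# Sketch: invoke CreateModelDeployment over REST via oci raw-request.
# Assumptions to verify: the us-ashburn-1 regional endpoint and the
# 20190101 API version path. The request body is a
# CreateModelDeploymentDetails document containing compartmentId,
# projectId, displayName, and the modelDeploymentConfigurationDetails
# JSON shown earlier in this topic.
oci raw-request \
  --http-method POST \
  --target-uri "https://datascience.us-ashburn-1.oci.oraclecloud.com/20190101/modelDeployments" \
  --request-body file://create_model_deployment.json
```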