Creating a Dedicated AI Cluster for Hosting Models

Create a dedicated AI cluster resource in OCI Generative AI to host endpoints for pretrained base models, custom models, or imported models.

Important

  • Not available on demand: All OCI Generative AI foundational pretrained models supported for the on-demand serving mode that use the text generation and summarization APIs (including the playground) are now retired. We recommend that you use the chat models instead.
  • Can be hosted on clusters: If you host a summarization or generation model such as cohere.command on a dedicated AI cluster (dedicated serving mode), you can continue to use that model until it's retired. When hosted on a dedicated AI cluster, these models are available only in US Midwest (Chicago). See Deprecated APIs in Generative AI for the dates after which the APIs are no longer available.
    1. On the Dedicated AI clusters list page, select Create dedicated AI cluster. If you need help finding the list page, see Listing Dedicated AI Clusters.
    2. Select the compartment to create the dedicated AI cluster in. The default is the compartment selected on the list page, but you can select any compartment that you have permission to work in.
    3. (Optional) Enter a name and description for the cluster. If you don't enter a name, the system generates one that you can change later.

      The generated name has the format generativeaidedicatedaicluster<timestamp>. For example: generativeaidedicatedaicluster20250922181431

    4. For Cluster type, select Hosting.
    5. For Base model, select one of the following:
      • The pretrained foundational model that you're hosting.
      • For a custom model that was fine-tuned from a foundational model, select the original foundational (base) model that it was fine-tuned from.
      • For an imported model, select that imported model.
    6. If you selected an imported model, select a recommended Unit size.
      For unit size recommendations, see Supported Models for Import. You can also use this guide for models that you store in buckets.

      Unit shape names have two parts: instance type and number of cards. Example: H100_X1 = H100 with 1 card. For A100, both A100-80G and A100-40G are available; the memory size in the name (80G or 40G) distinguishes them.

      Important

      You can't change the unit shape after creating a dedicated AI cluster.
    7. (Optional) Increase the number of instances in the Model replica field.
      Important

      When you create a cluster for hosting models for inference, one unit is created by default for the base model that you select. To increase throughput, you can increase the number of instances in the Model replica field now, or later when you edit the cluster. For example, creating two model replicas on this cluster requires two units.
    8. Read the commitment unit hours for the hosting cluster and select the checkbox to agree to the commitment.
    9. (Optional) Select Add tag and assign tags to this cluster.
    10. Select Create.
    Note

    Clusters take a few minutes to create. After the cluster is in an active state, you can select that cluster to host a model when you create an endpoint for that model.
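
The naming conventions from steps 3 and 6 can be sketched in Python. This is a minimal illustration, not OCI code: the helper names default_cluster_name and parse_unit_shape are hypothetical, the timestamp layout (YYYYMMDDHHMMSS, taken here in UTC) is inferred only from the example name in step 3, and the "_X<cards>" suffix pattern is inferred from the H100_X1 example in step 6.

```python
import re
from datetime import datetime, timezone

# Prefix taken from the generated-name format described in step 3.
NAME_PREFIX = "generativeaidedicatedaicluster"

# Assumption: a shape name is "<instance type>_X<number of cards>",
# as in the H100_X1 example from step 6.
_SHAPE_RE = re.compile(r"^(?P<instance>[A-Za-z0-9_-]+)_X(?P<cards>[1-9][0-9]*)$")


def default_cluster_name(now=None):
    """Return a name shaped like the system-generated default.

    Assumption: the suffix is YYYYMMDDHHMMSS in UTC. This is inferred
    from the documented example, not from documented behavior.
    """
    now = now or datetime.now(timezone.utc)
    return NAME_PREFIX + now.strftime("%Y%m%d%H%M%S")


def parse_unit_shape(shape):
    """Split a unit shape name into (instance type, number of cards)."""
    match = _SHAPE_RE.match(shape)
    if match is None:
        raise ValueError(f"unrecognized unit shape name: {shape!r}")
    return match.group("instance"), int(match.group("cards"))
```

Passing a fixed datetime to default_cluster_name makes the result reproducible, which is convenient when checking the format against a name shown in the Console.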
  • Use the dedicated-ai-cluster create command and required parameters to create a dedicated AI cluster:

    oci generative-ai dedicated-ai-cluster create \
    --compartment-id <compartment-OCID> \
    --type HOSTING \
    --unit-count [integer] \
    --unit-shape [text] \
    [OPTIONS]

    For a complete list of parameters and values for CLI commands, see the CLI Command Reference.
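
As a sketch of how the required parameters fit together, the following Python helper assembles the command as an argument list. The function name build_create_command is hypothetical, and setting --unit-count equal to the number of model replicas assumes one unit per replica, as in the two-replica example from the Console steps. Nothing is executed; the script only prints the command.

```python
import shlex


def build_create_command(compartment_id, unit_shape, model_replicas=1):
    """Build the argument list for creating a hosting cluster.

    Assumption: one unit per model replica, so --unit-count is set
    to the replica count.
    """
    if model_replicas < 1:
        raise ValueError("model_replicas must be at least 1")
    return [
        "oci", "generative-ai", "dedicated-ai-cluster", "create",
        "--compartment-id", compartment_id,
        "--type", "HOSTING",
        "--unit-count", str(model_replicas),
        "--unit-shape", unit_shape,
    ]


if __name__ == "__main__":
    # Placeholders: substitute a real compartment OCID and unit shape.
    argv = build_create_command("<compartment-OCID>", "<unit-shape>", model_replicas=2)
    print(shlex.join(argv))
```

Keeping the command as a list avoids shell-quoting mistakes if the list is later passed to subprocess.run. Remember that the unit shape can't be changed after the cluster is created.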

  • Run the CreateDedicatedAiCluster operation to create a dedicated AI cluster.
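
A request body for this operation can be sketched as follows. This is an illustration under assumptions: the function name create_cluster_body is hypothetical, the camelCase field names are assumed to match the CreateDedicatedAiClusterDetails schema and should be checked against the API reference, and request signing and the endpoint URL are omitted.

```python
import json


def create_cluster_body(compartment_id, unit_shape, unit_count=1,
                        display_name=None, description=None):
    """Build a JSON-serializable body for a hosting cluster.

    Assumption: field names follow the CreateDedicatedAiClusterDetails
    schema; verify them against the API reference before use.
    """
    if unit_count < 1:
        raise ValueError("unit_count must be at least 1")
    body = {
        "compartmentId": compartment_id,
        "type": "HOSTING",
        "unitCount": unit_count,
        "unitShape": unit_shape,
    }
    # Name and description are optional; if the name is omitted,
    # the service generates one.
    if display_name is not None:
        body["displayName"] = display_name
    if description is not None:
        body["description"] = description
    return body


if __name__ == "__main__":
    # Placeholders: substitute a real compartment OCID and unit shape.
    print(json.dumps(create_cluster_body("<compartment-OCID>", "<unit-shape>"), indent=2))
```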