Managing Imported Models (New)

In addition to using the hosted pretrained models in OCI Generative AI, you can import supported open source and third-party models (for example, from Hugging Face) into OCI Generative AI, host them, create endpoints, and use them like any other model.

Hugging Face Prerequisites

Before you import a model directly from Hugging Face:

  • Decide which supported model from Hugging Face to import and note its recommended dedicated AI cluster unit size.
  • Some models, especially recent gated ones such as Llama 3 and Llama 3.1, require a Hugging Face token. For these models, generate an access token in your Hugging Face account settings under Access Tokens, and ensure it has at least "read" permission.

Object Storage Prerequisites

Before you import a model from an Object Storage bucket:

  • If you're not an OCI admin, ask one to give you IAM permission to manage Object Storage in your compartment:
    allow group <your-group-name> to manage object-family in compartment <compartment-with-bucket>
                                
  • Decide on a model that works with the /v1/chat/completions endpoint—only these models are supported.
  • Ensure the model supports only one of these capabilities:
    • TEXT_TO_TEXT: text in, text out
    • IMAGE_TEXT_TO_TEXT: image or text or both in, text out
    • EMBEDDING: text in, vector embeddings out
    • RERANK: query and candidate documents in, relevance scores and a reordered list out
  • Save model artifacts in an Object Storage bucket.
  • Important: The configuration file must be named config.json, as it is in most Hugging Face models. Otherwise, the import fails.
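
Before uploading artifacts, a local pre-check can catch the two easiest mistakes in the list above: an unsupported capability and a missing config.json. The following is a minimal sketch using only the Python standard library. The function name check_artifacts and the local directory layout are hypothetical, and the JSON parse is an extra sanity check that this page does not require.

```python
import json
from pathlib import Path

# The four capabilities listed in the prerequisites above.
SUPPORTED_CAPABILITIES = {"TEXT_TO_TEXT", "IMAGE_TEXT_TO_TEXT", "EMBEDDING", "RERANK"}


def check_artifacts(model_dir, capability):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it inspects a local directory only and does not
    call any OCI or Hugging Face service.
    """
    problems = []

    if capability not in SUPPORTED_CAPABILITIES:
        problems.append(f"unsupported capability: {capability}")

    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        problems.append("config.json not found in the artifact directory")
    else:
        # Assumption: a usable config.json should at least parse as JSON.
        try:
            json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"config.json is not valid JSON: {exc}")

    return problems
```

For example, calling check_artifacts("./my-model", "TEXT_TO_TEXT") on a directory without config.json returns a single-item list describing the missing file, which you can fix before uploading to the bucket.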

Resource Request & Pricing

To access an imported model, you create an endpoint for that model on a dedicated AI cluster. Use the following table to request dedicated AI cluster resources before you import a model.

Dedicated AI Cluster Unit Sizes for Imported Models
Dedicated AI Cluster Unit Size  Limit Name                     Units to Request  AI Unit Count
A10_X1                          dedicated-unit-a10-count       1                 1.77
A10_X2                          dedicated-unit-a10-count       2                 3.54
A10_X4                          dedicated-unit-a10-count       4                 7.08
A100_40G_X1                     dedicated-unit-a100-40g-count  1                 2.70
A100_40G_X2                     dedicated-unit-a100-40g-count  2                 5.40
A100_40G_X4                     dedicated-unit-a100-40g-count  4                 10.80
A100_40G_X8                     dedicated-unit-a100-40g-count  8                 21.60
A100_80G_X1                     dedicated-unit-a100-80g-count  1                 3.24
A100_80G_X2                     dedicated-unit-a100-80g-count  2                 6.48
A100_80G_X4                     dedicated-unit-a100-80g-count  4                 12.96
A100_80G_X8                     dedicated-unit-a100-80g-count  8                 25.92
H100_X1                         dedicated-unit-h100-count      1                 6.01
H100_X2                         dedicated-unit-h100-count      2                 12.02
H100_X4                         dedicated-unit-h100-count      4                 24.04
H100_X8                         dedicated-unit-h100-count      8                 48.08
H200_X1                         dedicated-unit-h200-count      1                 6.22
H200_X2                         dedicated-unit-h200-count      2                 12.44
H200_X4                         dedicated-unit-h200-count      4                 24.88
H200_X8                         dedicated-unit-h200-count      8                 49.76
Tip

To request the resources for the recommended dedicated AI cluster unit size, see requesting a resource limit.

To calculate the hourly price, multiply the AI Unit Per Hour price listed for Oracle Cloud Infrastructure Generative AI - Model Import on the pricing page by the AI unit count in the table on this page.
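
The pricing arithmetic can be sketched as follows. This is an illustration only: the AI unit counts are copied from three rows of the table above, the function name hourly_price is hypothetical, and the price of 2.00 per AI unit hour is a placeholder, not an Oracle rate. Look up the real rate on the pricing page.

```python
from decimal import Decimal

# AI unit counts copied from three rows of the table above.
AI_UNIT_COUNT = {
    "A10_X1": Decimal("1.77"),
    "A100_80G_X1": Decimal("3.24"),
    "H100_X8": Decimal("48.08"),
}


def hourly_price(unit_size, price_per_ai_unit_hour):
    """Hourly price = AI unit count for the unit size x price per AI unit hour."""
    return AI_UNIT_COUNT[unit_size] * Decimal(price_per_ai_unit_hour)


# Placeholder rate of 2.00 per AI unit hour; NOT an actual Oracle price.
print(hourly_price("H100_X8", "2.00"))
```

Decimal is used instead of float so that the product is exact to the cent, which matters when the result feeds a budget estimate.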