Managing Imported Models (New)
In addition to using the hosted pretrained models in OCI Generative AI, you can import supported open source and third-party models (for example, from Hugging Face) into OCI Generative AI, host them, create endpoints, and use them like any other model.
Hugging Face Prerequisites
Before you import a model directly from Hugging Face:
- Decide which supported model from Hugging Face to import and note its recommended dedicated AI cluster unit size.
- Some models require a Hugging Face token, especially the more recent, gated versions such as Llama 3 and Llama 3.1. For these models, generate an access token in your Hugging Face account settings under Access Tokens, and ensure that it has at least "read" permission.
Object Storage Prerequisites
Before you import a model from an Object Storage bucket:
- If you're not an OCI admin, ask one to give you IAM permission to manage Object Storage in your compartment:
  allow group <your-group-name> to manage object-family in compartment <compartment-with-bucket>
- Decide on a model that works with the /v1/chat/completions endpoint. Only these models are supported.
- Ensure the model supports only one of these capabilities:
- TEXT_TO_TEXT: text in, text out
- IMAGE_TEXT_TO_TEXT: image or text or both in, text out
- EMBEDDING: text in, vector embeddings out
- RERANK: query and candidate documents in, relevance scores and a reordered list out
- Save model artifacts in an Object Storage bucket.
- Important: The configuration file must be called config.json for a successful import, similar to most Hugging Face models.
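The checks in this list can be run locally before you upload anything. The following is a minimal sketch, not an Oracle tool: the function name `check_artifacts` is hypothetical, and it assumes that config.json sits at the top level of the local artifact folder, which this page doesn't state explicitly. It only confirms that the declared capability is one of the four listed above and that config.json exists and parses as JSON.

```python
import json
from pathlib import Path

# Capability names copied from the list above.
SUPPORTED_CAPABILITIES = {"TEXT_TO_TEXT", "IMAGE_TEXT_TO_TEXT", "EMBEDDING", "RERANK"}


def check_artifacts(artifact_dir, capability):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it inspects a local folder only and does not
    contact OCI or Hugging Face.
    """
    problems = []

    # A model must declare exactly one of the supported capabilities.
    if capability not in SUPPORTED_CAPABILITIES:
        problems.append(f"unsupported capability: {capability!r}")

    # Assumption: config.json is expected at the top level of the folder.
    config_path = Path(artifact_dir) / "config.json"
    if not config_path.is_file():
        problems.append("config.json not found in the artifact folder")
    else:
        try:
            json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"config.json is not valid JSON: {exc}")

    return problems
```

For example, `check_artifacts("./my-model", "TEXT_TO_TEXT")` returns an empty list when the folder passes both checks. Passing these checks doesn't guarantee that the import succeeds.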
Resource Request & Pricing
To reach an imported model, you create an endpoint for that model on a dedicated AI cluster. Use the following table to request dedicated AI cluster resources before you import a model.
| Dedicated AI Cluster Unit Size | Limit Name | Request Required Units | AI Unit Count |
|---|---|---|---|
| A10_X1 | dedicated-unit-a10-count | 1 | 1.77 |
| A10_X2 | dedicated-unit-a10-count | 2 | 3.54 |
| A10_X4 | dedicated-unit-a10-count | 4 | 7.08 |
| A100_40G_X1 | dedicated-unit-a100-40g-count | 1 | 2.70 |
| A100_40G_X2 | dedicated-unit-a100-40g-count | 2 | 5.40 |
| A100_40G_X4 | dedicated-unit-a100-40g-count | 4 | 10.80 |
| A100_40G_X8 | dedicated-unit-a100-40g-count | 8 | 21.60 |
| A100_80G_X1 | dedicated-unit-a100-80g-count | 1 | 3.24 |
| A100_80G_X2 | dedicated-unit-a100-80g-count | 2 | 6.48 |
| A100_80G_X4 | dedicated-unit-a100-80g-count | 4 | 12.96 |
| A100_80G_X8 | dedicated-unit-a100-80g-count | 8 | 25.92 |
| H100_X1 | dedicated-unit-h100-count | 1 | 6.01 |
| H100_X2 | dedicated-unit-h100-count | 2 | 12.02 |
| H100_X4 | dedicated-unit-h100-count | 4 | 24.04 |
| H100_X8 | dedicated-unit-h100-count | 8 | 48.08 |
| H200_X1 | dedicated-unit-h200-count | 1 | 6.22 |
| H200_X2 | dedicated-unit-h200-count | 2 | 12.44 |
| H200_X4 | dedicated-unit-h200-count | 4 | 24.88 |
| H200_X8 | dedicated-unit-h200-count | 8 | 49.76 |
To request the resources for the recommended dedicated AI cluster unit size, see requesting a resource limit.
To calculate the price, multiply the AI Unit Per Hour price listed for Oracle Cloud Infrastructure Generative AI - Model Import on the pricing page by the AI unit count in the table on this page.
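As a worked illustration of that multiplication, the sketch below looks up the AI unit count for a unit size and multiplies it by a price per AI unit per hour. The function name `hourly_cost` is hypothetical, only a few rows of the table are copied in, and the price shown is a placeholder rather than a real rate. Always take the current price from the pricing page.

```python
from decimal import Decimal

# AI unit counts copied from the table on this page (subset shown for brevity).
AI_UNIT_COUNT = {
    "A10_X1": Decimal("1.77"),
    "A100_80G_X1": Decimal("3.24"),
    "H100_X1": Decimal("6.01"),
    "H100_X8": Decimal("48.08"),
    "H200_X1": Decimal("6.22"),
}


def hourly_cost(unit_size, price_per_ai_unit_hour):
    """Hourly cost = price per AI unit per hour * AI unit count."""
    # Convert through str so that float inputs don't carry binary rounding noise.
    return Decimal(str(price_per_ai_unit_hour)) * AI_UNIT_COUNT[unit_size]


# Placeholder price, NOT a real rate: replace it with the value from the pricing page.
EXAMPLE_PRICE = Decimal("1.00")
print(hourly_cost("H100_X1", EXAMPLE_PRICE))
```

`Decimal` is used instead of `float` so that currency arithmetic stays exact. Multiply the hourly result by the number of hours the cluster runs to estimate a longer period.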
Tasks for Importing a Model
- Import the model using one of these options: directly from Hugging Face, or from an Object Storage bucket.
- Create a hosting dedicated AI cluster for the imported model with a recommended unit shape.
- Create an endpoint.
- Call the model through the OCI Generative AI API or SDK, or use the model in the playground.
Managing the Imported Models
After performing the prerequisites and importing a model, you can perform the following tasks on the imported models: