Supported Models for Import

You can import large language models from Hugging Face or from an OCI Object Storage bucket into OCI Generative AI, create endpoints for those models, and use them in the Generative AI service.

Supported Model Architectures

The Generative AI service supports importing and deploying leading open-source and third-party language models to speed up AI initiatives. The following model architectures are supported:

Chat Models

Chat models let you ask questions and get conversational, in-context answers from AI. Select from the following model families to balance speed, quality, and cost for your use case. Select each link for a list of supported models with their model ID, model capability, and recommended dedicated AI cluster unit shapes.

  • Alibaba Qwen 3 and Qwen 2

    Features advanced multilingual and multimodal capabilities.

  • Google Gemma

    Built for broad language processing needs and high versatility.

  • Llama2, Llama3, Llama3.1, Llama3.2, Llama3.3, Llama4

    Improved versions of the Meta Llama models with Grouped Query Attention (GQA).

  • Microsoft Phi

    Known for efficiency and compactness, designed for scalable and flexible performance.

  • OpenAI GptOss

    An advanced open-weight transformer architecture with a Mixture-of-Experts (MoE) design, optimized for efficient, high-quality language reasoning and large context handling.

Embed Model

An embedding model transforms input data (such as words and images) into numerical vectors that capture their semantic meaning or relationships. This lets machines identify similarities, relationships, and patterns within the data more effectively. Select the following link for the model ID, model capability, and recommended dedicated AI cluster unit shape.

  • Mistral

    A high-performance, decoder-only Transformer architecture featuring Sliding Window Attention (SWA) for efficient long-context handling and optional Grouped Query Attention (GQA) for improved scalability.
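
To illustrate how embedding vectors capture similarity, the following minimal sketch compares vectors with cosine similarity, a common way to measure how close two embeddings are. The four-dimensional vectors and the helper names (cosine_similarity, most_similar) are hypothetical and made up for illustration. They are not produced by any model or API in the Generative AI service, and real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption: invented values, not model output).
embeddings = {
    "cat": [0.9, 0.1, 0.0, 0.2],
    "kitten": [0.8, 0.2, 0.1, 0.3],
    "invoice": [0.0, 0.9, 0.7, 0.1],
}


def most_similar(query, vectors):
    """Return the key, other than the query, whose vector is closest to the query's."""
    return max(
        (key for key in vectors if key != query),
        key=lambda key: cosine_similarity(vectors[query], vectors[key]),
    )


if __name__ == "__main__":
    for word in ("kitten", "invoice"):
        score = cosine_similarity(embeddings["cat"], embeddings[word])
        print(f"cat vs {word}: {score:.3f}")
    print("closest to cat:", most_similar("cat", embeddings))
```

In this toy data, "cat" and "kitten" point in nearly the same direction, so their score is close to 1, while "cat" and "invoice" score much lower. Semantic search and clustering over embeddings generally rely on this kind of comparison.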