Pretrained Foundational Models in Generative AI

You can use the following pretrained foundational models in OCI Generative AI:

Important

For supported model timelines, see Retiring the Models.
Chat Models (New)

Ask questions and get conversational responses through an AI chat interface.

cohere.command-r-16k
Available in these regions:
  • Brazil East (Sao Paulo)
  • Germany Central (Frankfurt)
  • UK South (London)
  • US Midwest (Chicago)
Key features:
  • User prompt can be up to 16,000 tokens, and the response can be up to 4,000 tokens for each run.
  • Optimized for conversational interaction and long-context tasks. Ideal for text generation, summarization, translation, and text-based classification.
  • You can fine-tune this model with your dataset.
cohere.command-r-plus
Available in these regions:
  • Brazil East (Sao Paulo)
  • Germany Central (Frankfurt)
  • UK South (London)
  • US Midwest (Chicago)
Key features:
  • User prompt can be up to 128,000 tokens, and the response can be up to 4,000 tokens for each run.
  • Optimized for complex tasks. Offers advanced language understanding, higher capacity, and more nuanced responses, and can maintain context across its 128,000-token conversation history. Also ideal for question answering, sentiment analysis, and information retrieval.
meta.llama-3.1-70b-instruct
Available in these regions:
  • Brazil East (Sao Paulo)
  • Germany Central (Frankfurt)
  • UK South (London)
  • US Midwest (Chicago)
Key features:
  • Model has 70 billion parameters.
  • User prompt and response can be up to 128,000 tokens for each run.
  • You can fine-tune this model with your dataset.
meta.llama-3.1-405b-instruct
Available in these regions:
  • Brazil East (Sao Paulo) (dedicated AI cluster only)
  • Germany Central (Frankfurt) (dedicated AI cluster only)
  • UK South (London) (dedicated AI cluster only)
  • US Midwest (Chicago)
Key features:
  • Model has 405 billion parameters.
  • User prompt and response can be up to 128,000 tokens for each run.
  • On-demand inferencing is available only in the US Midwest (Chicago) region. In other regions, you must create your own dedicated AI cluster and an endpoint on that cluster to host this model for inferencing.
meta.llama-3-70b-instruct (deprecating soon)
Available in these regions:
  • Brazil East (Sao Paulo)
  • Germany Central (Frankfurt)
  • UK South (London)
  • US Midwest (Chicago)
Key features:
  • Model has 70 billion parameters.
  • User prompt and response can be up to 8,000 tokens for each run.
  • You can fine-tune this model with your dataset.
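The per-run token limits above vary by model, so client code often checks a prompt against its target model's budget before sending it. The following sketch is illustrative only: the `CHAT_MODEL_LIMITS` table is copied from the list above, and the 4-characters-per-token heuristic is an assumption, not the tokenizer any of these models actually uses.

```python
# Hypothetical helper; limits copied from the model list above.
# "prompt"/"response" are separate budgets; "combined" means the prompt
# and response share one budget for the run.
CHAT_MODEL_LIMITS = {
    "cohere.command-r-16k": {"prompt": 16_000, "response": 4_000},
    "cohere.command-r-plus": {"prompt": 128_000, "response": 4_000},
    "meta.llama-3.1-70b-instruct": {"combined": 128_000},
    "meta.llama-3.1-405b-instruct": {"combined": 128_000},
    "meta.llama-3-70b-instruct": {"combined": 8_000},
}

def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)

def fits_model(model: str, prompt: str, max_response_tokens: int = 0) -> bool:
    """Return True if the prompt plus the requested response budget is
    likely to fit the model's per-run limits from the table above."""
    limits = CHAT_MODEL_LIMITS[model]
    needed = estimate_tokens(prompt)
    if "combined" in limits:
        return needed + max_response_tokens <= limits["combined"]
    return needed <= limits["prompt"] and max_response_tokens <= limits["response"]
```

For example, a prompt of roughly 10,000 tokens with a 1,000-token response budget fits the 128,000-token Llama 3.1 models but not the 8,000-token combined limit of meta.llama-3-70b-instruct.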
Tip

Learn about chat models.

Embedding Models

Convert text to vector embeddings to use in applications for semantic searches, text classification, or text clustering.

cohere.embed-english-v3.0
Available in these regions:
  • Brazil East (Sao Paulo)
  • Germany Central (Frankfurt)
  • UK South (London)
  • US Midwest (Chicago)
Key features:
  • Model for English-language text.
  • Model creates a 1,024-dimensional vector for each embedding.
  • Maximum 96 sentences per run.
  • Maximum 512 tokens per embedding.
cohere.embed-multilingual-v3.0
Available in these regions:
  • Brazil East (Sao Paulo)
  • Germany Central (Frankfurt)
  • UK South (London)
  • US Midwest (Chicago)
Key features:
  • Multilingual model.
  • Model creates a 1,024-dimensional vector for each embedding.
  • Maximum 96 sentences per run.
  • Maximum 512 tokens per embedding.
cohere.embed-english-light-v3.0
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Light models are smaller and faster than the original models.
  • Model for English-language text.
  • Model creates a 384-dimensional vector for each embedding.
  • Maximum 96 sentences per run.
  • Maximum 512 tokens per embedding.
cohere.embed-multilingual-light-v3.0
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Light models are smaller and faster than the original models.
  • Multilingual model.
  • Model creates a 384-dimensional vector for each embedding.
  • Maximum 96 sentences per run.
  • Maximum 512 tokens per embedding.
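Because each run accepts at most 96 inputs, client code typically splits larger input lists into batches, and semantic search over the returned vectors is usually a cosine-similarity comparison. The sketch below is plain Python illustrating both ideas; the batching limit is taken from the list above, and the similarity function is standard math, not an OCI API.

```python
import math

MAX_INPUTS_PER_RUN = 96  # per-run input limit from the model list above

def batches(texts, size=MAX_INPUTS_PER_RUN):
    """Split a list of inputs into chunks the service accepts per run."""
    for i in range(0, len(texts), size):
        yield texts[i:i + size]

def cosine_similarity(a, b):
    """Standard cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Example: 250 documents require 3 embedding runs (96 + 96 + 58).
chunks = list(batches([f"doc {i}" for i in range(250)]))
print(len(chunks), [len(c) for c in chunks])  # → 3 [96, 96, 58]
```

In a real application, each chunk would be sent to the embedding model in one request, and `cosine_similarity` would rank stored document vectors against a query vector.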
Tip

Learn about the embedding models.

Generation Models (Deprecated)

Give instructions to generate text or extract information from text.

Important

All OCI Generative AI pretrained foundational models that are supported for the on-demand serving mode and use the text generation and summarization APIs (including in the playground) are now retired. If you host a generation or summarization model, such as cohere.command, on a dedicated AI cluster (dedicated serving mode), you can continue to use that model until it's retired. See Retiring the Models for retirement dates and definitions. We recommend that you use the chat models instead.
cohere.command (deprecated)
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Model has 52 billion parameters.
  • User prompt and response can be up to 4,096 tokens for each run.
  • You can fine-tune this model with your dataset.
cohere.command-light (deprecated)
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Model has 6 billion parameters.
  • User prompt and response can be up to 4,096 tokens for each run.
  • You can fine-tune this model with your dataset.
meta.llama-2-70b-chat (deprecated)
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Model has 70 billion parameters.
  • User prompt and response can be up to 4,096 tokens for each run.
The Summarization Model (Deprecated)

Summarize text with your instructed format, length, and tone.

Important

The cohere.command model supported for the on-demand serving mode is now retired, and the model is deprecated for the dedicated serving mode. If you're hosting cohere.command on a dedicated AI cluster (dedicated serving mode) for summarization, you can continue to use that hosted model replica with the summarization API and in the playground until cohere.command retires for the dedicated serving mode. See Retiring the Models for retirement dates and definitions. We recommend that you use the chat models instead, which offer the same summarization capabilities, including control over summary length and style.
cohere.command (deprecated)
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Model has 52 billion parameters.
  • User prompt and response can be up to 4,096 tokens for each run.
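When moving summarization workloads to the chat models, as the note above recommends, the format, length, and tone controls that the summarization API exposed can be reproduced in the chat prompt itself. A minimal sketch, where the prompt template wording is an illustrative assumption rather than an OCI API:

```python
def build_summarization_prompt(text: str, fmt: str = "bullet points",
                               length: str = "short", tone: str = "neutral") -> str:
    """Compose a chat prompt that reproduces the summarization model's
    format/length/tone controls. The template wording is illustrative."""
    return (
        f"Summarize the following text as {fmt}. "
        f"Keep the summary {length} and use a {tone} tone.\n\n"
        f"Text:\n{text}"
    )
```

The returned string would be sent as the user message to a chat model such as cohere.command-r-16k.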
Tip

Learn about the summarization model.