About the Chat Models in Generative AI

Prompt the OCI Generative AI chat models to generate text. You can ask questions in natural language and optionally submit text such as documents, emails, and product reviews to the chat models and each model reasons over the text and provides intelligent answers. For some models, you can submit images and ask questions about the image. The chat models keep the context of your previous prompts and you can continue the chat with follow-up questions. Here's an example:

User prompt: "Hi, Can you tell me how to fish?"

Output for previous prompt: "Sure! I'd be happy to help. Here are the basic steps to start fishing..." And this sentence is followed by several steps and a follow-up question such as "Please let me know if you have any specific questions about fishing gear, techniques, or tips for a successful fishing trip and I'd be happy to help!"

User follow-up prompt: "Can you tell me more about step 3."

The chat model's response: "Certainly! Step 3 talks about the important considerations for setting up your fishing rod and tackle, which is a crucial aspect of preparing for a successful fishing trip. Here's a deeper dive into this step to help you understand the various components and choices involved in setting up your fishing gear" and so on.

Following are some example use cases for chat models:

Chat: Chat with the model to brainstorm ideas, solve problems, explain concepts, and answer questions.
Write new content: Ask the chat model to draft marketing copy, emails, blog posts, product descriptions, documents, and so on.
Translate: Translate text from one language to another. See each model's language coverage.
Retrieve information from text: Provide text to the chat model and use the chat model for question-answering, sentiment analysis, classifying, and information retrieval.
Retrieve information from images: Provide an image to the chat model and use the chat model for captioning and information retrieval. This feature is available for some models.
Enhance existing content: Rewrite content in a different style or language such as restructuring a text to list the topics, add headings, and summarize the topics.

Selecting a Model for Chat

Select a model for your chat conversation based on the model size, your project goal, cost, and the style of the model's response. Use the playground's provided examples with each listed model to get a feel for how each model responds to the same prompt and then decide which model's response style goes well with your use case.

The following pretrained foundational models are available in OCI Generative AI for chat. Review the features to help you select the best model for your use case.

Cohere Command Family

Models

Command A (cohere.command-a-03-2025) (New)
Command R 08-2024 (cohere.command-r-08-2024)
Command R+ 08-2024 (cohere.command-r-plus-08-2024)
Command R (cohere.command-r-16k) (Deprecated)
Command R+ (cohere.command-r-plus) (Deprecated)

Key Features

The cohere.command-a-03-2025 model is the most performant Cohere chat model to date with better throughput than cohere.command-r-08-2024. This model has a 256,000 token context length.

The cohere.command-r-16k (Command R) and cohere.command-r-plus (Command R+) chat models both belong to the Cohere Command R family of LLMs, but they differ in their capabilities, use cases, and pricing. Key differences include:

Model Size and Performance: Command R is a smaller-scale language model than Command R+. While Command R offers high-quality responses, the responses might not have the same level of sophistication and depth as the Command R+ responses. Command R+ is a larger model, resulting in enhanced performance and more sophisticated understandings.
Use Cases: Command R is suited for various applications, including text generation, summarization, translation, and text-based classification. It's an ideal choice for building conversational AI agents and chat-based applications. Command R+, on the other hand, is designed for more complex language tasks that require deeper understanding and nuance, such as text generation, question-answering, sentiment analysis, and information retrieval.
Capacity and Scalability: Command R can handle a moderate number of concurrent users compared to Command R+. Command R+, however, is designed to handle a higher volume of requests and support more complex use cases, which might result in higher prices because of its increased capacity and performance.

In summary, Command R is an excellent choice for those looking for a more affordable and flexible option for general language tasks. On the other hand, Command R+ is designed for power users who require advanced language understanding, higher capacity, and more nuanced responses. The choice between the two would depend on the specific requirements and budget of your application.

Cohere Command R 08-2024 and Cohere Command R+ 08-2024 (New)

The cohere.command-r-plus-08-2024 (Command R+ 08-2024) and cohere.command-r-08-2024 (Command R 08-2024) chat models are the improved versions of cohere.command-r-plus (Command R+) and cohere.command-r-16k (Command R) and have the following key features:

They have significant performance improvements in latency and throughput compared to their predecessors.
Both models have a 128,000 token context length which is eight times more than their previous models.
Both models have improved in areas such as math, coding, and reasoning.
Command R 08-2024 is now in par Command R+.
Both models are available for on-demand inferencing and hosting on dedicated AI clusters.
The Command R 08-2024 model is available for fine-tuning with your dataset.
Enhanced multilingual retrieval-augmented generation (RAG) feature with customizable citation options.
Multilingual support of 10 languages: Arabic, Chinese (Mandarin), English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish
New Structured Outputs feature to create user-defined JSON outputs, such as user-defined JSON schemas.
Improved structured data analysis for finance, Software as a Service (SaaS), and consulting use cases

Meta Llama 4 (New)

Meta Llama 3.3 70B

Meta Llama 3.2 Family

Meta Llama 3.1 Family

Chat Model Parameters

When using the chat models, you can vary the output by changing the following parameters.

Maximum output tokens

The maximum number of tokens that you want the model to generate for each response. Estimate four characters per token. Because you're prompting a chat model, the response depends on the prompt and each response doesn't necessarily use up the maximum allocated tokens. For each model's maximum output tokens, see the key features in Pretrained Foundational Models in Generative AI.

Preamble override

An initial context or guiding message for a chat model. When you don't give a preamble to a chat model, the default preamble for that model is used. You can assign a preamble in the Preamble override parameter, for the models. The default preamble for the Cohere Command R family is:

You are Command.
            You are an extremely capable large language model built by Cohere. 
            You are given instructions programmatically via an API
            that you follow to the best of your ability.

Overriding the default preamble is optional. When specified, the preamble override replaces the default Cohere preamble. When adding a preamble, for best results, give the model context, instructions, and a conversation style.

Tip

For chat models without the preamble override parameter, you can include a preamble in the chat conversation and directly ask the model to answer in a certain way. For example: "Answer the following question in a pirate tone. Where's the best place to go sailing?"

Example 1 for Preamble

Example preamble: You're a travel advisor. Answer with a pirate tone.
Example prompt: Where can I find the best cannolis in Italy?
Example response using the preceding preamble: Argh, me heartie! Ye be wantin' some of the finest cannolis in the land, do ye? Ye should set sail for the lush lands of Sicily, where the streets be lined with delicious pastries and the smells of sweet ricotta fill the air. In the port town of Palermo, ye can find some 'o the finest cannoli kitchens. The locals be claims that the best cannolis be found at Pastificio Graffeo, where the shells be crispy and filled with creamy ricotta.

Example 2 for Preamble

Example preamble: You're a seasoned marketing professional with a deep understanding of consumer behavior and market trends. Answer with a friendly and informative tone, sharing industry insights and best practices.
Example prompt: Where can I find the best cannolis in Italy?
Example response to the preceding preamble: Cannolis are a delicious Italian dessert, and finding the best ones really depends on your personal preferences and what makes the perfect cannoli for you! In fact, Sicily boasts some renowned cannoli hotspots. Caffe Sierra in Palermo has a reputation for exceptional cannolis, as does Pasticceria Cappello, which has multiple branches across the city. These spots are known for their traditional recipes and high-quality ingredients, using fresh ricotta cheese and thin, crispy pastry shells.

Safety Mode

Adds a safety instruction for the model to use when generating responses. Options are:

Contextual: (Default) Puts fewer constraints on the output. It maintains core protections by aiming to reject harmful or illegal suggestions, but it allows profanity and some toxic content, sexually explicit and violent content, and content that contains medical, financial, or legal information. Contextual mode is suited for entertainment, creative, or academic use.
Strict: Aims to avoid sensitive topics, such as violent or sexual acts and profanity. This mode aims to provide a safer experience by prohibiting responses or recommendations that it finds inappropriate. Strict mode is suited for corporate use, such as for corporate communications and customer service.
Off: No safety mode is applied.

Note

This parameter is only compatible with models cohere.command-r-08-2024, cohere.command-r-plus-08-2024 and Cohere models released after these models. See release dates.

Temperature

The level of randomness used to generate the output text.

Tip

Start with the temperature set to 0 or less than one, and increase the temperature as you regenerate the prompts for a more creative output. High temperatures can introduce hallucinations and factually incorrect information.

Top p

A sampling method that controls the cumulative probability of the top tokens to consider for the next token. Assign p a decimal number between 0 and 1 for the probability. For example, enter 0.75 for the top 75 percent to be considered. Set p to 1 to consider all tokens.

Top k

A sampling method in which the model chooses the next token randomly from the top k most likely tokens. A high value for k generates more random output, which makes the output text sound more natural. The default value for k is 0 for Cohere Command models and -1 for Meta Llama models, which means that the model should consider all tokens and not use this method.

Frequency penalty

A penalty that's assigned to a token when that token appears frequently. High penalties encourage fewer repeated tokens and produce a more random output.

For the Meta Llama family models, this penalty can be positive or negative. Positive numbers encourage the model to use new tokens and negative numbers encourage the model to repeat the tokens. Set to 0 to disable.

Presence penalty

A penalty that's assigned to each token when it appears in the output to encourage generating outputs with tokens that haven't been used.

Seed

A parameter that makes a best effort to sample tokens deterministically. When this parameter is assigned a value, the large language model aims to return the same result for repeated requests when you assign the same seed and parameters for the requests.

Allowed values are integers and assigning a large or a small seed value doesn't affect the result. Assigning a number for the seed parameter is similar to tagging the request with a number. The large language model aims to generate the same set of tokens for the same integer in consecutive requests. This feature is especially useful for debugging and testing. The seed parameter has no maximum value for the API, and in the Console, its maximum value is 9999. Leaving the seed value blank in the Console, or null in the API disables this feature.

Warning

The seed parameter might not produce the same result in the long-run, because the model updates in the OCI Generative AI service might invalidate the seed.

Tool Support

OCI Generative AI has tool support for the pretrained chat models, enabling them to integrate with external tools and APIs to enhance their responses.

A tool call can help a model produce contextually relevant responses and also broadens its utility, enabling the model to handle complex queries and perform tasks requiring external data and processing. For example, using tool calls, you can have a model fetch real-time data, run code, and interact with databases. Tool support is available only through the API.

For Meta Llama models, see the new tools attribute in the GenericChatRequest API reference.
For Cohere models, see the tools attribute in the CohereChatRequest API reference.

Oracle Cloud Infrastructure Documentation

About the Chat Models in Generative AI

Selecting a Model for Chat 🔗

Chat Model Parameters 🔗

Tool Support 🔗

Selecting a Model for Chat

Chat Model Parameters

Tool Support