You can prompt a generation model to produce text. The following are some example use cases for text generation models:
Copy generation: Draft marketing copy, emails, blog posts, product descriptions, documents, and so on.
Ask questions: Ask the models to explain concepts, brainstorm ideas, solve problems, and answer questions based on the information that they were trained on.
Stylistic conversion: Edit text or rewrite content in a different style or language.
Important
Not available on-demand: All OCI Generative AI foundational pretrained models supported for the on-demand serving mode that use the text generation and summarization APIs (including the playground) are now retired. We recommend that you use the chat models instead.
Can be hosted on clusters: If you host a summarization or generation model such as cohere.command on a dedicated AI cluster (dedicated serving mode), you can continue to use that model until it's retired. When hosted on a dedicated AI cluster, these models are available only in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions.
Selecting a Generation Model
To generate text, select a generation model that you've hosted on a dedicated AI cluster based on the model's size, your project goal, cost, and the quality of the model's responses.
A highly performant generation model with 50 billion parameters and strong general knowledge of the world. Use this model for tasks ranging from brainstorming to tasks that demand accuracy, such as text extraction and sentiment analysis, and for complex instructions, such as drafting marketing copy, emails, blog posts, and product descriptions that you then review and use.
A quick and light generation model. Use this model for tasks that require a basic knowledge of the world and simple instructions, when speed and cost are important. For best results, give the model clear instructions: the more specific your prompt, the better this model performs. For example, instead of the prompt "What is the following tone?", write "What is the tone of this product review? Answer with either the word positive or negative."
This 70-billion-parameter model was trained on a dataset of 1.2 trillion tokens that includes text from the internet, books, and other sources. Use this model for text generation, language translation, summarization, question answering based on the content of a given text or topic, and content generation such as articles, blog posts, and social media updates.
Generation Model Parameters
When using the generation models, you can vary the output by changing the following parameters.
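As an orientation before the individual descriptions below, the following Python dictionary sketches how these values might fit together in a single request. The field names are hypothetical and mirror the playground labels, not a literal API schema; check the SDK reference for the exact request format.

```python
# Hypothetical parameter bundle for a text generation request.
# Key names are illustrative only, not the service's actual schema.
generation_params = {
    "prompt": "Draft a two-sentence description of a stainless-steel water bottle.",
    "max_output_tokens": 300,    # response budget; roughly 4 characters per token
    "temperature": 0.7,          # randomness of the output
    "top_k": 0,                  # 0 = consider all tokens (command-model default)
    "top_p": 0.75,               # sample from top tokens covering 75% probability
    "frequency_penalty": 0.2,    # discourage frequently repeated tokens
    "presence_penalty": 0.0,     # no flat penalty for reusing a token
    "stop_sequences": ["\n\n"],  # stop at the first blank line
}
```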
Maximum output tokens
The maximum number of tokens that you want the model to generate for each response. Estimate four characters per token; for example, a limit of 500 tokens corresponds to roughly 2,000 characters of generated text.
Temperature
The level of randomness used to generate the output text.
Tip
Start with the temperature set to 0 or a value less than 1, and increase it as you regenerate the prompts for more creative output. High temperatures can introduce hallucinations and factually incorrect information.
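Conceptually, temperature rescales the model's raw scores (logits) before they're converted to probabilities. The following sketch shows the standard formulation as an illustration of the behavior, not the service's implementation:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into a probability distribution.

    Higher temperatures flatten the distribution (more random choices);
    temperatures near 0 sharpen it toward the most likely token.
    """
    # A temperature of exactly 0 is normally treated as greedy decoding;
    # clamp here to keep the division defined.
    t = max(temperature, 1e-6)
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.2))  # ~[0.99, 0.01, 0.00] -- near-greedy
print(softmax_with_temperature(logits, 1.5))  # ~[0.56, 0.29, 0.16] -- flatter, more random
```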
Top k
A sampling method in which the model chooses the next token randomly from the top k most likely tokens. A higher value for k generates more random output, which makes the output text sound more natural. The default value for k is 0 for command models and -1 for Llama models, which means that the model considers all tokens and doesn't use this method.
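Conceptually, top k truncates the ranked token list before sampling. A minimal sketch, using a k of 0 or less to mean "consider all tokens", matching the defaults described above:

```python
import random

def top_k_sample(token_probs, k):
    """Pick the next token at random from the k most likely candidates.

    token_probs: dict mapping token -> probability.
    """
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    candidates = ranked if k <= 0 else ranked[:k]
    tokens, weights = zip(*candidates)
    # random.choices weights don't need to sum to 1, so no renormalization.
    return random.choices(tokens, weights=weights)[0]

probs = {"food": 0.5, "book": 0.3, "movie": 0.15, "zebra": 0.05}
print(top_k_sample(probs, k=2))  # always "food" or "book"
```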
Top p
A sampling method that controls the cumulative probability of the top tokens to
consider for the next token. Assign p a decimal number between 0 and 1
for the probability. For example, enter 0.75 for the top 75 percent to be considered.
Set p to 1 to consider all tokens.
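Top p (also called nucleus sampling) truncates by cumulative probability instead of by a fixed count. A minimal sketch of the standard technique:

```python
import random

def top_p_sample(token_probs, p):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches p, then sample the next token from that set."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    tokens, weights = zip(*nucleus)
    return random.choices(tokens, weights=weights)[0]

probs = {"food": 0.5, "book": 0.3, "movie": 0.15, "zebra": 0.05}
# p=0.75: "food" (0.5) and "book" (0.3) reach 0.8 >= 0.75, so only those two compete.
print(top_p_sample(probs, p=0.75))
```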
Stop sequences
A sequence of characters, such as a word, a phrase, a newline (\n), or a period, that tells the model when to stop generating output. If you have more than one stop sequence, the model stops when it reaches any of those sequences.
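The effect is equivalent to cutting the text at the earliest stop sequence. The service stops producing tokens when it reaches the sequence; this client-side sketch only mimics the end result:

```python
def truncate_at_stop(text, stop_sequences):
    """Cut generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("First point.\nSecond point.", ["\n"]))   # First point.
print(truncate_at_stop("Q: Why?\nA: Because.", ["A:", "\n"]))    # Q: Why?
```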
Frequency penalty
A penalty that's assigned to a token when that token appears frequently. High penalties encourage fewer repeated tokens and produce a more random output.
Presence penalty
A penalty that's assigned to each token that appears in the output, to encourage generating outputs with tokens that haven't been used.
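A common formulation of the two penalties (used by several popular text generation APIs; the service's exact arithmetic may differ) subtracts from a candidate token's score a frequency term that scales with the token's count and a flat presence term applied once per token:

```python
def apply_penalties(logits, generated_tokens, frequency_penalty, presence_penalty):
    """Lower the scores of tokens that already appeared in the output.

    The frequency penalty grows with how often a token has appeared;
    the presence penalty is a flat deduction for any token that has
    appeared at least once.
    """
    counts = {}
    for tok in generated_tokens:
        counts[tok] = counts.get(tok, 0) + 1
    adjusted = dict(logits)
    for tok, count in counts.items():
        if tok in adjusted:
            adjusted[tok] -= count * frequency_penalty  # scales with repetition
            adjusted[tok] -= presence_penalty           # flat, once per token
    return adjusted

logits = {"the": 2.0, "cat": 1.5, "dog": 1.4}
print(apply_penalties(logits, ["the", "the", "cat"], 0.5, 0.3))
# {"the": 2.0 - 2*0.5 - 0.3 = 0.7, "cat": 1.5 - 0.5 - 0.3 = 0.7, "dog": 1.4}
```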
Show likelihoods
Every time a new token is to be generated, a number between -15 and 0 is assigned to
all tokens, where tokens with higher numbers are more likely to follow the current
token. For example, it's more likely that the word favorite is followed by the
word food or book rather than the word zebra. This parameter is
available only for the cohere models.
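The range of -15 to 0 reads naturally as a log probability, where 0 means near-certainty. Assuming that interpretation, a displayed likelihood converts to a probability with an exponential:

```python
import math

# A likelihood of 0 corresponds to probability 1; more negative values
# correspond to progressively less likely tokens.
for log_likelihood in (0.0, -1.0, -5.0, -15.0):
    print(f"{log_likelihood:>6}: {math.exp(log_likelihood):.2e}")
# 0.0 -> 1.00e+00, -1.0 -> 3.68e-01, -5.0 -> 6.74e-03, -15.0 -> 3.06e-07
```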