Creating a Transcription Job

Create and submit a job to transcribe one or more media files to text files in the Speech service.

Before you begin

Store the media files that you want to transcribe in an Object Storage bucket.
For a comparison of the Whisper and Oracle ASR models, see Comparing Whisper and Oracle ASR Models.

To create a transcription job in the Console, follow these steps:
1. Open the navigation menu and click Analytics & AI. Under AI Services, click Speech.
2. In the left-side navigation menu, click Transcription jobs.
3. Under List scope, select the compartment that you want to work in.
4. Click Create job.
5. On the basic information page, enter a unique name (255-character limit) for the job. The name must consist of one or more alphanumeric characters, dashes, or underscores, in any order. If you don't provide a name, one is generated automatically.
  
  For example:
  
  AiSpeechTranscriptionJob20220804134759
6. (Optional) Enter a description (400-character limit) for the job.
7. Select the compartment to create the job in, if different from the one displayed.
8. Under Input, select the input bucket that contains the media files that you want to transcribe.
  
  If the bucket that you want isn't in the selected compartment, change the compartment.
9. Under Output, choose whether to store the output files in the input bucket or in a different bucket. If you choose a different bucket, select it.
10. (Optional) Enter an output prefix to separate and sort the files in the bucket.
  
  For example, you could enter call_ctr for call center media files.
  
  You can also create an output folder in the bucket by including a slash (/) in the prefix. For example, the prefix MyResults/ stores all the transcribed files in a MyResults folder in the bucket.
11. Select the model type for the job.
  
  Note
  
  The supported model types are Oracle, Whisper Medium, Whisper Large V2 (available on service request), and Whisper Large V3 Turbo. See Comparing Whisper and Oracle ASR Models to decide which model type to use.
12. Select the language of the media file.
  
  You can search the list for the language that you need. English (US English for the Oracle model) is the default.
  
  Whisper models support automatic language identification. To use it, select auto as the language code in the list.
13. (Optional) To get the transcription in SRT format in addition to JSON, select Get SRT transcription format.
14. If you don't want your transcription punctuated, clear Enable punctuation.
  
  Note
  
  Enable punctuation is selected for Whisper models and can't be cleared.
15. (Optional) To identify the speakers in the input file, select Enable diarization.
  
  You can let the Speech service automatically detect the number of unique speakers in the input file or you can enter a number. The minimum number of speakers is 2 and the maximum is 16.
  
  Note
  
  Using diarization increases the transcription task latency, which is why this option is disabled by default.
16. To add filters to change the way the output file is generated, click Add filter.
  
  Select a filter type. Profanity is the default.
  
  Select the filter mode. For example, the profanity filter offers these modes:
  
  Mask: Any detected profanity is masked with asterisks in the transcription, except for the first letter.
  
  Remove: Any detected profanity is replaced with one asterisk in the transcription.
  
  Tag: Profanity isn't masked or removed but is marked as TYPE: "Profanity" in the transcription.
17. (Optional) To add more settings, click Add additional settings, and then enter a key and its value.
  
  Key: The name of the setting, for example, whisperPrompt.
  
  Value: The value of the setting. For whisperPrompt, the value is a prompt that can be several words long. The field expands to show all the entered text.
  
  You can add as many keys as needed. To delete a key, select the X beside the Value field.
  
  When a Whisper model is selected, you can use this field to pass a prompt that helps guide the transcription. The only supported key is whisperPrompt; if any other key is passed, the request fails because it's considered invalid input. The prompt applies only to Whisper models, can be at most 4000 characters long, and can contain only alphanumeric characters and these punctuation marks: . , ! ? - : ; ' ". The validation is performed in the background, so a prompt longer than this limit causes the job to fail.
  
  Note
  
  Adding a prompt to a Whisper model can sometimes yield unexpected results.
18. (Optional) Click Show advanced options to assign tags to the job. Tags help you locate and track resources. To add a tag, select a tag namespace, and then enter the key and value.
  
  Tagging describes the various tags that you can use to organize and find resources, including cost-tracking tags.
19. Click Next to choose the files for the job.
20. Select the checkbox for each media file that you want to transcribe, or select the checkbox next to Name to select all files.
  
  Note
  
  The maximum file size is 2 GB, and the maximum file duration is 4 hours.
21. Click Submit to start the job.
  
  A job can take seconds or hours to run, depending on the size and number of files that you select. While it runs, the job is in the in-progress state, which changes to succeeded or failed when the job finishes. Select a job to go to its details page.
  
  Each job can have up to 100 tasks.
  
  Jobs are retained for 90 days.
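
The limits listed in the steps above (name and description length, speaker count, the whisperPrompt key with its length and character set, file size and duration, and tasks per job) can be checked locally before you submit a job. The following Python sketch does that. validate_job_settings is a hypothetical helper, not part of any Oracle SDK, and it assumes that "2 GB" means 2 GiB, that each media file becomes one task, and that spaces are allowed in the prompt. The service remains the authority on what it accepts.

```python
import re

# Limits taken from the Console steps above; server-side rules may differ.
MAX_NAME_LEN = 255
MAX_DESCRIPTION_LEN = 400
MIN_SPEAKERS, MAX_SPEAKERS = 2, 16
MAX_PROMPT_LEN = 4000
MAX_FILE_BYTES = 2 * 1024**3          # "2 GB", assumed to mean 2 GiB
MAX_DURATION_SECONDS = 4 * 60 * 60    # 4 hours
MAX_TASKS_PER_JOB = 100               # assumes one task per media file

NAME_RE = re.compile(r"[A-Za-z0-9_-]+")
# ASCII letters and digits, spaces (assumed allowed), and . , ! ? - : ; ' "
PROMPT_RE = re.compile(r"[A-Za-z0-9 .,!?:;'\"-]*")


def validate_job_settings(name=None, description=None, num_speakers=None,
                          additional_settings=None, files=()):
    """Return a list of problems; an empty list means no documented limit was hit.

    files is a sequence of (object_name, size_bytes, duration_seconds) tuples.
    num_speakers=None means the service detects the speaker count itself.
    """
    problems = []
    if name:  # a missing name is generated by the service
        if len(name) > MAX_NAME_LEN:
            problems.append(f"name longer than {MAX_NAME_LEN} characters")
        if not NAME_RE.fullmatch(name):
            problems.append("name may contain only letters, digits, '-' and '_'")
    if description and len(description) > MAX_DESCRIPTION_LEN:
        problems.append(f"description longer than {MAX_DESCRIPTION_LEN} characters")
    if num_speakers is not None and not MIN_SPEAKERS <= num_speakers <= MAX_SPEAKERS:
        problems.append(f"speaker count must be {MIN_SPEAKERS}-{MAX_SPEAKERS}")
    for key, value in (additional_settings or {}).items():
        if key != "whisperPrompt":  # the only supported key for Whisper
            problems.append(f"unsupported additional setting: {key}")
        elif len(value) > MAX_PROMPT_LEN:
            problems.append(f"whisperPrompt longer than {MAX_PROMPT_LEN} characters")
        elif not PROMPT_RE.fullmatch(value):
            problems.append("whisperPrompt contains unsupported characters")
    if len(files) > MAX_TASKS_PER_JOB:
        problems.append(f"more than {MAX_TASKS_PER_JOB} files in one job")
    for object_name, size_bytes, duration_seconds in files:
        if size_bytes > MAX_FILE_BYTES:
            problems.append(f"{object_name}: larger than 2 GB")
        if duration_seconds > MAX_DURATION_SECONDS:
            problems.append(f"{object_name}: longer than 4 hours")
    return problems


print(validate_job_settings(
    name="call_ctr-job1",
    num_speakers=3,
    additional_settings={"whisperPrompt": "Acme, Inc. support call."},
    files=[("call1.wav", 50_000_000, 600)],
))
```

Collecting every problem, rather than stopping at the first one, lets you fix all the inputs in one pass before resubmitting.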
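
To make the three profanity filter modes concrete, the following sketch applies them to a list of words. It's a toy illustration only: the service detects profanity itself, the profane_words set is a hypothetical stand-in for that detection, and the {"token": ..., "type": "Profanity"} records are an assumed shape, not the actual transcription JSON format.

```python
from enum import Enum


class FilterMode(Enum):
    MASK = "MASK"
    REMOVE = "REMOVE"
    TAG = "TAG"


def apply_profanity_filter(words, mode, profane_words):
    """Toy model of the three filter modes applied to a list of words.

    Returns the filtered text and, for TAG mode, the words that were marked.
    profane_words stands in for the service's own profanity detection.
    """
    output, tagged = [], []
    for word in words:
        if word.lower() not in profane_words:
            output.append(word)
        elif mode is FilterMode.MASK:
            # Keep the first letter and replace the rest with asterisks.
            output.append(word[0] + "*" * (len(word) - 1))
        elif mode is FilterMode.REMOVE:
            # Replace the whole word with a single asterisk.
            output.append("*")
        else:
            # TAG: leave the word unchanged but record it as profanity.
            output.append(word)
            tagged.append({"token": word, "type": "Profanity"})
    return " ".join(output), tagged


words = ["well", "darn", "it"]
for mode in FilterMode:
    print(mode.value, apply_profanity_filter(words, mode, {"darn"}))
```

Only Tag mode keeps the original word in the text, so it's the mode to choose if you want to decide later how to handle flagged words.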
To create a transcription job by using the CLI, use the create command with the required parameters:
```
oci speech transcription-job create [OPTIONS]
```
Avoid entering confidential information.

For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
To create a job by using the API, use the CreateTranscriptionJob operation. To move an existing job to a different compartment, use the ChangeTranscriptionJobCompartment operation.
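
To show how the Console choices map onto a CreateTranscriptionJob request, the following sketch assembles a request body as a plain Python dictionary. The camelCase field names and enum strings (for example OBJECT_LIST_INLINE_INPUT_LOCATION, ORACLE, WHISPER_MEDIUM, PROFANITY, and MASK) reflect a reading of the Speech API reference and are assumptions to verify there. build_create_transcription_job_body is a hypothetical helper that doesn't call the service, and the compartment OCID, namespace, and bucket names are placeholders.

```python
import json


def build_create_transcription_job_body(
    compartment_id,
    namespace,
    input_bucket,
    object_names,
    output_bucket=None,
    prefix=None,
    model_type="ORACLE",
    language_code="en-US",
    display_name=None,
    punctuation=True,
    enable_diarization=False,
    num_speakers=None,
    profanity_mode=None,
    include_srt=False,
):
    """Assemble a dict shaped like a CreateTranscriptionJob request body.

    Field names and enum strings are assumptions; nothing here calls the service.
    """
    output_location = {
        "namespaceName": namespace,
        # Store results in the input bucket unless another bucket is given.
        "bucketName": output_bucket or input_bucket,
    }
    if prefix:
        output_location["prefix"] = prefix  # e.g. "MyResults/" for a folder

    body = {
        "compartmentId": compartment_id,
        "inputLocation": {
            "locationType": "OBJECT_LIST_INLINE_INPUT_LOCATION",
            "objectLocations": [{
                "namespaceName": namespace,
                "bucketName": input_bucket,
                "objectNames": list(object_names),
            }],
        },
        "outputLocation": output_location,
        "modelDetails": {"modelType": model_type, "languageCode": language_code},
        "normalization": {
            # Punctuation can't be turned off for Whisper models.
            "isPunctuationEnabled": punctuation or model_type.startswith("WHISPER"),
        },
    }
    if display_name:
        body["displayName"] = display_name
    if include_srt:
        body["additionalTranscriptionFormats"] = ["SRT"]  # in addition to JSON
    if enable_diarization:
        diarization = {"isDiarizationEnabled": True}
        if num_speakers is not None:  # omit to let the service detect the count
            diarization["numberOfSpeakers"] = num_speakers
        body["modelDetails"]["transcriptionSettings"] = {"diarization": diarization}
    if profanity_mode:  # "MASK", "REMOVE", or "TAG"
        body["normalization"]["filters"] = [{"type": "PROFANITY", "mode": profanity_mode}]
    return body


body = build_create_transcription_job_body(
    "ocid1.compartment.oc1..example", "mynamespace", "call-recordings",
    ["call1.wav"], prefix="call_ctr", enable_diarization=True,
)
print(json.dumps(body, indent=2))
```

Optional settings are added only when requested, so the body stays close to what the Console sends for its defaults; check the API reference for any required fields this sketch leaves out.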

Oracle Cloud Infrastructure Documentation
