The SDKs for the Oracle Android, Oracle iOS, and Oracle Web channels have been integrated with speech recognition to allow users to talk directly to skills and digital assistants and get the appropriate responses.
When speech recognition is enabled, a microphone button replaces the send button whenever the user input field is empty. Users tap this button to begin recording their voices. The speech is sent to the speech server for recognition, converted to text, and then sent to the skill. If the speech is only partly recognized, then the partial result is displayed in the user input field, allowing the user to clean it up before sending it to the skill.
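The partial-result flow described above can be sketched as follows. This is an illustration only: the result shape and field names (`isFinal`, `text`) and the handler itself are assumptions for clarity, not the SDK's actual API.

```javascript
// Hypothetical recognition-result handler illustrating the flow above:
// fully recognized speech is sent to the skill, while a partial result is
// placed in the user input field so the user can edit it before sending.
function handleRecognitionResult(result, inputField, sendToSkill) {
  if (result.isFinal) {
    sendToSkill(result.text);        // fully recognized: send to the skill
    inputField.value = '';           // clear the input field
  } else {
    inputField.value = result.text;  // partial: let the user clean it up
  }
}

// Example with stand-in objects:
const field = { value: '' };
const sent = [];
handleRecognitionResult({ isFinal: false, text: 'order a piz' }, field, t => sent.push(t));
// field.value now holds the partial text; nothing has been sent to the skill yet.
```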
To enable speech recognition, set the enableSpeechRecognition feature flag to true. Speech Recognition describes this and other voice-recognition properties and methods.
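As a minimal sketch, the flag is set in the channel's SDK settings. The enableSpeechRecognition property is the flag named above; the surrounding settings shape, host, and channel ID are placeholders, not the SDK's documented initialization.

```javascript
// Hypothetical Web channel settings object. Only enableSpeechRecognition
// is the documented flag from this topic; the other values are placeholders.
const chatSettings = {
  URI: 'example.oraclecloud.com',      // placeholder service host
  channelId: '<your-web-channel-id>',  // placeholder channel ID
  enableSpeechRecognition: true        // show the microphone button when the input field is empty
};

// The SDK would then be initialized with these settings, for example:
// const sdk = new WebSDK(chatSettings);
```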
Improve ASR with Enhanced Speech
If your skill's training data contains many application- or skill-specific words or phrases, jargon, proper nouns, or words with unusual spellings or pronunciations, then you can use an enhanced speech model to increase the likelihood that they are recognized and transcribed correctly.
Note
You can only use enhanced speech with
English-language skills (with training data in English) that are intended for an
English-speaking audience.
To build an enhanced speech model:
1. Select Enable Enhanced Speech in Settings.
2. Retrain the skill.
3. Route an Oracle Web, iOS, or Android client channel to the skill.
Tip:
Enhanced speech models are only available for skills developed with Version 20.12 or later. If you want to use enhanced speech models, then you must upgrade the skill to 20.12.
When you select this option, the speech recognition system builds an
enhanced speech model that's based on the skill's intent and entity data: utterances,
entity values, synonyms for both custom and dynamic entity values, and system entities
that have been associated with intents. The enhanced speech model is updated each time
you retrain your skill (or, as is the case in the current release, when the skill is
retrained after a finalized push request from the Dynamic Entity API).
When users issue a speech request through the Oracle Web, iOS, or Android
client channels, the speech runtime dynamically pulls in the custom language model for
the skill that's routed to the channel. If the channel points to a digital assistant, the runtime pulls the custom language models for each skill that has Enable Enhanced Speech enabled. You can toggle this setting on and off for the
individual skills that are registered to a digital assistant.