STT: Recognizing Speech
This tutorial demonstrates how to recognize sound data recorded by the user and receive the result as text.
Warm-up
Become familiar with the STT API basics by learning about:
- Creating and Destroying STT Handles
Create and destroy the STT handle.
- Setting and Unsetting Callbacks
Set and unset callbacks for obtaining notifications about recognition results, state changes, and errors.
- Getting Information
Get information that includes, for example, language and state.
- Connecting and Disconnecting the STT
Connect and disconnect the STT.
- Setting Options and Controlling Recording
Set the options and control recording for the STT.
Creating and Destroying STT Handles
To create and destroy STT handles:
- To use the functions and data types of the STT (speech-to-text) API (in mobile and wearable applications), include the <stt.h> header file in your application:
#include <stt.h>
- To use the STT library, create an STT handle. The handle is passed as a parameter to the other STT functions. After the handle is created, the STT state is STT_STATE_CREATED.
Note: STT is not thread-safe and depends on the ecore main loop. Your application must run the ecore main loop, and you must not call STT functions from another thread.
void create_stt_handle()
{
    stt_h stt;
    int ret;

    ret = stt_create(&stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- When you no longer need the STT library, destroy the STT handle using the stt_destroy() function:
Note: Do not call the stt_destroy() function within a callback function. If you do, the stt_destroy() function fails and returns STT_ERROR_OPERATION_FAILED.
void destroy_stt_handle(stt_h stt)
{
    int ret;

    ret = stt_destroy(stt); // stt is the STT handle
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
Setting and Unsetting Callbacks
To set and unset callbacks:
- To use the functions and data types of the STT (speech-to-text) API (in mobile and wearable applications), include the <stt.h> header file in your application. The enum values for the callback function parameter are defined in the header file, as well as the parameter details.
#include <stt.h>
- The STT API provides callback functions for receiving the recognition result, state changes, default language changes, and errors. Set the callback functions in the STT_STATE_CREATED state.
You can use the following callbacks:
- State changed
If you set the state changed callback for the STT, it is invoked when a state is changed by the STT.
void state_changed_cb(stt_h stt, stt_state_e previous, stt_state_e current, void* user_data)
{
    // Your code
}

void set_state_changed_cb(stt_h stt)
{
    int ret;

    ret = stt_set_state_changed_cb(stt, state_changed_cb, NULL);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}

void unset_state_changed_cb(stt_h stt)
{
    int ret;

    ret = stt_unset_state_changed_cb(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Default language changed
The default language of the STT is changed either when the system language is changed, or through the STT settings. To get a notification of a language change, set a callback.
void default_language_changed_cb(stt_h stt, const char* previous_language,
                                 const char* current_language, void* user_data)
{
    // Your code
}

void set_default_language_changed_cb(stt_h stt)
{
    int ret;

    ret = stt_set_default_language_changed_cb(stt, default_language_changed_cb, NULL);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}

void unset_default_language_changed_cb(stt_h stt)
{
    int ret;

    ret = stt_unset_default_language_changed_cb(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Recognition result
To get the STT recognition result, set the recognition result callback function.
The stt_foreach_detailed_result() function retrieves the time stamps of the current recognition result, so call it from within the stt_recognition_result_cb() callback function.
bool result_time_cb(stt_h stt, int index, stt_result_time_event_e event, const char* text,
                    long start_time, long end_time, void* user_data)
{
    // Your code
    return true; // Continue with the next time stamp
}

void recognition_result_cb(stt_h stt, stt_result_event_e event, const char** data,
                           int data_count, const char* msg, void* user_data)
{
    // If you want to get the time information of the result
    int ret;

    ret = stt_foreach_detailed_result(stt, result_time_cb, NULL);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }

    // Your code
}

void set_recognition_result_cb(stt_h stt)
{
    int ret;

    ret = stt_set_recognition_result_cb(stt, recognition_result_cb, NULL);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}

void unset_recognition_result_cb(stt_h stt)
{
    int ret;

    ret = stt_unset_recognition_result_cb(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Error
When an error occurs, the STT library sends an error message using a callback function:
void error_cb(stt_h stt, stt_error_e reason, void* user_data)
{
    // Your code
}

void set_error_cb(stt_h stt)
{
    int ret;

    ret = stt_set_error_cb(stt, error_cb, NULL);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}

void unset_error_cb(stt_h stt)
{
    int ret;

    ret = stt_unset_error_cb(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
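The recognition result callback receives the recognized text as an array of strings (data) with data_count entries. The following is a minimal sketch of joining those entries into one buffer. It assumes that the entries are consecutive pieces of a single utterance, which depends on the engine; some engines may return alternatives instead. The join_results() helper is hypothetical and not part of the STT API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the STT API): joins the data_count
 * strings in data into out, separated by single spaces.
 * Returns the number of characters written, excluding the terminator.
 * The output is truncated if out_size is too small. */
size_t join_results(const char **data, int data_count, char *out, size_t out_size)
{
    size_t used = 0;

    if (out == NULL || out_size == 0)
        return 0;
    out[0] = '\0';
    if (data == NULL)
        return 0;

    for (int i = 0; i < data_count; i++) {
        if (data[i] == NULL)
            continue;
        /* Add a separator before every piece except the first one written */
        if (used > 0 && used + 1 < out_size)
            out[used++] = ' ';
        size_t len = strlen(data[i]);
        size_t room = out_size - 1 - used; /* Space left before the terminator */
        if (len > room)
            len = room;
        memcpy(out + used, data[i], len);
        used += len;
        out[used] = '\0';
    }
    return used;
}
```

Inside the callback, you could call join_results(data, data_count, buffer, sizeof(buffer)) with a local buffer and copy the text before the callback returns.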
Getting Information
To get information of the current STT state and the languages used:
- To use the functions and data types of the STT (speech-to-text) API (in mobile and wearable applications), include the <stt.h> header file in your application:
#include <stt.h>
- You can obtain the current STT state, the list of supported languages, and the current language:
- Get the current state.
The STT state is changed by other functions. It is also applied as a precondition for each function. Get the current state using the stt_get_state() function.
void get_state(stt_h stt)
{
    stt_state_e current_state;
    int ret;

    ret = stt_get_state(stt, &current_state);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Obtain a list of languages supported by the STT using the stt_foreach_supported_languages() function. The stt_supported_language_cb callback is invoked for each supported language repeatedly. You can continue or stop getting the supported languages through the return value of the callback function.
bool supported_language_cb(stt_h stt, const char* language, void* user_data)
{
    // Return true to get the next supported language, or false to stop
    return true;
}

void get_supported_language(stt_h stt)
{
    int ret;

    ret = stt_foreach_supported_languages(stt, supported_language_cb, NULL);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Get the default language using the stt_get_default_language() function. If you do not pass a language to the stt_start() function, recognition uses this default language. To be notified when the default language changes, set the default language changed callback.
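void get_default_language(stt_h stt)
{
    int ret;
    char* default_lang = NULL;

    ret = stt_get_default_language(stt, &default_lang);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }

    // Release the returned string when it is no longer needed (free() requires <stdlib.h>)
    free(default_lang);
}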
- Obtain a list of engines supported by the STT using the stt_foreach_supported_engines() function. When this function is called, the stt_supported_engine_cb callback is invoked repeatedly for each supported engine. You can continue or stop getting the supported engine through the return value of the callback function.
bool supported_engine_cb(stt_h stt, const char* engine_id, const char* engine_name, void* user_data)
{
    // Return true to get the next supported engine, or false to stop
    return true;
}

void get_supported_engine(stt_h stt)
{
    int ret;

    ret = stt_foreach_supported_engines(stt, supported_engine_cb, NULL);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Get or set the current engine, which is used for the STT recognition, using the stt_get_engine() and stt_set_engine() functions.
The supported languages, silence detection, and supported recognition types depend on the STT engine.
// Get the engine
void get_current_engine(stt_h stt)
{
    int ret;
    char* current_engine_id = NULL;

    ret = stt_get_engine(stt, &current_engine_id);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }

    // Release the returned string when it is no longer needed (free() requires <stdlib.h>)
    free(current_engine_id);
}

// Set the engine
void set_current_engine(stt_h stt, const char* engine_id)
{
    int ret;

    ret = stt_set_engine(stt, engine_id);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Get the supported recognition types. Check whether the recognition type defined in the <stt.h> header file is supported.
With the normal recognition type, STT_RECOGNITION_TYPE_FREE, the whole recognition result is sent at the end of recognition. To get partial recognition results while recognition is in progress, use the STT_RECOGNITION_TYPE_FREE_PARTIAL recognition type, if it is supported by the current engine.
void check_supported_recognition_type(stt_h stt)
{
    int ret;
    bool support;

    ret = stt_is_recognition_type_supported(stt, STT_RECOGNITION_TYPE_FREE_PARTIAL, &support);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
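The supported-language callback controls the iteration through its return value, which makes it suitable for searching for one specific language. The following is a minimal sketch of that search logic, kept free of STT types so that it compiles without the SDK. The lang_search_s structure and the lang_search_visit() function are hypothetical names, and "en_US" is only an example value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical search context, meant to be passed as user_data */
typedef struct {
    const char *wanted; /* Language code to look for, for example "en_US" */
    bool found;         /* Set to true once wanted has been seen */
    int visited;        /* Number of languages inspected so far */
} lang_search_s;

/* Returns true to continue iterating and false to stop, mirroring the
 * return value convention of the supported-language callback. */
bool lang_search_visit(lang_search_s *search, const char *language)
{
    if (search == NULL || language == NULL)
        return false;

    search->visited++;
    if (search->wanted != NULL && strcmp(search->wanted, language) == 0) {
        search->found = true;
        return false; /* Stop: the remaining languages are not needed */
    }
    return true; /* Continue with the next supported language */
}
```

In an application, the supported-language callback would cast user_data to lang_search_s* and return the result of lang_search_visit(). The address of the structure would be passed as the last argument of stt_foreach_supported_languages().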
Connecting and Disconnecting the STT
To connect and disconnect the STT:
- To use the functions and data types of the STT (speech-to-text) API (in mobile and wearable applications), include the <stt.h> header file in your application:
#include <stt.h>
- After you create the STT handle, connect the background STT daemon:
- The stt_prepare() function is asynchronous. When the connection succeeds, the STT state changes to STT_STATE_READY:
void prepare_for_stt(stt_h stt)
{
    int ret;

    ret = stt_prepare(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
Note: If you get the error callback after calling the stt_prepare() function, STT is not available.
- The stt_unprepare() function disconnects the STT, and the state is changed to STT_STATE_CREATED:
void unprepared_for_stt(stt_h stt)
{
    int ret;

    ret = stt_unprepare(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
Setting Options and Controlling Recording
To set the STT options and control recording:
- To use the functions and data types of the STT (speech-to-text) API (in mobile and wearable applications), include the <stt.h> header file in your application:
#include <stt.h>
- You can set the following STT options:
- Set silence detection.
After the STT starts recognizing sound, some STT engines can detect silence when the sound input from the user ends. If silence detection is enabled, the STT library stops recognition automatically and sends the result. Otherwise, you can manually stop it using the stt_stop() function.
If you set the silence detection to automatic, the STT follows the global STT setting. This option must be set in the STT_STATE_READY state.
void set_silence_detection(stt_h stt, stt_option_silence_detection_e type)
{
    int ret;

    // The default type is STT_OPTION_SILENCE_DETECTION_AUTO
    ret = stt_set_silence_detection(stt, type);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Set or unset the start sound.
To play a sound before the STT recognition starts, call the stt_set_start_sound() function in the STT_STATE_READY state.
Note: The sound file path must be a full path. Only the WAV format is supported.
void set_start_sound(stt_h stt, const char* filename)
{
    int ret;

    ret = stt_set_start_sound(stt, filename);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}

void unset_start_sound(stt_h stt)
{
    int ret;

    ret = stt_unset_start_sound(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Set or unset the stop sound.
To play a sound when the STT stops, use the stt_set_stop_sound() function in the STT_STATE_READY state:
Note: The sound file path must be a full path. Only the WAV format is supported.
void set_stop_sound(stt_h stt, const char* filename)
{
    int ret;

    ret = stt_set_stop_sound(stt, filename);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}

void unset_stop_sound(stt_h stt)
{
    int ret;

    ret = stt_unset_stop_sound(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- Start, stop, and cancel recognition:
- To start recording, use the stt_start() function. The connected STT daemon starts recording, and the state is changed to STT_STATE_RECORDING.
Note: If the stt_start() function fails, check the error code. You can get the following error codes:
  - STT_ERROR_RECORDER_BUSY
  - STT_ERROR_OUT_OF_NETWORK
  - STT_ERROR_INVALID_STATE
  - STT_ERROR_INVALID_LANGUAGE
The language and recognition type must be supported by the current STT engine. If you pass NULL as the language parameter, the STT default language is used. You can retrieve it using the stt_get_default_language() function.
void start(stt_h stt, const char* language, const char* type)
{
    int ret;

    // Pass NULL as the language to use the default language
    ret = stt_start(stt, language, type);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- When the STT recording is in process, you can retrieve the current recording volume using the stt_get_recording_volume() function. The volume value is retrieved periodically with the short-term recorded sound data as dB (decibels). The STT volume normally has a negative value, and 0 is the maximum value.
void get_volume(stt_h stt)
{
    int ret;
    float current_volume;

    ret = stt_get_recording_volume(stt, &current_volume);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
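Because the recording volume is reported in dB with 0 as the maximum, a level meter usually needs to map it to a bounded range. The following is a minimal sketch of a linear mapping to a percentage. The volume_to_percent() function is hypothetical, and the lower bound of -60 dB is an assumption chosen for illustration, not a value defined by the STT API.

```c
/* Assumed lower bound of the meter in dB; not defined by the STT API */
#define METER_FLOOR_DB (-60.0f)

/* Maps a volume in dB (0 = maximum, negative = quieter) to the range 0..100.
 * Values outside [METER_FLOOR_DB, 0] are clamped. */
int volume_to_percent(float volume_db)
{
    if (volume_db >= 0.0f)
        return 100;
    if (volume_db <= METER_FLOOR_DB)
        return 0;

    /* Linear interpolation between the floor and 0 dB, rounded to nearest */
    return (int)((volume_db - METER_FLOOR_DB) * 100.0f / -METER_FLOOR_DB + 0.5f);
}
```

For example, a value retrieved with stt_get_recording_volume() could be passed to volume_to_percent() each time the meter is redrawn.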
- To stop recording and get the recognition result, use the stt_stop() function. The state is changed to STT_STATE_PROCESSING. The result is delivered through the recognition result callback, and the state then changes back to STT_STATE_READY.
void stop(stt_h stt)
{
    int ret;

    ret = stt_stop(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
- To stop recording without getting the result, use the stt_cancel() function. It changes the state to STT_STATE_READY.
void cancel(stt_h stt)
{
    int ret;

    ret = stt_cancel(stt);
    if (STT_ERROR_NONE != ret) {
        // Error handling
    }
}
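The state changes described in this tutorial can be summarized as a small state machine. The following is a minimal sketch that models those transitions with local stand-in types so that it compiles without the SDK. All names are hypothetical; the real state constants are the STT_STATE_* values in <stt.h>. The sketch also assumes that disconnecting is only valid in the ready state and that canceling is accepted during processing as well as recording.

```c
#include <stdbool.h>

/* Local stand-ins for the STT states and operations (hypothetical names) */
typedef enum { ST_CREATED, ST_READY, ST_RECORDING, ST_PROCESSING } model_state_e;
typedef enum {
    OP_PREPARE,   /* stt_prepare() has connected */
    OP_UNPREPARE, /* stt_unprepare() */
    OP_START,     /* stt_start() */
    OP_STOP,      /* stt_stop() */
    OP_CANCEL,    /* stt_cancel() */
    OP_RESULT     /* The recognition result callback was delivered */
} model_op_e;

/* Applies op to *state. Returns false and leaves the state unchanged
 * when the operation is not valid in the current state. */
bool model_apply(model_state_e *state, model_op_e op)
{
    model_state_e from = *state;

    switch (op) {
    case OP_PREPARE:
        if (from != ST_CREATED) return false;
        *state = ST_READY;
        return true;
    case OP_UNPREPARE:
        if (from != ST_READY) return false; /* Assumption: only from ready */
        *state = ST_CREATED;
        return true;
    case OP_START:
        if (from != ST_READY) return false;
        *state = ST_RECORDING;
        return true;
    case OP_STOP:
        if (from != ST_RECORDING) return false;
        *state = ST_PROCESSING;
        return true;
    case OP_CANCEL:
        /* Assumption: canceling is also accepted while processing */
        if (from != ST_RECORDING && from != ST_PROCESSING) return false;
        *state = ST_READY;
        return true;
    case OP_RESULT:
        if (from != ST_PROCESSING) return false;
        *state = ST_READY;
        return true;
    }
    return false;
}
```

A model like this can help when deciding which buttons to enable in a user interface, for example by enabling a stop button only while the state is recording.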